

  • A framework for creating virtual assistants for specific purposes.
  • Language model integration: uses LLMs (e.g. OpenAI GPT-4, LaMDA, LLaMA) to parse inputs and generate responses.
  • Task-driven autonomous agent: given a single starting objective, an agent can generate a sequence of actions to achieve it and then execute them.
  • Visual avatar: uses Live2D Cubism for animated characters
  • Voice input: uses OpenAI's Whisper model to accept voice inputs
  • Voice output: uses the VITS model for voice output
  • Communication with multiple agents: multiple agent instances can communicate with one another.
  • External actions: agents can interact with the outside world by executing actions, e.g. searching Google or sending an email. This makes it possible to build plugins that extend the agent's capabilities.
  • Time travelling, branching and logging: because the agent's prompts, states and actions are logged as plain JSON objects and vectors, time travelling and branching are possible.
  • Data source insights: you can supply data sources for the agent to look into (e.g. SQL databases). Agents can create embeddings in vector databases. Agents can also hook into CrawlGPT.
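The time-travelling and branching idea above could be sketched as an append-only log whose entries point at their parent entry. Moving back to an earlier entry and continuing from there then creates a new branch instead of overwriting history. This is a minimal sketch under assumptions: `AgentLog`, `LogEntry`, the entry kinds and the payload shapes are hypothetical names chosen for illustration, and the vector side of the log is left out.

```typescript
// Hypothetical log format: the notes only say that prompts, states and
// actions are logged as JSON; these type and field names are assumptions.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };
type EntryKind = "prompt" | "state" | "action";

interface LogEntry {
  id: number;
  parentId: number | null; // null for the very first entry
  kind: EntryKind;
  payload: Json;
}

class AgentLog {
  private entries: LogEntry[] = [];
  private head: number | null = null; // id of the entry the agent is "at"

  // Append an entry as a child of the current head and move the head to it.
  append(kind: EntryKind, payload: Json): number {
    const entry: LogEntry = {
      id: this.entries.length,
      parentId: this.head,
      kind,
      // Deep-copy via JSON so later mutations of the caller's object
      // cannot rewrite history.
      payload: JSON.parse(JSON.stringify(payload)),
    };
    this.entries.push(entry);
    this.head = entry.id;
    return entry.id;
  }

  // Time travel: move the head back to an earlier entry. Nothing is deleted,
  // so the next append starts a new branch alongside the old one.
  checkout(id: number): void {
    if (!Number.isInteger(id) || id < 0 || id >= this.entries.length) {
      throw new Error(`unknown entry ${id}`);
    }
    this.head = id;
  }

  // Follow parent links from the head to the root; returns the current
  // branch oldest-first.
  history(): LogEntry[] {
    const path: LogEntry[] = [];
    let cursor = this.head;
    while (cursor !== null) {
      const entry = this.entries[cursor];
      path.push(entry);
      cursor = entry.parentId;
    }
    return path.reverse();
  }

  // The whole tree (every branch) as a JSON string.
  serialize(): string {
    return JSON.stringify(this.entries);
  }
}

// Usage: branch off after the first prompt instead of overwriting.
const log = new AgentLog();
const root = log.append("prompt", { text: "Plan my monthly budget" });
log.append("action", { name: "search", query: "budgeting tips" });
log.checkout(root);
log.append("action", { name: "sendEmail", subject: "Budget draft" });
console.log(log.history().map((e) => e.kind).join(", ")); // prompt, action
```

Because the abandoned "search" entry is still in `serialize()`'s output, both branches remain available for inspection or replay.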

Tech Stack



  • Data consumption and intake
    • Fine-tuning the model
    • Using vector databases by creating embeddings
  • Integrates into Poom's Personal Dashboard
  • We need to create APIs for building and registering commands and actions
  • Might just integrate LangChain, as it's easier than rolling my own
  • Challenge: running entirely in the browser (?)
  • Create separate agents for different purposes (e.g. a Yuuka agent for finance management)
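The API for building and registering commands and actions mentioned above might look something like the following minimal sketch. `ActionRegistry`, `ActionDefinition` and the stubbed `search` action are hypothetical names; this is not LangChain's tool API, and a real plugin would call an actual service instead of returning a canned string.

```typescript
// Hypothetical API: the notes only say we need APIs for building and
// registering commands and actions; every name here is an assumption.
interface ActionDefinition<Args, Result> {
  name: string;
  description: string; // short text the LLM could see when picking an action
  run: (args: Args) => Promise<Result>;
}

class ActionRegistry {
  private actions = new Map<string, ActionDefinition<any, any>>();

  // Register a new action; names must be unique.
  register<Args, Result>(action: ActionDefinition<Args, Result>): void {
    if (this.actions.has(action.name)) {
      throw new Error(`action "${action.name}" is already registered`);
    }
    this.actions.set(action.name, action);
  }

  // One "name: description" line per action, e.g. for a system prompt.
  describe(): string[] {
    return Array.from(this.actions.values()).map((a) => `${a.name}: ${a.description}`);
  }

  // Run the action the model chose. Arguments are not validated here.
  async execute(name: string, args: unknown): Promise<unknown> {
    const action = this.actions.get(name);
    if (action === undefined) {
      throw new Error(`unknown action "${name}"`);
    }
    return action.run(args);
  }
}

// Usage: a stubbed search action (no real Google call is made).
const registry = new ActionRegistry();
registry.register<{ query: string }, string>({
  name: "search",
  description: "Search the web for a query",
  run: async ({ query }) => `results for ${query}`,
});
registry.execute("search", { query: "Live2D" }).then((r) => console.log(r)); // results for Live2D
```

Keeping the `description` next to `run` lets one registry both tell the model which actions exist and dispatch the one it picks; argument validation is deliberately left out of this sketch.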

