Solutions
- A framework for creating virtual assistants for specific purposes.
- Language model integration: uses LLMs (e.g. OpenAI GPT-4, LaMDA, LLaMA) to parse inputs and generate responses.
- Task-driven autonomous agent: given a single starting objective, the agent generates a sequence of actions to achieve it and executes them.
- Visual avatar: uses Live2D Cubism for animated characters
- Voice input: uses OpenAI's Whisper model to accept voice inputs
- Voice output: uses the VITS model for voice output
- Communication with multiple agents: multiple agent instances can communicate with one another.
- External actions: agents can interact with the outside world by executing actions, e.g. search on Google, send an email, etc. This allows you to build plugins to extend the agent's capabilities.
- Time travelling, branching and logging: because the agent's prompts, states and actions are logged as pure JSON objects and vectors, time travelling and branching are possible.
- Data source insights: you can supply data sources that the agent can look into (e.g. SQL databases). Agents can create embeddings in vector databases and hook into CrawlGPT.
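The task-driven loop and JSON-based logging above could be sketched as follows. This is a minimal, hypothetical sketch: `plan` and `execute` are stand-ins for the real LLM planner and external actions, not part of any existing API.

```python
import json

def plan(objective: str) -> list[str]:
    # Stand-in for an LLM call that decomposes the objective into actions.
    return [f"research: {objective}", f"summarize: {objective}"]

def execute(action: str) -> str:
    # Stand-in for an external action (Google search, send an email, ...).
    return f"done: {action}"

def run_agent(objective: str, max_steps: int = 5) -> list[dict]:
    """Minimal task-driven agent loop: plan actions for an objective,
    execute each one, and log every step as a plain JSON-able dict so the
    run can later be replayed, branched, or time-travelled."""
    log = []
    for step, action in enumerate(plan(objective)[:max_steps]):
        result = execute(action)
        log.append({"step": step, "action": action, "result": result})
    return log

history = run_agent("track monthly spending")
print(json.dumps(history, indent=2))
```

Because each step is a pure JSON object, branching amounts to truncating the log at some step and re-running from there.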
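For the vector-database idea, a toy in-memory store shows the shape of the lookup. The `embed` function here is a deterministic placeholder, not a real embedding model; a production version would call an embedding API and a vector database instead.

```python
import math

def embed(text: str, dim: int = 8) -> list[float]:
    """Toy deterministic embedding (placeholder for a real model)."""
    vec = [0.0] * dim
    for i, ch in enumerate(text.lower()):
        vec[i % dim] += ord(ch)
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

class VectorStore:
    """Minimal in-memory vector store for agent data-source lookups."""

    def __init__(self):
        self.items: list[tuple[str, list[float]]] = []

    def add(self, text: str) -> None:
        self.items.append((text, embed(text)))

    def search(self, query: str, k: int = 1) -> list[str]:
        qv = embed(query)
        ranked = sorted(self.items, key=lambda it: cosine(qv, it[1]),
                        reverse=True)
        return [text for text, _ in ranked[:k]]
```

The agent would embed rows or documents from a supplied data source (e.g. a SQL table) into such a store, then search it at question-answering time.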
Tech Stack
Implementations
Notes
- Data consumption and intake
- Fine-tuning the model
- Using vector databases by creating embeddings
- Integrates into Poom's Personal Dashboard
- We need to create APIs for building and registering commands and actions
- Might just integrate LangChain as it's easier than rolling my own
- Challenge: runs entirely on the browser (?)
- Creates multiple agents, one for each purpose (e.g. a Yuuka agent for finance management)
Inspirations