Description
Currently, the Knowledge Space Agent stores chat history in memory (for example, self.chat_history). While this works for local development and single-session usage, the history is lost when the server restarts or when deployed in stateless environments such as container-based or serverless deployments.
Additionally, repeated queries always trigger fresh LLM calls, which may increase latency and operational cost.
This issue proposes exploring a more scalable and modular approach to handling chat history and caching.
Motivation
- Chat history is not persistent across restarts.
- Stateless deployments may lose session context.
- Repeated or identical queries always invoke the LLM.
- There is currently no caching layer to reduce redundant computation.
Improving this could enhance scalability, reduce cost, and improve reproducibility.
Proposed Solution (Seeking Maintainer Feedback)
1. Persistent Chat History
Introduce a modular storage layer for chat history:
- SQLite for local development
- PostgreSQL for production
- Session-based storage (user/session ID mapping)
The default behavior can remain unchanged (in-memory) unless explicitly enabled via configuration.
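A minimal sketch of what such a storage layer could look like. The class and method names (ChatHistoryStore, append, load) are illustrative only and not part of the existing Knowledge Space Agent codebase; the in-memory backend preserves current behavior as the default.

```python
# Hypothetical sketch of a pluggable chat-history storage interface.
import sqlite3
from abc import ABC, abstractmethod


class ChatHistoryStore(ABC):
    @abstractmethod
    def append(self, session_id: str, role: str, content: str) -> None: ...

    @abstractmethod
    def load(self, session_id: str) -> list[tuple[str, str]]: ...


class InMemoryStore(ChatHistoryStore):
    """Default backend: matches today's behavior, lost on restart."""

    def __init__(self):
        self._data: dict[str, list[tuple[str, str]]] = {}

    def append(self, session_id, role, content):
        self._data.setdefault(session_id, []).append((role, content))

    def load(self, session_id):
        return self._data.get(session_id, [])


class SQLiteStore(ChatHistoryStore):
    """Optional persistent backend for local development."""

    def __init__(self, path: str = "chat_history.db"):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS messages ("
            "session_id TEXT, role TEXT, content TEXT)"
        )

    def append(self, session_id, role, content):
        self._conn.execute(
            "INSERT INTO messages VALUES (?, ?, ?)",
            (session_id, role, content),
        )
        self._conn.commit()

    def load(self, session_id):
        rows = self._conn.execute(
            "SELECT role, content FROM messages WHERE session_id = ?",
            (session_id,),
        )
        return list(rows)
```

A PostgreSQL backend would implement the same interface, so the agent only ever talks to ChatHistoryStore and the backend is chosen by configuration.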
2. Optional Caching Layer
Introduce a lightweight caching mechanism:
- Exact-match caching (Redis or in-memory cache)
- Future extension: semantic caching using embeddings
Caching should remain configurable and optional to avoid introducing mandatory infrastructure dependencies.
Scope of Work
- Abstract chat history into a storage interface
- Implement optional database-backed storage
- Add optional caching layer
- Add configuration flags (e.g., ENABLE_PERSISTENCE, ENABLE_CACHE)
- Update documentation accordingly
Expected Outcome
- Improved scalability
- Reduced redundant LLM calls
- Better support for stateless deployments
- Optional and backward-compatible implementation
I am happy to adjust the scope based on maintainer feedback before starting implementation.