Quick Start Β· What is this? Β· Full Setup Β· How to Use Β· Architecture Β· Research
Author: Trilochan Sharma β Independent Researcher Β· @parnish007 Paper:
research/contextforge_v2_final.tex(v2.3) Β· Zenodo: doi.org/10.5281/zenodo.19784778 Β· Benchmark: 990 tests Β· Ξ¦ = 79.7%
Every AI coding session starts completely blank.
Decisions made last week, architectural tradeoffs, why that library was chosen β all gone. You paste CLAUDE.md summaries, hit token limits, and watch the same mistakes repeat across sessions.
ContextForge solves this with a persistent, queryable knowledge graph. Your IDE's AI calls load_context and gets exactly the decisions relevant to the current task β nothing more, nothing less.
| π§ Persistent Memory Decisions survive session restarts |
β‘ 93% Token Savings 1,050 tokens vs 14,000 for 200 decisions |
π‘οΈ Adversarial Guard 3-pass entropy gate blocks prompt injection |
π Zero Cloud Cost Runs fully offline with Ollama |
3 steps. Under 2 minutes. No API keys required.
# 1. Clone and install
git clone https://github.com/parnish007/contextforge.git
cd contextforge
pip install -r requirements.txt
# 2. Configure (works fully offline out of the box)
cp .env.example .env
# 3. Start the MCP server
python mcp/server.py --stdio # Stdio β Claude Desktop / Cursor / VS Code
# python mcp/server.py --sse --host 0.0.0.0 --port 8765 # SSE β remote / multi-clientAdd to your IDE's MCP config:
{
"mcpServers": {
"contextforge": {
"command": "python",
"args": ["mcp/server.py", "--stdio"],
"cwd": "/absolute/path/to/contextforge",
"env": { "DB_PATH": "data/contextforge.db" }
}
}
}IDE-specific configs for Claude Desktop, Cursor, VS Code, Windsurf β
docs/SETUP.md
Want 100% local, zero internet?
# Install Ollama, pull a model, then:
FALLBACK_CHAIN=ollama python mcp/server.py --stdioThe MCP server stores and retrieves decisions with no LLM. API keys only unlock the optional 8-agent
python main.pyloop β not core functionality.
| Feature | What it means | |
|---|---|---|
| π§ | Persistent decisions | capture_decision stores why a choice was made. load_context surfaces it next session. |
| π | Semantic search | search_context and get_knowledge_node find decisions by meaning, not just keywords. |
| βͺ | Time-travel rollback | rollback undoes any write. snapshot / replay_sync restore full state from encrypted .forge files. |
| π‘οΈ | Adversarial write guard | 3-pass entropy gate + ReviewerGuard blocks prompt-injection before it corrupts memory. |
| β‘ | 93% token savings | 200 decisions: CLAUDE.md paste = 14,000 tokens. load_context = 1,050. Budget is configurable. |
22 MCP tools total β project management, decision graph, tasks, ledger, and sync.
Full reference β docs/HOW_TO_USE.md
xychart-beta
title "Memory Integrity Score β Suite 15 v2 (higher = better)"
x-axis ["StatelessRAG", "LangGraph", "MemGPT", "ClaudeMem", "HardenedRAG", "ContextForge v3"]
y-axis "MIS (0-1)" 0 --> 1
bar [0.417, 0.549, 0.574, 0.595, 0.753, 0.801]
| System | Recall@3 | Update Acc | Delete Acc | Poison Res | MIS |
|---|---|---|---|---|---|
| StatelessRAG | 0.000 | 0.000 | 0.667 | 1.000 | 0.417 |
| LangGraph | 0.967 | 0.229 | 1.000 | 0.000 | 0.549 |
| MemGPT | 0.867 | 0.429 | 1.000 | 0.000 | 0.574 |
| ClaudeMem | 0.867 | 0.429 | 1.000 | 0.086 | 0.595 |
| HardenedRAG | 0.983 | 0.229 | 1.000 | 0.800 | 0.753 |
| ContextForge v3 | 0.833 | 0.600 | 1.000 | 0.771 | 0.801 |
MIS = mean(Recall@3, UpdateAccuracy, DeleteAccuracy, PoisonResistance). Recency-weighted BM25 (Ξ»=0.0001 sβ»ΒΉ) raised update accuracy from 0.229 β 0.600 (+37.1 pp).
xychart-beta
title "Adversarial Block Rate (%) β Suite 14, n=300 samples"
x-axis ["Stateless RAG", "MemGPT", "LangGraph", "HardenedRAG", "CF v3 (deployed)", "CF Paper Mode"]
y-axis "ABR (%)" 0 --> 100
bar [0, 0, 0, 71, 55, 90]
| System | ABR β | FPR β | Precision |
|---|---|---|---|
| Stateless RAG / MemGPT / LangGraph | 0% | 0% | β |
| HardenedRAG | 71% | 5% | 58.6% |
| ContextForge v3 (deployed) | 55% | 1% | 98.2% |
| ContextForge (paper mode, research) | 90% | 25% | β |
quadrantChart
title FPR vs Adversarial Block Rate Trade-off
x-axis Low FPR --> High FPR
y-axis Low protection --> High protection
quadrant-1 Research only
quadrant-2 Ideal zone
quadrant-3 No protection
quadrant-4 Noisy ineffective
ContextForge v3: [0.04, 0.55]
HardenedRAG: [0.19, 0.71]
CF Paper Mode: [0.62, 0.90]
StatelessRAG: [0.02, 0.02]
| Dimension | Stateless RAG | ContextForge | Delta |
|---|---|---|---|
| Adversarial block rate (paper mode) | 0.0% | 90.0% | +90.0 pp |
| Mean failover latency | 480.0 ms | 149.5 ms | β68.9% |
| Token noise reduction | 0% | 87.4% | +87.4 pp |
| TNR (true negative rate) | 0.0% | 70.2% | +70.2 pp |
| OMEGA-75 benchmark pass rate | 68.3% | 100.0% | +31.7 pp |
| Composite Safety Index Ξ¦ | β | 79.7% | β |
| Decisions stored | CLAUDE.md paste | ContextForge load_context |
Savings |
|---|---|---|---|
| 20 | 3,000 tokens | 700 tokens | 77% |
| 100 | 8,000 tokens | 1,050 tokens | 87% |
| 200 | 14,000 tokens | 1,050 tokens | 93% |
Token budget is configurable β
CONTEXT_BUDGET_MODE:fixed/adaptive/model_awareFull comparison βdocs/WHAT_IS_THIS.md
flowchart LR
IDE["IDE / AI Client\nClaude Β· Cursor Β· VS Code Β· Windsurf"]
subgraph MCP ["MCP Server β mcp/server.py"]
T["22 tools\nStdio + SSE/HTTP"]
end
subgraph Nexus ["Nexus Architecture β src/"]
M["Memory Ledger\nAppend-only Β· ReviewerGuard Β· Rollback"]
RET["DCI Retrieval\nRecency-weighted BM25 Β· Token budget"]
R["Router\nGroq β Gemini β Ollama\nCircuit breaker"]
S["Sync\nAES-256-GCM .forge snapshots"]
end
DB[("SQLite\ndata/contextforge.db")]
LLM["LLM Providers\nGroq Β· Gemini Β· Ollama"]
IDE -->|"JSON-RPC"| T
T --> M & RET & R & S
M & RET & S --> DB
R -->|"HTTP"| LLM
| Pillar | Module | Role |
|---|---|---|
| Transport | src/transport/server.py |
Dual-mode MCP: Stdio + SSE/HTTP |
| Router | src/router/nexus_router.py |
Tri-core LLM failover + circuit breaker |
| Memory | src/memory/ledger.py |
Append-only event ledger + ReviewerGuard + rollback |
| Retrieval | src/retrieval/jit_librarian.py |
Recency-weighted DCI RAG, zero cloud tokens |
| Sync | src/sync/fluid_sync.py |
AES-256-GCM snapshots + 15-min idle checkpoint |
Deep-dive β docs/ARCHITECTURE.md
Every write to the knowledge graph passes three independent checks:
flowchart LR
INPUT["Incoming write\n(decision / event)"]
P0["Pass 0\nEntropy gate\nH <= H* + LZ density"]
P1["Pass 1\nDestructive verb\nregex (22 patterns)"]
P2["Pass 2\nCharter keyword\noverlap score"]
OK["Stored"]
BLOCK["BLOCKED"]
INPUT --> P0
P0 -->|"fails"| BLOCK
P0 -->|"passes"| P1
P1 -->|"fails"| BLOCK
P1 -->|"passes"| P2
P2 -->|"fails"| BLOCK
P2 -->|"passes"| OK
style BLOCK fill:#C0392B,color:#fff
style OK fill:#1B7837,color:#fff
Two operating modes β switch via CF_MODE in .env:
| Mode | Use case | ABR | FPR |
|---|---|---|---|
experiment |
Production β low false alarms | 55% | 1% |
paper |
Research / air-gap / max security | 90% | 25% |
Engineering details and formal math β docs/ENGINEERING_REFERENCE.md
import asyncio
from src.memory.ledger import EventLedger, EventType
from src.router.nexus_router import get_router
from src.retrieval.jit_librarian import JITLibrarian
from src.sync.fluid_sync import FluidSync
# Append-only memory ledger β ReviewerGuard + entropy gate active
ledger = EventLedger(db_path="data/contextforge.db")
event_id = ledger.append(EventType.AGENT_THOUGHT, {"text": "Use JWT rotation for auth"})
ledger.rollback(event_id) # microsecond-precision time-travel undo
# Tri-core LLM router with circuit breaker
router = get_router()
response = asyncio.run(router.complete(
messages=[{"role": "user", "content": "Summarise the auth module"}],
temperature=0.3,
))
# Recency-weighted DCI retrieval β local-edge, zero cloud tokens
jit = JITLibrarian(project_root=".", token_budget=1500)
context = asyncio.run(jit.get_context("JWT authentication", threshold=0.75))
# AES-256-GCM encrypted snapshot
sync = FluidSync(ledger, snapshot_dir=".forge")
sync.create_snapshot(label="before-refactor")π Project Management (6 tools)
| Tool | Purpose |
|---|---|
list_projects |
List all registered projects |
init_project |
Create or update a project |
rename_project |
Rename a project (keeps project_id slug) |
merge_projects |
Merge one project's data into another |
delete_project |
Delete a project (archives nodes first) |
project_stats |
Node/task/area summary for a project |
π§ Decision Graph (7 tools)
| Tool | Purpose |
|---|---|
capture_decision |
Store a decision with rationale + alternatives (ReviewerGuard checked) |
load_context |
L0/L1/L2 hierarchical context assembly, DCI token budget |
get_knowledge_node |
Keyword search over decisions |
list_decisions |
List decisions with area/status filters |
update_decision |
Update fields on an existing decision |
deprecate_decision |
Mark a decision as superseded |
link_decisions |
Create a typed edge between two decisions |
β Tasks (3 tools)
| Tool | Purpose |
|---|---|
list_tasks |
List tasks for a project |
create_task |
Create a new task |
update_task |
Update task status |
πΎ Ledger & Sync (6 tools)
| Tool | Purpose |
|---|---|
rollback |
Time-travel undo via append-only ledger |
snapshot |
AES-256-GCM encrypted checkpoint |
list_snapshots |
List all .forge snapshot files |
replay_sync |
Cross-device context restore from .forge |
list_events |
Inspect the append-only event ledger |
search_context |
Semantic search over local files β zero cloud tokens |
All 22 tools validated by a real-world coding agent simulator β 8 development scenarios, 150 tool calls, 0.069 ms avg latency:
python -X utf8 benchmark/mcp_agent_sim/run_simulation.py
# β 8/8 scenarios pass Β· 12/12 MCP tools exercised Β· 150 calls Β· 0.069 ms avg latency# OMEGA-75 + extended suites β 375 tests
python -X utf8 benchmark/test_v5/run_all.py
# Individual iteration suites
python -X utf8 benchmark/test_v5/iter_01_core.py # Core Network (4.7 s)
python -X utf8 benchmark/test_v5/iter_02_ledger.py # Temporal Integrity (37.2 s)
python -X utf8 benchmark/test_v5/iter_03_poison.py # Adversarial Guard (5.7 s)
python -X utf8 benchmark/test_v5/iter_04_scale.py # RAG & DCI (6.8 s)
python -X utf8 benchmark/test_v5/iter_05_chaos.py # Heat-Death Chaos (44.6 s)
# Suite 14 β Security benchmark (300 samples x 5 baselines)
python -X utf8 benchmark/suites/suite_14_fpr_fix_eval.py
# Suite 15 v2 β Memory quality (160 samples x 6 systems)
python -X utf8 benchmark/benchmark_memory/scripts/suite_15_memory_eval_v2.py
# MCP coding agent simulator (22 tools x 8 scenarios)
python -X utf8 benchmark/mcp_agent_sim/run_simulation.py
# Dual-pass scientific benchmark β 100 probes x 2 modes
python -X utf8 benchmark/engine.py
# Regenerate all publication figures (300 DPI PNG)
python research/figures/gen_all.py
python research/figures/gen_fpr_fix_figures.py
python benchmark/benchmark_memory/figures/gen_memory_figures_v2.py
python research/figures/gen_security_tradeoff_fig19.py| Feature | Detail |
|---|---|
| β‘ Recency-Weighted BM25 | score = BM25 Γ exp(βλ·age) with Ξ»=0.0001 sβ»ΒΉ. Raises Suite 15 update accuracy from 0.229 β 0.600 (+37.1 pp). Toggle: RECENCY_WEIGHTING_ENABLED |
| π‘οΈ OR-Gate ReviewerGuard | Experiment mode uses Path A (char-level Hβ₯4.8) OR Path B (intent_scoreβ₯0.70). FPR drops from 25% β 1% at 55% ABR. Toggle: CF_MODE=experiment |
| π₯ Suite 15 v2 β #1 Memory Quality | MIS=0.801, first of 6 systems. Previous: MIS=0.742 (before recency fix) |
| π Figure 19 β Security Trade-off Scatter | FPR vs ABR Pareto frontier across all operating points |
| π€ MCP Coding Agent Simulator | benchmark/mcp_agent_sim/ β 8 real-world scenarios, 150 tool calls, full ReviewerGuard adversarial resistance testing |
| π§ͺ 990 total benchmark tests | Up from 530. OMEGA-75Γ5 (375) + Suite 14 (300) + Suite 15 (160) + core (155) |
| Document | Start here if⦠|
|---|---|
docs/WHAT_IS_THIS.md |
You want to understand what this is before installing (36-question FAQ) |
docs/SETUP.md |
You're ready to install β IDE configs, API keys, Ollama, troubleshooting |
docs/HOW_TO_USE.md |
You have it running and want to use it effectively (all 22 tools with workflows) |
docs/ARCHITECTURE.md |
You want to understand or extend the internals |
docs/ENGINEERING_REFERENCE.md |
You want the math β entropy gate derivation, DCI formulas, Ξ¦ definition |
docs/RESEARCH.md |
You're replicating the benchmark methodology |
docs/BENCHMARK_RESULTS.md |
You want per-suite pass/fail tables and novelty claims |
docs/EVOLUTION_LOG.md |
You want to trace the v1βv3 tuning history |
research/RESEARCH.md |
Full research assets index β paper, figures, benchmark archives |
If you use ContextForge in your research, please cite:
@software{sharma_2025_contextforge,
author = {Sharma, Trilochan},
title = {ContextForge: Agentic Memory for AI-Assisted Development},
year = {2026},
publisher = {Zenodo},
doi = {10.5281/zenodo.19784778},
url = {https://doi.org/10.5281/zenodo.19784778}
}| Asset | Description |
|---|---|
research/contextforge_v2_final.tex |
v2.3 paper β honest v3 numbers, Suite 15 v2, Β§5.7 Recency-Weighted Retrieval, Fig 19 |
research/contextforge_v2.tex |
v2.1 paper β extended architecture, Suite 14 FPR-fix section |
research/refs.bib |
Extended bibliography (23 citations) |
research/figures/output/ |
19 data-driven figures (300 DPI PNG) |
results/comparison_table_v3.json |
5-system v3 comparison (Suite 14, 300 samples) |
results/v3_security_summary.json |
v3 OR-gate security metrics (ABR=55%, FPR=1%, F1=0.639) |
benchmark/benchmark_memory/results/suite_15_final_report_v2.json |
Suite 15 v2 full results (MIS=0.801) |
data/academic_metrics.md |
Full ΞS / ΞL / ΞDCI mathematical synthesis |
Contributions, issues, and feature requests are welcome!
- Fork the repo
- Create your branch:
git checkout -b feature/amazing-feature - Commit your changes:
git commit -m 'Add amazing feature' - Push to the branch:
git push origin feature/amazing-feature - Open a Pull Request
MIT License β see LICENSE for details.
ContextForge Nexus Architecture β reproducible, information-theoretically grounded agentic memory.
Built by Trilochan Sharma (parnish007)