A multi-repo temporal knowledge graph that harmonizes structured integrated circuit (IC) hardware design (RTL/Verilog), temporal version history (Git), and unstructured technical specifications (GraphRAG) into a single, queryable ArangoDB graph. The current implementation covers four open-source OpenRISC/RISC-V processors — OR1200, IBEX, MOR1KX, and Marocchino — with cross-repo similarity detection, design epoch analysis, and semantic bridging across all repositories.
[Schema diagram — generated by running the ETL pipeline and opening the graph in ArangoDB Visualizer]
This project is a modern implementation of the principles established in the Design Knowledge Management System (DKMS) research program co-authored for the Air Force Materiel Command (1989-1992). It realizes the vision of a "Semantic Bridge" between design intent and implementation that was pioneered in these foundational reports.
For details on the theoretical foundations, see docs/research/DKMS_Foundations.md.
The knowledge graph harmonizes three disparate data silos: RTL code structure, Git history, and technical specifications.
[Schema diagram — generated by running docs/project/SCHEMA.md Mermaid diagram or viewing in ArangoDB Visualizer]
The core value of this project is the Semantic Bridge, which connects unstructured documentation (GraphRAG) to structured hardware implementation (RTL). Below is a visualization from the ArangoDB Graph Visualizer showing a Documentation Entity (center) resolved to multiple RTL Modules (the "Flip-Flop" logic block hierarchy).
[Semantic Bridge visualization — generated by opening IC_Temporal_Knowledge_Graph in ArangoDB Visualizer and running the "Show Entity Resolutions" canvas action]
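As a rough illustration of how a documentation entity might be matched lexically to RTL module names, here is a minimal sketch. It assumes simple token splitting and Jaccard overlap; the function names and the 0.5 threshold are hypothetical, and the project's actual matching logic in src/rtl_semantic_bridge.py may differ.

```python
import re


def tokens(name: str) -> set[str]:
    """Split an identifier or phrase into lowercase tokens.

    Handles snake_case, CamelCase, and space-separated names.
    """
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name)
    return {t for t in re.split(r"[^A-Za-z0-9]+", spaced.lower()) if t}


def jaccard(a: set[str], b: set[str]) -> float:
    """Token overlap in [0, 1]; 0.0 when either side is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def resolve(entity_name: str, module_names: list[str], threshold: float = 0.5):
    """Return (module, score) pairs at or above the threshold, best first.

    The threshold is an illustrative assumption, not a project setting.
    """
    entity_tokens = tokens(entity_name)
    scored = [(m, jaccard(entity_tokens, tokens(m))) for m in module_names]
    return sorted((p for p in scored if p[1] >= threshold), key=lambda p: -p[1])


matches = resolve("OR1200 ALU", ["or1200_alu", "or1200_ctrl", "ibex_alu"])
```

In this sketch each surviving pair would correspond to one candidate RESOLVED_TO edge; a real pipeline would likely also weigh descriptions and module hierarchy.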
- Multi-Repo Temporal Analysis: Ingests four processor repositories, tracking ~6,400 modules, ~3,800 commits, and 381 design epochs across their full Git histories.
- Semantic Bridge: Automatically links Verilog modules, ports, and signals to entities referenced in corresponding documentation sections using lexical analysis. 193 RESOLVED_TO edges span all repositories.
- Cross-Repo Similarity & Evolution: Detects structurally similar modules across repositories (61 CROSS_REPO_SIMILAR_TO edges) and tracks how designs co-evolve.
- Design Epoch & Situation Detection: Groups commits into temporal epochs (DesignEpoch) and identifies 721 design situations (DesignSituation) — refactors, interface changes, complexity shifts — across all repos.
- High-Performance Consolidation: Uses set-based AQL operations for near-instant (sub-second) entity resolution across thousands of documentation nodes.
- Author Expertise Mapping: First-class contributor vertices enable knowledge transfer, collaboration analysis, and bus factor assessment across all ingested repositories.
- Granular RTL Graph: Decomposes monolithic Verilog files into a rich graph of Module, Port, Signal, and LogicChunk nodes.
- GraphRAG Augmented: Integrated with entity and community extraction via a local GraphRAG pipeline (src/local_graphrag/) or the Arango AI team's AMP-hosted pipeline.
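The epoch detection mentioned in the feature list can be pictured as gap-based grouping of commit timestamps. The sketch below assumes that a pause longer than 30 days starts a new epoch; both the rule and the function name are illustrative assumptions, since the README does not describe how DesignEpoch boundaries are actually chosen.

```python
from datetime import datetime, timedelta


def group_into_epochs(commit_times, max_gap=timedelta(days=30)):
    """Group commit timestamps into epochs separated by long pauses.

    A new epoch starts whenever the gap to the previous commit exceeds
    max_gap (an assumed, illustrative threshold).
    """
    epochs = []
    for ts in sorted(commit_times):
        if epochs and ts - epochs[-1][-1] <= max_gap:
            epochs[-1].append(ts)  # close enough: same epoch
        else:
            epochs.append([ts])  # first commit or long pause: new epoch
    return epochs


commits = [
    datetime(2020, 1, 1),
    datetime(2020, 1, 10),
    datetime(2020, 3, 15),
    datetime(2020, 3, 20),
]
epochs = group_into_epochs(commits)
```

Each resulting group would map to one epoch vertex, with its first and last timestamps as the epoch's time range.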
GraphRAG entity and community extraction is available through two paths:
- Local GraphRAG pipeline (src/local_graphrag/): runs locally without cloud dependencies, suitable for development and demos.
- ArangoDB AMP (cloud): requires the GenAI services feature; used for large-scale or production imports via src/etl_graphrag.py.
GraphRAG collections use per-repo prefixes: OR1200_Entities, IBEX_Entities, MOR1KX_Entities, MAROCCHINO_Entities (and corresponding *_Golden_Entities, *_Relations, etc.). These are present in the demo database for all four processors.
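The naming convention can be captured in a small helper. This is a sketch only: the helper name is hypothetical, and the suffix list covers just the three suffixes named above (the README's "etc." implies there are others).

```python
# Per-repo prefixes listed in this README.
REPO_PREFIXES = ("OR1200_", "IBEX_", "MOR1KX_", "MAROCCHINO_")

# Only the suffixes named in the README; the real set may be larger.
KNOWN_SUFFIXES = ("Entities", "Golden_Entities", "Relations")


def graphrag_collections(prefix: str, suffixes=KNOWN_SUFFIXES) -> list[str]:
    """Build the GraphRAG collection names for one repository prefix."""
    if prefix not in REPO_PREFIXES:
        raise ValueError(f"unknown GraphRAG prefix: {prefix!r}")
    return [f"{prefix}{suffix}" for suffix in suffixes]


ibex_collections = graphrag_collections("IBEX_")
```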
What works without GraphRAG:
- Full RTL parsing and graph construction across all repos
- Git history ingestion and author expertise mapping
- Temporal epoch and situation detection
- Semantic bridging between RTL elements and documentation entities (reads existing *_Golden_Entities)
- Cross-repo similarity detection
- All AQL queries and visualizations in the demo
What requires ArangoDB AMP + GraphRAG:
- Re-importing or refreshing document entities from PDFs (src/etl_graphrag.py)
- Running the Importer/Retriever services via the GenAI API
See GRAPHRAG_STATUS.md for a detailed description of the integration, known issues, and instructions for attempting a fresh import.
- src/: Core ETL, bridging, and analysis scripts.
  - local_graphrag/: Local GraphRAG entity/community extraction pipeline.
- scripts/multi_repo/: Multi-repo ingestion (ingest_repo.py) and registry (repo_registry.yaml).
- scripts/temporal/: Temporal ETL pipeline (epochs, situations, evolution edges).
- scripts/setup/: Database creation, visualizer theme, and demo query installation.
- data/temporal/: Temporal data artifacts.
- docs/: Comprehensive documentation (see docs/README.md).
  - project/: Core project docs (Walkthrough, Schema, TEMPORAL_IMPLEMENTATION.md).
  - reference/: Technical references.
- tests/: 213 unit tests for parsing, normalization, and pipeline logic.
- validation/: Ground truth datasets and validation scripts.
- Python 3.10+
- ArangoDB instance (local Docker or remote)
- Cluster users: if you see collection shards spread across many DB-Servers (one shard per collection, different leaders), graph-heavy queries pay extra network cost. See docs/arangodb-cluster-sharding.md for OneShard vs SmartGraph, scripts/setup/create_oneshard_database.py (new DB), and scripts/setup/migrate_to_oneshard.sh (dump → drop → OneShard → restore).
Copy env.template to .env in the root directory and configure your settings:
cp env.template .env

Then edit .env with your specific values:
# Choose LOCAL or REMOTE mode
ARANGO_MODE=LOCAL
# For REMOTE mode, configure these:
ARANGO_ENDPOINT=https://your-instance.arango.ai
ARANGO_USERNAME=root
ARANGO_PASSWORD=your_password
ARANGO_DATABASE=ic-knowledge-graph-temporal
# For LOCAL mode (Docker), configure these:
LOCAL_ARANGO_ENDPOINT=http://localhost:8530
LOCAL_ARANGO_USERNAME=root
LOCAL_ARANGO_PASSWORD=
LOCAL_ARANGO_DATABASE=ic-knowledge-graph-temporal
# GraphRAG prefix for collection names (per-repo)
# OR1200_, IBEX_, MOR1KX_, MAROCCHINO_ — set to match the target repo
GRAPHRAG_PREFIX=OR1200_

Install the core dependencies:

pip install -r requirements-core.txt

Key Dependencies:
- arango-entity-resolution==3.1.0: Official PyPI package for entity resolution
  - Provides WeightedFieldSimilarity for multi-field scoring (name + description)
  - Lazy loading ensures fast startup times
  - No manual configuration required
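To make the idea of multi-field scoring concrete, the sketch below combines per-field string similarity using weights. It is a standard-library stand-in built on `difflib`, not the arango-entity-resolution API; the function names and the 0.7/0.3 weights are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def field_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1]; 0.0 if either value is empty."""
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def weighted_similarity(left: dict, right: dict, weights: dict) -> float:
    """Weighted average of per-field similarities (illustrative only)."""
    total = sum(weights.values())
    if total == 0:
        return 0.0
    score = sum(
        weight * field_similarity(left.get(field, ""), right.get(field, ""))
        for field, weight in weights.items()
    )
    return score / total


# Assumed weights: the name counts for more than the description.
WEIGHTS = {"name": 0.7, "description": 0.3}

module = {"name": "or1200_alu", "description": "arithmetic logic unit"}
entity = {"name": "OR1200 ALU", "description": "Arithmetic Logic Unit of the CPU"}
score = weighted_similarity(module, entity, WEIGHTS)
```

Weighting the name above the description reflects the intuition that identifiers are short and distinctive while descriptions are noisy; the right balance would need to be tuned against the validation datasets.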
Optional (GraphRAG/document processing):
pip install -r requirements.txt

This repo runs analytics via the agentic-graph-analytics project. Install from source (editable):
cd ~/code/agentic-graph-analytics
git pull origin main
pip install -e .

Ensure .env has valid ArangoDB credentials. The workflow uses JWT for GRAL; tokens expire during long runs and are auto-refreshed using ARANGO_ENDPOINT, ARANGO_USER (or ARANGO_USERNAME), and ARANGO_PASSWORD.
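The variables described above can be resolved in one place. The following sketch assumes the values are already present in the process environment (for example, exported by the shell or loaded from .env by whatever loader the project uses). The function name, the LOCAL default, and the precedence of ARANGO_USER over ARANGO_USERNAME are assumptions, not confirmed project behavior.

```python
import os


def arango_settings(env=None) -> dict:
    """Return endpoint, username, password, and database for the active mode."""
    env = os.environ if env is None else env
    mode = env.get("ARANGO_MODE", "LOCAL").upper()  # default is an assumption

    if mode == "LOCAL":
        prefix = "LOCAL_ARANGO_"
        username = env.get("LOCAL_ARANGO_USERNAME")
    elif mode == "REMOTE":
        prefix = "ARANGO_"
        # Assumed precedence: ARANGO_USER wins when both names are set.
        username = env.get("ARANGO_USER") or env.get("ARANGO_USERNAME")
    else:
        raise ValueError(f"ARANGO_MODE must be LOCAL or REMOTE, got {mode!r}")

    settings = {
        "endpoint": env.get(prefix + "ENDPOINT"),
        "username": username,
        "password": env.get(prefix + "PASSWORD", ""),  # empty is valid locally
        "database": env.get(prefix + "DATABASE"),
    }
    missing = [key for key in ("endpoint", "username", "database") if not settings[key]]
    if missing:
        raise KeyError(f"missing settings for {mode} mode: {', '.join(missing)}")
    return settings


local = arango_settings({
    "ARANGO_MODE": "LOCAL",
    "LOCAL_ARANGO_ENDPOINT": "http://localhost:8530",
    "LOCAL_ARANGO_USERNAME": "root",
    "LOCAL_ARANGO_PASSWORD": "",
    "LOCAL_ARANGO_DATABASE": "ic-knowledge-graph-temporal",
})
```

Failing fast on a missing endpoint or database makes a misconfigured .env visible before a long pipeline run starts.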
Full rebuild (recommended):
./scripts/rebuild_database.sh

Or step-by-step:
python scripts/multi_repo/ingest_repo.py # Ingest all four repos (default)
python scripts/temporal/create_temporal_graph.py # Create named graph (28 edge definitions)
python src/situation_detector.py --all # Detect design situations
python src/rtl_semantic_bridge.py --all # Build RESOLVED_TO edges
python src/cross_repo_bridge.py --all             # Build cross-repo similarity edges

Author Expertise Mapping (included in rebuild):
- Extracts contributor expertise from Git history across all ingested repositories
- Creates AUTHORED edges (author -> commit)
- Creates MAINTAINS edges (author -> module) based on commit frequency
- Enables expertise queries, bus factor analysis, and collaboration networks
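The frequency-based MAINTAINS edges and the bus factor analysis listed above can be sketched as follows. The input shape (one `(author, module)` pair per commit that touches a module), the `min_commits` threshold, and the 50% coverage definition of bus factor are all illustrative assumptions rather than the project's actual rules.

```python
from collections import Counter


def maintains_edges(commits, min_commits=2):
    """Return (author, module, commit_count) for authors above the threshold.

    commits: list of (author, module) pairs, one per commit touching the module.
    """
    counts = Counter(commits)
    return sorted(
        (author, module, n)
        for (author, module), n in counts.items()
        if n >= min_commits
    )


def bus_factor(commits, module, coverage=0.5):
    """Smallest number of authors covering at least `coverage` of a module's commits."""
    per_author = Counter(author for author, mod in commits if mod == module)
    total = sum(per_author.values())
    if total == 0:
        return 0
    covered = 0
    for rank, (_, n) in enumerate(per_author.most_common(), start=1):
        covered += n
        if covered / total >= coverage:
            return rank
    return len(per_author)


history = [("alice", "alu")] * 3 + [("bob", "alu")] + [("bob", "ctrl")] * 2
edges = maintains_edges(history)
alu_bus_factor = bus_factor(history, "alu")
```

A bus factor of 1 flags a module whose history is dominated by a single contributor, which is the situation the knowledge-transfer queries are meant to surface.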
Run the test suite to ensure the environment is correctly configured:
pytest tests/

Customers can explore the preloaded demo database ic-knowledge-graph-temporal in read-only mode, then create their own numbered sandbox databases (ic-knowledge-graph-1, ic-knowledge-graph-2, …) for hands-on exercises.
See docs/CUSTOMER_EXERCISE_WORKFLOW.md for the step-by-step process (UI-primary DB creation, GraphRAG UI import, and one-command setup).
Once your ArangoDB database is populated (pipeline above), run:
python run_ic_analysis.py

Reports are written to ic_analysis_output/ as both Markdown and interactive HTML.
The "Semantic Bridge" can be explored visually via the ArangoDB Dashboard:
- Go to Graphs -> IC_Temporal_Knowledge_Graph.
- Identify cross-model links: (RTL_Module) -[RESOLVED_TO]-> (*_Golden_Entities) and (RTL_Module) -[CROSS_REPO_SIMILAR_TO]-> (RTL_Module).
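The same cross-model links can also be listed programmatically. The sketch below builds an AQL traversal over the RTL_Module collection and RESOLVED_TO edges named above, and runs it through the python-arango driver. Whether the project itself uses python-arango is an assumption, as are the function names and the returned fields.

```python
def resolved_to_query(limit: int = 25) -> tuple[str, dict]:
    """Build an AQL query that lists module -> documentation entity links."""
    query = """
    FOR m IN RTL_Module
      FOR e IN 1..1 OUTBOUND m RESOLVED_TO
        LIMIT @limit
        RETURN { module: m._key, entity: e._id }
    """
    return query, {"limit": limit}


def fetch_links(endpoint, database, username, password, limit=25):
    """Run the query against ArangoDB (requires the python-arango package)."""
    # Imported lazily so the query builder itself has no third-party dependency.
    from arango import ArangoClient

    db = ArangoClient(hosts=endpoint).db(
        database, username=username, password=password
    )
    query, bind_vars = resolved_to_query(limit)
    return list(db.aql.execute(query, bind_vars=bind_vars))


query, bind_vars = resolved_to_query(5)
```

Swapping RESOLVED_TO for CROSS_REPO_SIMILAR_TO in the traversal would list the module-to-module similarity links instead.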
Complete demonstration materials are available:
- Full Setup: Run ./scripts/rebuild_database.sh to create the database and ingest all repos
- Quick Start: Read docs/DEMO_EXECUTIVE_SUMMARY.md (5-minute overview)
- Setup Theme: Run python scripts/setup/install_ic_theme.py to install the 'hardware-design' visualization theme
- Setup Queries: Run python scripts/setup/install_demo_setup.py to install 24 saved queries and canvas actions
- Demo Guide: Follow docs/TEMPORAL_DEMO_SCRIPT.md for a comprehensive demonstration
- Preparation: Use docs/DEMO_README.md for setup checklist and troubleshooting
The demo showcases:
- Multi-repo semantic bridges (spec -> code across four processors)
- Temporal design audit (epoch-based time-travel queries)
- Cross-repo similarity and evolution detection
- Design situation analysis (refactors, interface changes, complexity shifts)
- Type-safe entity resolution via arango-entity-resolution
- Sub-200ms graph traversals
- Agent integration for 10x token savings
For technical details, see the Project Walkthrough.