Integrated Circuit Design Knowledge Graph

A multi-repo temporal knowledge graph that harmonizes structured integrated circuit (IC) hardware design (RTL/Verilog), temporal version history (Git), and unstructured technical specifications (GraphRAG) into a single, queryable ArangoDB graph. The current implementation covers four open-source OpenRISC/RISC-V processors — OR1200, IBEX, MOR1KX, and Marocchino — with cross-repo similarity detection, design epoch analysis, and semantic bridging across all repositories.

[Schema diagram — generated by running the ETL pipeline and opening the graph in ArangoDB Visualizer]

Research Foundations

This project is a modern implementation of the principles established in the Design Knowledge Management System (DKMS) research program co-authored for the Air Force Materiel Command (1989-1992). It realizes the vision of a "Semantic Bridge" between design intent and implementation that was pioneered in these foundational reports.

For details on the theoretical foundations, see docs/research/DKMS_Foundations.md.

Visualizing the Knowledge Graph

Global Schema

The knowledge graph harmonizes three disparate data silos: RTL code structure, Git history, and technical specifications.

[Schema diagram: render the Mermaid diagram in docs/project/SCHEMA.md, or view the graph in ArangoDB Visualizer]

The Semantic Bridge

The core value of this project is the Semantic Bridge, which connects unstructured documentation (GraphRAG) to structured hardware implementation (RTL). Below is a visualization from the ArangoDB Graph Visualizer showing a Documentation Entity (center) resolved to multiple RTL Modules (the "Flip-Flop" logic block hierarchy).

[Semantic Bridge visualization — generated by opening IC_Temporal_Knowledge_Graph in ArangoDB Visualizer and running the "Show Entity Resolutions" canvas action]
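The lexical linking described above can be illustrated with a small sketch. This is not the project's actual resolver (the README names arango-entity-resolution and set-based AQL for that); it only shows the general idea of matching a documentation entity name to RTL module names by token overlap. The function names, the 0.5 threshold, and the example module names are all hypothetical.

```python
import re


def tokens(name: str) -> set[str]:
    """Split an identifier or phrase into lowercase tokens.

    Handles snake_case, CamelCase, hyphens, and spaces.
    """
    # Insert a space at lower-to-upper boundaries, then split on non-alphanumerics.
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", name)
    return {t for t in re.split(r"[^A-Za-z0-9]+", spaced.lower()) if t}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two token sets (0.0 if either is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def resolve(entity_name: str, module_names: list[str], threshold: float = 0.5):
    """Return (module, score) pairs at or above the threshold, best first."""
    entity_tokens = tokens(entity_name)
    scored = [(m, jaccard(entity_tokens, tokens(m))) for m in module_names]
    matches = [(m, s) for m, s in scored if s >= threshold]
    return sorted(matches, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical module names, for illustration only.
candidates = resolve("Flip-Flop", ["demo_flip_flop", "demo_alu"])
```

Each returned pair would correspond to one candidate RESOLVED_TO edge from the module to the documentation entity.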

Key Features

Multi-Repo Temporal Analysis: Ingests four processor repositories, tracking ~6,400 modules, ~3,800 commits, and 381 design epochs across their full Git histories.
Semantic Bridge: Automatically links Verilog modules, ports, and signals to entities referenced in corresponding documentation sections using lexical analysis. 193 RESOLVED_TO edges span all repositories.
Cross-Repo Similarity & Evolution: Detects structurally similar modules across repositories (61 CROSS_REPO_SIMILAR_TO edges) and tracks how designs co-evolve.
Design Epoch & Situation Detection: Groups commits into temporal epochs (DesignEpoch) and identifies 721 design situations (DesignSituation) — refactors, interface changes, complexity shifts — across all repos.
High-Performance Consolidation: Uses set-based AQL operations for near-instant (sub-second) entity resolution across thousands of documentation nodes.
Author Expertise Mapping: First-class contributor vertices enable knowledge transfer, collaboration analysis, and bus factor assessment across all ingested repositories.
Granular RTL Graph: Decomposes monolithic Verilog files into a rich graph of Module, Port, Signal, and LogicChunk nodes.
GraphRAG Augmented: Integrated with entity and community extraction via a local GraphRAG pipeline (src/local_graphrag/) or the Arango AI team's AMP-hosted pipeline.
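As an illustration of the epoch idea listed above, the sketch below groups commit timestamps into epochs by splitting wherever the gap between consecutive commits exceeds a limit. The README does not say how DesignEpoch boundaries are actually computed, so the gap-based rule, the 30-day default, and the function name are assumptions.

```python
from datetime import datetime, timedelta


def group_into_epochs(commit_times, max_gap=timedelta(days=30)):
    """Group commit timestamps into epochs.

    A gap larger than max_gap between consecutive commits starts a new epoch.
    Returns a list of epochs, each a chronologically sorted list of timestamps.
    """
    epochs = []
    for ts in sorted(commit_times):
        if epochs and ts - epochs[-1][-1] <= max_gap:
            epochs[-1].append(ts)  # close enough to the previous commit
        else:
            epochs.append([ts])    # first commit, or gap too large
    return epochs


# Made-up timestamps: two bursts of activity separated by about two months.
example = group_into_epochs([
    datetime(2020, 3, 20),
    datetime(2020, 1, 1),
    datetime(2020, 1, 10),
    datetime(2020, 3, 15),
])
```

Each resulting group could then be stored as one epoch vertex, with its first and last timestamps as the epoch's time range.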

GraphRAG Status

GraphRAG entity and community extraction is available through two paths:

Local GraphRAG pipeline (src/local_graphrag/) — runs locally without cloud dependencies, suitable for development and demos.
ArangoDB AMP (cloud) — requires the GenAI services feature; used for large-scale or production imports via src/etl_graphrag.py.

GraphRAG collections use per-repo prefixes: OR1200_Entities, IBEX_Entities, MOR1KX_Entities, MAROCCHINO_Entities (and corresponding *_Golden_Entities, *_Relations, etc.). These are present in the demo database for all four processors.
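The naming convention above can be expressed as a small helper. This is a sketch: the function name is hypothetical, and only the three suffixes named in this README are listed (the "etc." above means the real set is larger).

```python
# Per-repo prefixes and collection suffixes taken from this README.
REPO_PREFIXES = ("OR1200_", "IBEX_", "MOR1KX_", "MAROCCHINO_")
SUFFIXES = ("Entities", "Golden_Entities", "Relations")


def graphrag_collections(prefix: str) -> dict[str, str]:
    """Map each known suffix to the full collection name for one repo prefix."""
    if prefix not in REPO_PREFIXES:
        raise ValueError(f"unknown GraphRAG prefix: {prefix!r}")
    return {suffix: f"{prefix}{suffix}" for suffix in SUFFIXES}


ibex = graphrag_collections("IBEX_")
```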

What works without GraphRAG:

Full RTL parsing and graph construction across all repos
Git history ingestion and author expertise mapping
Temporal epoch and situation detection
Semantic bridging between RTL elements and documentation entities (reads existing *_Golden_Entities)
Cross-repo similarity detection
All AQL queries and visualizations in the demo

What requires ArangoDB AMP + GraphRAG:

Re-importing or refreshing document entities from PDFs (src/etl_graphrag.py)
Running the Importer/Retriever services via the GenAI API

See GRAPHRAG_STATUS.md for a detailed description of the integration, known issues, and instructions for attempting a fresh import.

Project Structure

src/: Core ETL, bridging, and analysis scripts.
- local_graphrag/: Local GraphRAG entity/community extraction pipeline.
scripts/multi_repo/: Multi-repo ingestion (ingest_repo.py) and registry (repo_registry.yaml).
scripts/temporal/: Temporal ETL pipeline (epochs, situations, evolution edges).
scripts/setup/: Database creation, visualizer theme, and demo query installation.
data/temporal/: Temporal data artifacts.
docs/: Comprehensive documentation (see docs/README.md)
- project/: Core project docs (Walkthrough, Schema, TEMPORAL_IMPLEMENTATION.md)
- reference/: Technical references
tests/: 213 unit tests for parsing, normalization, and pipeline logic.
validation/: Ground truth datasets and validation scripts.

Setup & Usage

1. Prerequisites

Python 3.10+
ArangoDB instance (local Docker or remote)
Cluster users: if collection shards are spread across many DB-Servers (one shard per collection, each with a different leader), graph-heavy queries pay extra network cost. See docs/arangodb-cluster-sharding.md for a comparison of OneShard and SmartGraph. scripts/setup/create_oneshard_database.py creates a new OneShard database, and scripts/setup/migrate_to_oneshard.sh migrates an existing one (dump → drop → OneShard → restore).

2. Environment Configuration

Copy env.template to .env in the root directory and configure your settings:

cp env.template .env

Then edit .env with your specific values:

# Choose LOCAL or REMOTE mode
ARANGO_MODE=LOCAL

# For REMOTE mode, configure these:
ARANGO_ENDPOINT=https://your-instance.arango.ai
ARANGO_USERNAME=root
ARANGO_PASSWORD=your_password
ARANGO_DATABASE=ic-knowledge-graph-temporal

# For LOCAL mode (Docker), configure these:
LOCAL_ARANGO_ENDPOINT=http://localhost:8530
LOCAL_ARANGO_USERNAME=root
LOCAL_ARANGO_PASSWORD=
LOCAL_ARANGO_DATABASE=ic-knowledge-graph-temporal

# GraphRAG prefix for collection names (per-repo)
# OR1200_, IBEX_, MOR1KX_, MAROCCHINO_ — set to match the target repo
GRAPHRAG_PREFIX=OR1200_
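To make the LOCAL/REMOTE split concrete, here is a minimal sketch of how a script might choose between the two groups of variables. The project's own configuration loader may behave differently; the function name, the returned key names, and the LOCAL default are assumptions.

```python
from typing import Mapping


def resolve_connection(env: Mapping[str, str]) -> dict[str, str]:
    """Pick the LOCAL_ARANGO_* or ARANGO_* settings based on ARANGO_MODE."""
    mode = env.get("ARANGO_MODE", "LOCAL").upper()  # default is an assumption
    if mode not in ("LOCAL", "REMOTE"):
        raise ValueError(f"ARANGO_MODE must be LOCAL or REMOTE, got {mode!r}")
    prefix = "LOCAL_ARANGO_" if mode == "LOCAL" else "ARANGO_"
    return {
        "endpoint": env[f"{prefix}ENDPOINT"],
        "username": env[f"{prefix}USERNAME"],
        "password": env.get(f"{prefix}PASSWORD", ""),  # may be empty for local Docker
        "database": env[f"{prefix}DATABASE"],
    }


settings = resolve_connection({
    "ARANGO_MODE": "LOCAL",
    "LOCAL_ARANGO_ENDPOINT": "http://localhost:8530",
    "LOCAL_ARANGO_USERNAME": "root",
    "LOCAL_ARANGO_PASSWORD": "",
    "LOCAL_ARANGO_DATABASE": "ic-knowledge-graph-temporal",
})
```

In a real script the mapping would typically be os.environ after the .env file has been loaded.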

3. Install Dependencies

pip install -r requirements-core.txt

Key Dependencies:

arango-entity-resolution==3.1.0 - Official PyPI package for entity resolution
- Provides WeightedFieldSimilarity for multi-field scoring (name + description)
- Lazy loading ensures fast startup times
- No manual configuration required

Optional (GraphRAG/document processing):

pip install -r requirements.txt

3b. Install agentic graph analytics (required for analytics reports)

This repo runs analytics via the agentic-graph-analytics project. Install from source (editable):

cd ~/code/agentic-graph-analytics
git pull origin main
pip install -e .

Ensure .env has valid ArangoDB credentials. The workflow authenticates to GRAL with a JWT; tokens can expire during long runs and are refreshed automatically using ARANGO_ENDPOINT, ARANGO_USER (or ARANGO_USERNAME), and ARANGO_PASSWORD.
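The refresh-on-expiry behavior can be pictured with a generic sketch. This is not the agentic-graph-analytics implementation; the class name, the 60-second safety margin, and the injected fetch callable (which would request a new token using the credentials above) are hypothetical.

```python
import time
from typing import Callable, Optional


class TokenCache:
    """Cache a token and fetch a new one shortly before it expires."""

    def __init__(
        self,
        fetch: Callable[[], tuple[str, float]],  # returns (token, expires_at_epoch_seconds)
        margin: float = 60.0,                    # refresh this many seconds early
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch = fetch
        self._margin = margin
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        """Return a valid token, refreshing it if missing or close to expiry."""
        if self._token is None or self._clock() >= self._expires_at - self._margin:
            self._token, self._expires_at = self._fetch()
        return self._token
```

A long-running job would call get() before each request instead of holding on to a single token for the whole run.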

4. Running the Pipeline

Full rebuild (recommended):

./scripts/rebuild_database.sh

Or step-by-step:

python scripts/multi_repo/ingest_repo.py             # Ingest all four repos (default)
python scripts/temporal/create_temporal_graph.py     # Create named graph (28 edge definitions)
python src/situation_detector.py --all               # Detect design situations
python src/rtl_semantic_bridge.py --all              # Build RESOLVED_TO edges
python src/cross_repo_bridge.py --all                # Build cross-repo similarity edges
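The step-by-step sequence above could also be driven from Python, stopping at the first failing step. This sketch is not what scripts/rebuild_database.sh does internally (its contents are not shown here); run_pipeline and its runner parameter are hypothetical names.

```python
import subprocess
import sys

# Script paths and flags copied from the step-by-step commands above.
STEPS = [
    ["scripts/multi_repo/ingest_repo.py"],
    ["scripts/temporal/create_temporal_graph.py"],
    ["src/situation_detector.py", "--all"],
    ["src/rtl_semantic_bridge.py", "--all"],
    ["src/cross_repo_bridge.py", "--all"],
]


def run_pipeline(steps=STEPS, runner=subprocess.run):
    """Run each step in order with the current interpreter; stop on first failure.

    `runner` is injectable so the sequence can be exercised without a database.
    Returns the list of script paths that completed successfully.
    """
    completed = []
    for step in steps:
        result = runner([sys.executable, *step])
        if result.returncode != 0:
            raise RuntimeError(f"pipeline step failed: {' '.join(step)}")
        completed.append(step[0])
    return completed
```

Calling run_pipeline() from the repository root would execute the five scripts in the same order as the commands above.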

Author Expertise Mapping (included in rebuild):

Extracts contributor expertise from Git history across all ingested repositories
Creates AUTHORED edges (author -> commit)
Creates MAINTAINS edges (author -> module) based on commit frequency
Enables expertise queries, bus factor analysis, and collaboration networks
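As a rough illustration of frequency-based MAINTAINS edges and bus factor, consider the sketch below. The README does not give the actual thresholds or formulas, so the 25% commit-share cutoff, the 50% coverage definition of bus factor, and both function names are assumptions, and the sample data is made up.

```python
from collections import Counter, defaultdict


def maintains_edges(commits, min_share=0.25):
    """Derive (author, module, commit_count) edges from (author, module) pairs.

    An author maintains a module if they made at least min_share of its commits.
    """
    per_module = defaultdict(Counter)
    for author, module in commits:
        per_module[module][author] += 1
    edges = []
    for module, counts in per_module.items():
        total = sum(counts.values())
        for author, n in counts.items():
            if n / total >= min_share:
                edges.append((author, module, n))
    return sorted(edges)


def bus_factor(commits, module, coverage=0.5):
    """Smallest number of top authors covering `coverage` of a module's commits."""
    counts = Counter(author for author, m in commits if m == module)
    total = sum(counts.values())
    covered = 0
    for k, (_, n) in enumerate(counts.most_common(), start=1):
        covered += n
        if covered / total >= coverage:
            return k
    return 0  # module has no commits


# Made-up history: one (author, module) pair per commit.
history = (
    [("alice", "alu")] * 6
    + [("bob", "alu")] * 3
    + [("carol", "alu")]
    + [("bob", "fpu")] * 2
)
edges = maintains_edges(history)
```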

5. Verification

Run the test suite to ensure the environment is correctly configured:

pytest tests/

Customer hands-on workflow (numbered databases)

Customers can explore the preloaded demo database ic-knowledge-graph-temporal in read-only mode, then create their own numbered sandbox databases (ic-knowledge-graph-1, ic-knowledge-graph-2, …) for hands-on exercises.

See docs/CUSTOMER_EXERCISE_WORKFLOW.md for the step-by-step process (UI-primary DB creation, GraphRAG UI import, and one-command setup).
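For scripted setups, the numbering scheme could be automated with a helper that picks the lowest unused sandbox number from a list of existing database names. This is only a sketch; the documented workflow creates databases through the UI, and next_sandbox_name is a hypothetical function.

```python
import re


def next_sandbox_name(existing, base="ic-knowledge-graph"):
    """Return the lowest-numbered '<base>-N' name not present in `existing`."""
    pattern = re.compile(rf"^{re.escape(base)}-(\d+)$")
    # Names such as 'ic-knowledge-graph-temporal' do not match and are ignored.
    used = {int(m.group(1)) for name in existing if (m := pattern.match(name))}
    n = 1
    while n in used:
        n += 1
    return f"{base}-{n}"


name = next_sandbox_name([
    "ic-knowledge-graph-temporal",
    "ic-knowledge-graph-1",
    "ic-knowledge-graph-3",
])
```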

Agentic analytics (reports)

Once your ArangoDB database is populated (pipeline above), run:

python run_ic_analysis.py

Reports are written to ic_analysis_output/ as both Markdown and interactive HTML.

Visualization

The "Semantic Bridge" can be explored visually via the ArangoDB Dashboard:

Go to Graphs -> IC_Temporal_Knowledge_Graph.
Identify cross-model links: (RTL_Module) -[RESOLVED_TO]-> (*_Golden_Entities) and (RTL_Module) -[CROSS_REPO_SIMILAR_TO]-> (RTL_Module).
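The same cross-model links can be listed outside the visualizer with an AQL query. The sketch below only builds the query text and bind variables; it assumes RESOLVED_TO is the name of the edge collection and that edges point from RTL modules to documentation entities, as the notation above suggests. The function name and the default limit are hypothetical.

```python
def resolved_to_query(entity_collection: str, limit: int = 25):
    """Build an AQL query listing RESOLVED_TO edges into one entity collection.

    Returns (query, bind_vars). With python-arango these can be passed to
    db.aql.execute(query, bind_vars=bind_vars).
    """
    query = """
    FOR edge IN RESOLVED_TO
        FILTER IS_SAME_COLLECTION(@entities, edge._to)
        LIMIT @limit
        RETURN { module: edge._from, entity: edge._to }
    """
    return query, {"entities": entity_collection, "limit": limit}


query, bind_vars = resolved_to_query("OR1200_Golden_Entities", limit=10)
```

Swapping the collection name for another per-repo *_Golden_Entities collection would list the bridges for that processor.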

Demo Materials

Complete demonstration materials are available:

Full Setup: Run ./scripts/rebuild_database.sh to create the database and ingest all repos
Quick Start: Read docs/DEMO_EXECUTIVE_SUMMARY.md (5-minute overview)
Setup Theme: Run python scripts/setup/install_ic_theme.py to install the 'hardware-design' visualization theme
Setup Queries: Run python scripts/setup/install_demo_setup.py to install 24 saved queries and canvas actions
Demo Guide: Follow docs/TEMPORAL_DEMO_SCRIPT.md for a comprehensive demonstration
Preparation: Use docs/DEMO_README.md for setup checklist and troubleshooting

The demo showcases:

Multi-repo semantic bridges (spec -> code across four processors)
Temporal design audit (epoch-based time-travel queries)
Cross-repo similarity and evolution detection
Design situation analysis (refactors, interface changes, complexity shifts)
Type-safe entity resolution via arango-entity-resolution
Sub-200ms graph traversals
Agent integration for 10x token savings

For technical details, see the Project Walkthrough.
