
Commit fc220e0

feat(neo4j): namespace all node labels (Py*) and relationship types (PY_*)

In a shared Neo4j instance, unprefixed labels and relationship types from different language analyzers collide: `MERGE (:Application {name})` and `:Symbol`/`HAS_MODULE` from a future Java/JS backend would fuse with Python's. Labels and relationship types are separate Neo4j namespaces, so both are prefixed: every node label gets `Py` (e.g. `:PyClass`, and `:PySymbol` for the shared MERGE label) and every relationship type gets `PY_` (e.g. `PY_CALLS`). Constraint and index names must also be unique across the whole database, so they get a `py_` prefix too.

- catalog.py: the source-of-truth labels, merge labels, and rel types
- schema.py: DDL label refs + constraint/index names
- project.py, cypher.py, bolt.py, rows.py: emitter + both writers
- tests, sample app, README, CHANGELOG, --app-name help, schema.neo4j.json
- neo4j-schema.drawio: new property-graph diagram; schema-uml.drawio: relayout

SCHEMA_VERSION stays 1.0.0 (the schema is new on this branch; no released consumer has seen the unprefixed 1.0.0).
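The prefixing rule above, sketched as three hypothetical helpers (`py_label`, `py_rel_type`, `py_schema_name`) plus a uniqueness-constraint builder using the `FOR … REQUIRE` constraint syntax of Neo4j 4.4 and later. The `py_<label>_<key>` constraint-name pattern is an assumption for illustration; the commit only says names get a `py_` prefix, and `catalog.py` spells the prefixed strings out literally rather than computing them.

```python
# Hypothetical helpers illustrating the namespacing rule; not the project's API.
LABEL_PREFIX = "Py"   # node labels:        Class -> PyClass
REL_PREFIX = "PY_"    # relationship types: CALLS -> PY_CALLS
NAME_PREFIX = "py_"   # constraint / index names


def py_label(label: str) -> str:
    return LABEL_PREFIX + label


def py_rel_type(rel_type: str) -> str:
    return REL_PREFIX + rel_type


def py_schema_name(name: str) -> str:
    return NAME_PREFIX + name


def unique_constraint(label: str, key: str) -> str:
    """DDL for a uniqueness constraint on a namespaced label.

    Constraint names are unique per database, so they are prefixed as well
    (the py_<label>_<key> pattern is an assumption).
    """
    name = py_schema_name(f"{label.lower()}_{key}")
    return (
        f"CREATE CONSTRAINT {name} IF NOT EXISTS "
        f"FOR (n:{py_label(label)}) REQUIRE n.{key} IS UNIQUE"
    )


print(unique_constraint("Symbol", "signature"))
# CREATE CONSTRAINT py_symbol_signature IF NOT EXISTS FOR (n:PySymbol) REQUIRE n.signature IS UNIQUE
```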
1 parent 67b3ba7 commit fc220e0

15 files changed

Lines changed: 432 additions & 273 deletions

CHANGELOG.md

Lines changed: 2 additions & 2 deletions
@@ -8,10 +8,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ## [0.2.0] - 2026-06-20

 ### Added
-- **Neo4j property-graph output** (`--emit neo4j`). The same in-memory analysis (`PyApplication`) is projected to a labeled property graph, mirroring the `codeanalyzer-typescript` backend. Two writers:
+- **Neo4j property-graph output** (`--emit neo4j`). The same in-memory analysis (`PyApplication`) is projected to a labeled property graph, mirroring the `codeanalyzer-typescript` backend. Node labels are `Py`-prefixed and relationship types are `PY_`-prefixed (e.g. `:PyClass`, `PY_CALLS`) so multiple language analyzers can coexist in one database without label or relationship-type collisions. Two writers:
 - **`graph.cypher` snapshot** (default) — a self-contained Cypher script (constraints + indexes, a scoped wipe of the project's prior subgraph, then batched `UNWIND … MERGE`). Load it with `cypher-shell < graph.cypher`. Needs no extra dependencies.
 - **Live Bolt push** (`--neo4j-uri`) — an **incremental** writer: only modules whose `content_hash` changed are rewritten, and on a full run modules whose source file vanished are pruned. Requires the optional `neo4j` driver (`pip install 'codeanalyzer-python[neo4j]'`).
-- **`--emit schema`** — emit the machine-readable, version-stamped Neo4j schema contract (`schema.json`: node labels, relationships, properties, constraints, indexes). Needs no project; bundled in every release as a GitHub Release asset and checked in as `schema.neo4j.json`. A `schema_version` (`1.0.0`) is stamped onto every graph's `:Application` node.
+- **`--emit schema`** — emit the machine-readable, version-stamped Neo4j schema contract (`schema.json`: node labels, relationships, properties, constraints, indexes). Needs no project; bundled in every release as a GitHub Release asset and checked in as `schema.neo4j.json`. A `schema_version` (`1.0.0`) is stamped onto every graph's `:PyApplication` node.
 - **New CLI options** mirroring the TypeScript analyzer's entrypoints: `--emit {json,neo4j,schema}`, `--app-name`, `--neo4j-uri`, `--neo4j-user`, `--neo4j-password`, `--neo4j-database`. `-i/--input` is now optional (not required for `--emit schema`). The four Neo4j connection options also read from the standard `NEO4J_URI` / `NEO4J_USERNAME` / `NEO4J_PASSWORD` / `NEO4J_DATABASE` environment variables when the flag is omitted (an explicit flag wins), so the password need not appear in shell history or the process list.
 - **`codeanalyzer.neo4j`** package: `catalog` (the single source-of-truth schema catalog), `project` (pure IR → graph rows), `cypher` (snapshot writer), `bolt` (incremental writer), and `rows` (the output-agnostic intermediate).
 - **Schema conformance test** (`test/test_neo4j_schema.py`, always runs) — asserts the emitter never produces a label/relationship/property the catalog doesn't declare, and that the checked-in `schema.neo4j.json` is regenerated.
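A sketch of how a catalog like the one in `catalog.py` could be serialized into the `schema.json` contract. The dataclass field names and the JSON layout (`node_labels`, `relationships`, `from`/`to`) are assumptions; only the positional shape `NodeLabel(label, merge_label, key, props)` and `RelType(name, sources, targets, props)` follows the catalog diff for `catalog.py`.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List

SCHEMA_VERSION = "1.0.0"


@dataclass
class NodeLabel:
    label: str        # e.g. "PyClass"
    merge_label: str  # label the MERGE is keyed on, e.g. "PySymbol"
    key: str          # unique key property, e.g. "signature"
    props: Dict[str, str] = field(default_factory=dict)


@dataclass
class RelType:
    name: str
    sources: List[str]
    targets: List[str]
    props: Dict[str, str] = field(default_factory=dict)


def schema_json(nodes: List[NodeLabel], rels: List[RelType]) -> str:
    """Serialize a catalog into a version-stamped schema document."""
    doc = {
        "schema_version": SCHEMA_VERSION,
        "node_labels": [
            {"label": n.label, "merge_label": n.merge_label,
             "key": n.key, "properties": n.props}
            for n in nodes
        ],
        "relationships": [
            {"type": r.name, "from": r.sources, "to": r.targets,
             "properties": r.props}
            for r in rels
        ],
    }
    # sort_keys keeps the output byte-stable, so a checked-in copy can be diffed.
    return json.dumps(doc, indent=2, sort_keys=True)


print(schema_json(
    [NodeLabel("PyPackage", "PyPackage", "name", {"name": "string"})],
    [RelType("PY_IMPORTS", ["PyModule"], ["PyPackage"],
             {"imported_names": "string[]", "aliases": "string[]"})],
))
```

Deterministic output is what makes a check like "the checked-in `schema.neo4j.json` is regenerated" practical: a test can regenerate the document and compare strings.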

README.md

Lines changed: 4 additions & 4 deletions
@@ -94,7 +94,7 @@ To view the available options and commands, run `codeanalyzer --help`. You shoul
 │ --format -f [json|msgpack] Output format for --emit json: json or msgpack. [default: json] │
 │ --emit [json|neo4j| Output target: json (analysis.json) | neo4j (graph.cypher or live │
 │ schema] Bolt push) | schema (the Neo4j schema.json contract). [default: json]│
-│ --app-name TEXT Logical application name for the graph :Application anchor. │
+│ --app-name TEXT Logical application name for the graph :PyApplication anchor. │
 │ --neo4j-uri TEXT Push the graph to a live Neo4j over Bolt. [env: NEO4J_URI] │
 │ --neo4j-user TEXT Neo4j username. [env: NEO4J_USERNAME] [default: neo4j] │
 │ --neo4j-password TEXT Neo4j password. [env: NEO4J_PASSWORD] [default: neo4j] │
@@ -176,12 +176,12 @@ By default this is printed to stdout in JSON; with `--output` it is written to `

 ### Neo4j graph

-`--emit neo4j` projects the same analysis into a labeled property graph (declarations keyed by their signature under a shared `:Symbol` label; calls, imports, inheritance, decorators, and call sites as relationships):
+`--emit neo4j` projects the same analysis into a labeled property graph. Every node label is `Py`-prefixed and every relationship type is `PY_`-prefixed (e.g. `:PyClass`, `PY_CALLS`) so multiple language analyzers can share one database without label or relationship-type collisions. Declarations are keyed by their signature under a shared `:PySymbol` label; calls, imports, inheritance, decorators, and call sites are relationships:

 - **Without `--neo4j-uri`** — writes a self-contained `graph.cypher` (constraints + indexes, a scoped wipe, then batched `MERGE`s). Load it with `cypher-shell < graph.cypher`. Needs no extra dependencies.
-- **With `--neo4j-uri`** — pushes to a live Neo4j over Bolt **incrementally**: only modules whose content hash changed are rewritten, and on a full run modules whose source file vanished are pruned. Requires the `neo4j` extra. Every graph carries a `schema_version` on its `:Application` node.
+- **With `--neo4j-uri`** — pushes to a live Neo4j over Bolt **incrementally**: only modules whose content hash changed are rewritten, and on a full run modules whose source file vanished are pruned. Requires the `neo4j` extra. Every graph carries a `schema_version` on its `:PyApplication` node.

-Call-graph endpoints that aren't present in the symbol table (third-party / framework / RPC targets) are materialized as `:External` ghost nodes, mirroring the analyzer's own ghost-node behaviour.
+Call-graph endpoints that aren't present in the symbol table (third-party / framework / RPC targets) are materialized as `:PyExternal` ghost nodes, mirroring the analyzer's own ghost-node behaviour.

 The connection options also read from the standard Neo4j environment variables — `NEO4J_URI`, `NEO4J_USERNAME`, `NEO4J_PASSWORD`, `NEO4J_DATABASE` — when the corresponding flag is omitted (an explicit flag wins). Prefer the env var for the password so it doesn't land in shell history or the process list:
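The precedence described in the README hunk (explicit flag, then environment variable, then built-in default) as a small sketch. `resolve_option` is a hypothetical name; in a Typer CLI the same precedence is available directly through `typer.Option(..., envvar="NEO4J_URI")`, which may be how this CLI wires it.

```python
import os
from typing import Mapping, Optional


def resolve_option(
    flag: Optional[str],
    env_var: str,
    default: Optional[str] = None,
    environ: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Explicit flag > environment variable > built-in default."""
    if flag is not None:
        return flag
    value = environ.get(env_var)
    return value if value is not None else default


env = {"NEO4J_URI": "bolt://localhost:7687"}
print(resolve_option(None, "NEO4J_URI", environ=env))                # from the env var
print(resolve_option("bolt://db:7687", "NEO4J_URI", environ=env))    # the flag wins
print(resolve_option(None, "NEO4J_USERNAME", "neo4j", environ=env))  # falls back to default
```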

codeanalyzer/__main__.py

Lines changed: 1 addition & 1 deletion
@@ -45,7 +45,7 @@ def main(
 Optional[str],
 typer.Option(
 "--app-name",
-help="Logical application name for the graph :Application anchor "
+help="Logical application name for the graph :PyApplication anchor "
 "(default: input dir name).",
 ),
 ] = None,
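The `--app-name` help says the anchor name defaults to the input directory's name. A minimal sketch of that fallback, with `resolve_app_name` as a hypothetical helper; resolving the path first matters because `Path(".").name` is an empty string.

```python
from pathlib import Path
from typing import Optional


def resolve_app_name(app_name: Optional[str], input_dir: Path) -> str:
    """Name for the :PyApplication anchor: --app-name if given, else the input dir name."""
    if app_name:  # an empty string also falls back to the directory name
        return app_name
    # resolve() turns "." or "../x" into an absolute path so .name is meaningful.
    return input_dir.resolve().name


print(resolve_app_name(None, Path("/srv/shop")))  # shop
```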

codeanalyzer/neo4j/bolt.py

Lines changed: 5 additions & 5 deletions
@@ -29,7 +29,7 @@

 Nodes are MERGE-upserted, never blindly deleted, so a declaration another
 (unchanged) module still references survives and its incoming edges stay valid.
-``:External`` / ``:Package`` / ``:Decorator`` are shared (no ``_module``) and are
+``:PyExternal`` / ``:PyPackage`` / ``:PyDecorator`` are shared (no ``_module``) and are
 MERGE-only.

 The ``neo4j`` driver is imported lazily so it stays an optional dependency and
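The catalog pairs each specific label with a MERGE label and key: `PyClass`, `PyCallable` and `PyExternal` all MERGE under `PySymbol` on `signature`. A sketch of how one such entry could become the batched `UNWIND … MERGE` upsert described above; the `row.key`/`row.props` row shape and the helper names `merge_statement` and `batches` are assumptions.

```python
from typing import Any, Dict, Iterator, List

BATCH = 1000  # mirrors the BATCH constant in bolt.py


def merge_statement(label: str, merge_label: str, key: str) -> str:
    """Batched upsert for one NodeLabel(label, merge_label, key, ...) entry."""
    # MERGE on the merge label + key; when the specific label differs
    # (e.g. PyClass under PySymbol), add it with SET.
    sets = [f"n:{label}"] if label != merge_label else []
    sets.append("n += row.props")
    return (
        f"UNWIND $rows AS row MERGE (n:{merge_label} {{{key}: row.key}}) "
        "SET " + ", ".join(sets)
    )


def batches(rows: List[Dict[str, Any]], size: int = BATCH) -> Iterator[List[Dict[str, Any]]]:
    """Split rows into chunks, one $rows parameter per statement execution."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


print(merge_statement("PyClass", "PySymbol", "signature"))
# UNWIND $rows AS row MERGE (n:PySymbol {signature: row.key}) SET n:PyClass, n += row.props
```

Because MERGE only matches or creates and never deletes, re-running a batch is idempotent, which fits the "MERGE-only" treatment of the shared nodes.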
@@ -44,7 +44,7 @@
 from codeanalyzer.neo4j.schema import CONSTRAINTS, INDEXES
 from codeanalyzer.utils import logger

-DESCENDANTS = "[:DECLARES|HAS_METHOD|HAS_ATTRIBUTE|DECLARES_VAR|HAS_CALLSITE*1..]"
+DESCENDANTS = "[:PY_DECLARES|PY_HAS_METHOD|PY_HAS_ATTRIBUTE|PY_DECLARES_VAR|PY_HAS_CALLSITE*1..]"
 BATCH = 1000
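The same five containment types appear twice in this commit: in `DESCENDANTS` in `bolt.py` and inline in `cypher.py`'s `_wipe`. A sketch of deriving the pattern from one list, with a guard that rejects un-namespaced types; `CONTAINMENT_RELS` and `descendants_pattern` are hypothetical names.

```python
from typing import Sequence

# The containment relationships that hang off a PyModule, as spelled in DESCENDANTS.
CONTAINMENT_RELS = [
    "PY_DECLARES",
    "PY_HAS_METHOD",
    "PY_HAS_ATTRIBUTE",
    "PY_DECLARES_VAR",
    "PY_HAS_CALLSITE",
]


def descendants_pattern(rel_types: Sequence[str]) -> str:
    """Variable-length relationship pattern: any listed type, one or more hops."""
    unprefixed = [t for t in rel_types if not t.startswith("PY_")]
    if unprefixed:
        # Guard against an un-namespaced type slipping back in.
        raise ValueError(f"unnamespaced relationship types: {unprefixed}")
    return "[:" + "|".join(rel_types) + "*1..]"


print(descendants_pattern(CONTAINMENT_RELS))
# [:PY_DECLARES|PY_HAS_METHOD|PY_HAS_ATTRIBUTE|PY_DECLARES_VAR|PY_HAS_CALLSITE*1..]
```

Both writers could then interpolate the same string, so a future rename would touch one list instead of two query strings.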
@@ -92,7 +92,7 @@ def session():
 # 2. diff content_hash.
 db_hash: Dict[str, Optional[str]] = {}
 with session() as s:
-res = s.run("MATCH (m:Module) RETURN m.file_key AS k, m.content_hash AS h")
+res = s.run("MATCH (m:PyModule) RETURN m.file_key AS k, m.content_hash AS h")
 for rec in res:
 db_hash[rec["k"]] = rec["h"]
 changed = set()
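Once `db_hash` holds the stored hash per `file_key`, the incremental plan is a pure set computation. A sketch, assuming a hypothetical `plan_incremental` helper and assuming that a module with no local hash is always rewritten; pruning mirrors the `WHERE NOT m.file_key IN $present` query in the next hunk and only happens on a full run.

```python
from typing import Dict, Optional, Set, Tuple


def plan_incremental(
    db_hash: Dict[str, Optional[str]],
    local_hash: Dict[str, Optional[str]],
    full_run: bool,
) -> Tuple[Set[str], Set[str]]:
    """Return (changed, pruned) module file_keys."""
    changed = {
        key
        for key, h in local_hash.items()
        # New modules (absent from the DB), changed hashes, or an unknown
        # local hash (assumption) all force a rewrite.
        if h is None or db_hash.get(key) != h
    }
    # Only a full run sees every module, so only then is "missing" meaningful.
    pruned = set(db_hash) - set(local_hash) if full_run else set()
    return changed, pruned


db = {"a.py": "h1", "b.py": "h2", "gone.py": "h3"}
local = {"a.py": "h1", "b.py": "h2-new", "new.py": "h4"}
changed, pruned = plan_incremental(db, local, full_run=True)
print(sorted(changed), sorted(pruned))  # ['b.py', 'new.py'] ['gone.py']
```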
@@ -139,7 +139,7 @@ def _purge(tx, module=m, node_keys=keys):
 present = list(by_module.keys())
 with session() as s:
 res = s.run(
-"MATCH (m:Module) WHERE NOT m.file_key IN $present "
+"MATCH (m:PyModule) WHERE NOT m.file_key IN $present "
 f"OPTIONAL MATCH (m)-{DESCENDANTS}->(x) DETACH DELETE x, m "
 "RETURN count(m) AS pruned",
 present=present,
@@ -210,7 +210,7 @@ def _upsert_edges(session, neo4j, edges: List[EdgeRow]) -> None:

 def _hash_of(nodes: List[NodeRow], file_key: str) -> Optional[str]:
 for n in nodes:
-if n.labels[0] == "Module" and n.value == file_key:
+if n.labels[0] == "PyModule" and n.value == file_key:
 h = n.props.get("content_hash")
 return h if isinstance(h, str) else None
 return None
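`_hash_of` relies on the primary label being first in `NodeRow.labels`, and this commit had to update the string literal it compares against. A self-contained sketch with a minimal, hypothetical `NodeRow` (only the `labels`, `value` and `props` fields used above) and the label taken from one constant. The last usage line shows why a stale unprefixed label would silently stop matching.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

MODULE_LABEL = "PyModule"  # hypothetical constant; bolt.py inlines the string


@dataclass
class NodeRow:
    labels: List[str]  # primary label first, e.g. ["PyModule"]
    value: str         # the node's key value (file_key for modules)
    props: Dict[str, Any] = field(default_factory=dict)


def hash_of(nodes: List[NodeRow], file_key: str) -> Optional[str]:
    for n in nodes:
        if n.labels[0] == MODULE_LABEL and n.value == file_key:
            h = n.props.get("content_hash")
            # Only a string is a usable hash; anything else counts as unknown.
            return h if isinstance(h, str) else None
    return None


rows = [
    NodeRow(["PyClass", "PySymbol"], "m.A"),
    NodeRow(["PyModule"], "m.py", {"content_hash": "abc"}),
]
print(hash_of(rows, "m.py"))  # abc
print(hash_of([NodeRow(["Module"], "m.py", {"content_hash": "abc"})], "m.py"))  # None
```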

codeanalyzer/neo4j/catalog.py

Lines changed: 36 additions & 36 deletions
@@ -24,7 +24,7 @@

 SCHEMA_VERSION is the contract version: bump MAJOR on a breaking change
 (renamed/removed label, relationship or key), MINOR on an additive change (new
-label/rel/property). It is stamped onto the ``:Application`` node of every
+label/rel/property). It is stamped onto the ``:PyApplication`` node of every
 emitted graph so any consumer can detect a producer/consumer mismatch at runtime.
 """
 from __future__ import annotations
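How a consumer might act on the stamped version under the MAJOR/MINOR policy in the docstring above. The compatibility rule below (same MAJOR, and a producer MINOR at least the consumer's) is one reasonable reading, not something the catalog prescribes; `READ_VERSION`, `parse_version` and `is_compatible` are hypothetical.

```python
from typing import Tuple

# Hypothetical consumer-side query; the producer stamps schema_version on the anchor.
READ_VERSION = "MATCH (a:PyApplication {name: $name}) RETURN a.schema_version AS v"


def parse_version(version: str) -> Tuple[int, int, int]:
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def is_compatible(producer: str, consumer: str) -> bool:
    """Same MAJOR, and the graph has at least the MINOR the consumer was built for.

    A newer producer MINOR only adds labels/rels/properties, which a consumer
    can ignore; an older one may lack something the consumer expects.
    """
    p_major, p_minor, _ = parse_version(producer)
    c_major, c_minor, _ = parse_version(consumer)
    return p_major == c_major and p_minor >= c_minor


print(is_compatible("1.3.0", "1.2.0"), is_compatible("2.0.0", "1.0.0"))  # True False
```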
@@ -63,14 +63,14 @@ class RelType:

 NODE_LABELS: List[NodeLabel] = [
 NodeLabel(
-"Application",
-"Application",
+"PyApplication",
+"PyApplication",
 "name",
 {"name": "string", "schema_version": "string"},
 ),
 NodeLabel(
-"Module",
-"Module",
+"PyModule",
+"PyModule",
 "file_key",
 {
 "file_key": "string",
@@ -82,8 +82,8 @@ class RelType:
 },
 ),
 NodeLabel(
-"Class",
-"Symbol",
+"PyClass",
+"PySymbol",
 "signature",
 {
 "signature": "string",
@@ -96,8 +96,8 @@ class RelType:
 },
 ),
 NodeLabel(
-"Callable",
-"Symbol",
+"PyCallable",
+"PySymbol",
 "signature",
 {
 "signature": "string",
@@ -116,21 +116,21 @@ class RelType:
 },
 ),
 NodeLabel(
-"External",
-"Symbol",
+"PyExternal",
+"PySymbol",
 "signature",
 {"signature": "string", "name": "string"},
 ),
-NodeLabel("Package", "Package", "name", {"name": "string"}),
+NodeLabel("PyPackage", "PyPackage", "name", {"name": "string"}),
 NodeLabel(
-"Decorator",
-"Decorator",
+"PyDecorator",
+"PyDecorator",
 "name",
 {"name": "string"},
 ),
 NodeLabel(
-"CallSite",
-"CallSite",
+"PyCallSite",
+"PyCallSite",
 "id",
 {
 "id": "string",
@@ -149,8 +149,8 @@ class RelType:
 },
 ),
 NodeLabel(
-"Attribute",
-"Attribute",
+"PyAttribute",
+"PyAttribute",
 "id",
 {
 "id": "string",
@@ -162,8 +162,8 @@ class RelType:
 },
 ),
 NodeLabel(
-"Variable",
-"Variable",
+"PyVariable",
+"PyVariable",
 "id",
 {
 "id": "string",
@@ -177,31 +177,31 @@ class RelType:
 ),
 ]

-_DECL_TARGETS = ["Class", "Callable"]
+_DECL_TARGETS = ["PyClass", "PyCallable"]


 REL_TYPES: List[RelType] = [
-RelType("HAS_MODULE", ["Application"], ["Module"]),
-RelType("DECLARES", ["Module", "Class", "Callable"], _DECL_TARGETS),
-RelType("HAS_METHOD", ["Class"], ["Callable"]),
-RelType("HAS_ATTRIBUTE", ["Class"], ["Attribute"]),
-RelType("DECLARES_VAR", ["Module", "Callable"], ["Variable"]),
-RelType("HAS_CALLSITE", ["Callable"], ["CallSite"]),
-RelType("RESOLVES_TO", ["CallSite"], ["Callable", "External"]),
+RelType("PY_HAS_MODULE", ["PyApplication"], ["PyModule"]),
+RelType("PY_DECLARES", ["PyModule", "PyClass", "PyCallable"], _DECL_TARGETS),
+RelType("PY_HAS_METHOD", ["PyClass"], ["PyCallable"]),
+RelType("PY_HAS_ATTRIBUTE", ["PyClass"], ["PyAttribute"]),
+RelType("PY_DECLARES_VAR", ["PyModule", "PyCallable"], ["PyVariable"]),
+RelType("PY_HAS_CALLSITE", ["PyCallable"], ["PyCallSite"]),
+RelType("PY_RESOLVES_TO", ["PyCallSite"], ["PyCallable", "PyExternal"]),
 RelType(
-"CALLS",
-["Callable", "External"],
-["Callable", "External"],
+"PY_CALLS",
+["PyCallable", "PyExternal"],
+["PyCallable", "PyExternal"],
 {"weight": "integer", "provenance": "string[]"},
 ),
-RelType("EXTENDS", ["Class"], ["Class"]),
+RelType("PY_EXTENDS", ["PyClass"], ["PyClass"]),
 RelType(
-"IMPORTS",
-["Module"],
-["Package"],
+"PY_IMPORTS",
+["PyModule"],
+["PyPackage"],
 {"imported_names": "string[]", "aliases": "string[]"},
 ),
-RelType("DECORATED_BY", ["Callable"], ["Decorator"]),
+RelType("PY_DECORATED_BY", ["PyCallable"], ["PyDecorator"]),
 ]
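Each `RelType(name, sources, targets, props)` entry fixes which labels a relationship may connect and which properties it may carry, which is what the CHANGELOG's schema conformance test checks the emitter against. A minimal sketch of such a check, not the actual `test/test_neo4j_schema.py`: the edge tuple shape and `violations` are hypothetical, and `REL_TYPES` here is a two-entry subset copied from the diff.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class RelType:
    name: str
    sources: List[str]
    targets: List[str]
    props: Dict[str, str] = field(default_factory=dict)


_DECL_TARGETS = ["PyClass", "PyCallable"]
REL_TYPES = [
    RelType("PY_DECLARES", ["PyModule", "PyClass", "PyCallable"], _DECL_TARGETS),
    RelType("PY_CALLS", ["PyCallable", "PyExternal"], ["PyCallable", "PyExternal"],
            {"weight": "integer", "provenance": "string[]"}),
]


def violations(edges: List[Tuple[str, str, str, Dict[str, object]]]) -> List[str]:
    """Each edge is (rel_type, source_label, target_label, props)."""
    by_name = {r.name: r for r in REL_TYPES}
    problems = []
    for rel, src, dst, props in edges:
        spec = by_name.get(rel)
        if spec is None:
            problems.append(f"undeclared relationship type {rel}")
            continue
        if src not in spec.sources or dst not in spec.targets:
            problems.append(f"{rel}: {src} -> {dst} not allowed")
        extra = set(props) - set(spec.props)
        if extra:
            problems.append(f"{rel}: undeclared properties {sorted(extra)}")
    return problems


# An unprefixed type left behind by an incomplete rename is caught immediately.
print(violations([("CALLS", "PyCallable", "PyCallable", {})]))
```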
codeanalyzer/neo4j/cypher.py

Lines changed: 3 additions & 3 deletions
@@ -69,9 +69,9 @@ def _wipe(app_name: str) -> str:
 name = cypher_value(app_name)
 return "\n".join(
 [
-f"MATCH (a:Application {{name: {name}}})",
-"OPTIONAL MATCH (a)-[:HAS_MODULE]->(m:Module)",
-"OPTIONAL MATCH (m)-[:DECLARES|HAS_METHOD|HAS_ATTRIBUTE|DECLARES_VAR|HAS_CALLSITE*1..]->(x)",
+f"MATCH (a:PyApplication {{name: {name}}})",
+"OPTIONAL MATCH (a)-[:PY_HAS_MODULE]->(m:PyModule)",
+"OPTIONAL MATCH (m)-[:PY_DECLARES|PY_HAS_METHOD|PY_HAS_ATTRIBUTE|PY_DECLARES_VAR|PY_HAS_CALLSITE*1..]->(x)",
 "DETACH DELETE x, m, a;",
 ]
 )
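`_wipe` inlines the application name as a Cypher literal via `cypher_value`, whose body isn't shown. A sketch with `cypher_string` as a hypothetical, strings-only stand-in for `cypher_value`, feeding the same wipe. Because the wipe follows only `PY_HAS_MODULE` and the containment types, shared `:PyExternal` / `:PyPackage` / `:PyDecorator` nodes are never matched; `DETACH DELETE` removes only the relationships attached to the deleted nodes.

```python
CONTAINMENT_RELS = ["PY_DECLARES", "PY_HAS_METHOD", "PY_HAS_ATTRIBUTE",
                    "PY_DECLARES_VAR", "PY_HAS_CALLSITE"]


def cypher_string(value: str) -> str:
    """Single-quoted Cypher string literal (escape backslashes, then quotes)."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def wipe(app_name: str) -> str:
    name = cypher_string(app_name)
    return "\n".join(
        [
            f"MATCH (a:PyApplication {{name: {name}}})",
            "OPTIONAL MATCH (a)-[:PY_HAS_MODULE]->(m:PyModule)",
            "OPTIONAL MATCH (m)-[:" + "|".join(CONTAINMENT_RELS) + "*1..]->(x)",
            "DETACH DELETE x, m, a;",
        ]
    )


print(wipe("shop"))
```

Escaping matters here because the snapshot script is self-contained: an application name containing a quote would otherwise terminate the literal early.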
