docs: restructure README to match codeanalyzer-typescript; ignore node_modules/.astro
Give the README the same layout as the `cants` (TypeScript) sibling: centered
header + badges, intro, table of contents, Features, sectioned Installation
(Prerequisites / pip / shell script / build-from-source), Usage (Options +
Examples), Output targets (analysis.json / Neo4j / schema), Development, License.
Content is unchanged in substance and the auto-generated `canpy --help` block is
preserved verbatim (scripts/update_readme.py --check passes).
Also add node_modules/ and .astro/ to .gitignore — they are docs-site build
artifacts that should never be committed.
**A Python static-analysis toolkit — the CLDK backend that emits a canonical symbol table and call graph, as `analysis.json` or a Neo4j property graph.**

A comprehensive static analysis tool for Python source code that provides symbol table generation, call graph analysis, and semantic analysis using Jedi, CodeQL, and Tree-sitter — emitted as the canonical `analysis.json`, or projected into a **Neo4j property graph**.

…checked in as `schema.neo4j.json` and shipped with every release.
- **Incremental cache** — per-file results are cached under `.codeanalyzer`; `--lazy` (default)
  reuses them, `--eager` forces a clean rebuild. `--ray` distributes the work across cores.
- **Compact output** — canonical `analysis.json`, or binary `analysis.msgpack` for smaller artifacts.

## Installation

### Prerequisites

- **Python 3.10 or newer.**
- A C toolchain and the `venv` / development headers — the analyzer builds an isolated virtual
  environment per project (via Python's `venv`) so Jedi can resolve types and imports:

…

This will save the analysis results in `analysis.msgpack` in the specified directory.

3. **Resolve extra call edges with CodeQL:**

```sh
canpy --input ./my-python-project --codeql
```

Every run produces a symbol table **and** a call graph. By default, edges come from Jedi's lexical analysis. Adding `--codeql` resolves additional edges (including RPC, third-party, and dynamically dispatched targets) and merges them with the Jedi-derived edges. CodeQL also backfills resolved callees on Jedi-emitted call sites where Jedi couldn't resolve them.

***Note: CodeQL integration is experimental. The CLI is downloaded into `<cache_dir>/codeql/` on first use and reused thereafter.***

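The merge described above can be made concrete with a minimal Python sketch. It is not canpy's implementation. The following are all assumptions made for illustration:

- the `CallSite` record and its fields;
- keying call sites by `(caller, line, column)`;
- representing edges as `(caller, callee)` signature pairs.

An edge found by both analyses collapses into one. CodeQL only fills in callees that Jedi left unresolved.

```python
from dataclasses import dataclass, replace
from typing import Optional

Edge = tuple[str, str]  # (caller signature, callee signature); assumed representation


@dataclass(frozen=True)
class CallSite:
    """Hypothetical call-site record; canpy's real schema may differ."""
    caller: str            # signature of the enclosing function
    line: int
    column: int
    callee: Optional[str]  # None when Jedi could not resolve the target


def merge_edges(jedi: set[Edge], codeql: set[Edge]) -> set[Edge]:
    """Union of both edge sets; an edge found by both analyses appears once."""
    return jedi | codeql


def backfill(sites: list[CallSite],
             codeql_callees: dict[tuple[str, int, int], str]) -> list[CallSite]:
    """Fill callees Jedi left unresolved, matching sites by (caller, line, column).

    Sites Jedi already resolved are kept unchanged.
    """
    result = []
    for site in sites:
        key = (site.caller, site.line, site.column)
        if site.callee is None and key in codeql_callees:
            site = replace(site, callee=codeql_callees[key])
        result.append(site)
    return result
```

Keeping Jedi's answer wherever it exists matches the "backfill only where Jedi couldn't resolve" behaviour stated above.
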
`canpy` builds one analysis in memory and can emit it three ways (`--emit`):

…

A `PyApplication` document — the canonical CLDK contract:

```json
…
}
```

By default this is printed to stdout in JSON; with `--output` it is written to `analysis.json` (or
`analysis.msgpack` with `--format msgpack`, a more compact binary format).

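A consumer that accepts either output file can choose the decoder by file extension. This is a minimal sketch with a hypothetical `load_analysis` helper. The msgpack branch assumes the third-party `msgpack` package is installed. The helper makes no assumption about the document's fields.

```python
import json
from pathlib import Path
from typing import Any


def load_analysis(path: str) -> Any:
    """Load an analysis.json or analysis.msgpack file written by `canpy --output`."""
    p = Path(path)
    if p.suffix == ".msgpack":
        import msgpack  # third-party: pip install msgpack (imported only when needed)
        return msgpack.unpackb(p.read_bytes(), raw=False)
    if p.suffix == ".json":
        return json.loads(p.read_text(encoding="utf-8"))
    raise ValueError(f"unrecognized analysis file: {path}")
```

For the stdout form, parsing the `stdout` of `subprocess.run(["canpy", "--input", "./my-python-project"], capture_output=True, text=True, check=True)` with `json.loads` should yield the same document. This assumes nothing other than the JSON is written to stdout.
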
### Neo4j graph

`--emit neo4j` projects the same analysis into a labeled property graph. Every node label is
`Py`-prefixed and every relationship type is `PY_`-prefixed (e.g. `:PyClass`, `PY_CALLS`) so multiple
language analyzers can share one database without label or relationship-type collisions. Declarations
are keyed by their signature under a shared `:PySymbol` label; calls, imports, inheritance,
decorators, and call sites are relationships. The graph can be delivered in two ways:

- **Without `--neo4j-uri`** — writes a self-contained `graph.cypher` (constraints + indexes, a scoped
  wipe, then batched `MERGE`s). Load it with `cypher-shell < graph.cypher`. Needs no extra
  dependencies.
- **With `--neo4j-uri`** — pushes to a live Neo4j over Bolt **incrementally**: only modules whose
  content hash changed are rewritten, and on a full run modules whose source file vanished are
  pruned. Requires the `neo4j` extra. Every graph carries a `schema_version` on its `:PyApplication`
  node.

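The incremental rule in the second bullet amounts to diffing per-module content hashes. Here is a minimal Python sketch, not canpy's code; `plan_sync`, `SyncPlan`, and the dictionaries mapping module paths to hashes are hypothetical. Two further assumptions:

- A module with no stored hash counts as changed.
- Pruning happens only on a full run, presumably because a partial run cannot tell a deleted file from one that simply was not analyzed.

```python
from dataclasses import dataclass


@dataclass
class SyncPlan:
    rewrite: list[str]  # modules whose content hash changed (or that are new)
    prune: list[str]    # modules whose source file vanished (full runs only)


def plan_sync(stored: dict[str, str], current: dict[str, str], full_run: bool) -> SyncPlan:
    """Compare hashes already in the graph (stored) with this run's hashes (current).

    Both dicts map a module path to its content hash (hypothetical representation).
    """
    rewrite = sorted(m for m, h in current.items() if stored.get(m) != h)
    prune = sorted(set(stored) - set(current)) if full_run else []
    return SyncPlan(rewrite, prune)
```

Unchanged modules appear in neither list, which is what keeps a re-run cheap when little has changed.
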
Call-graph endpoints that aren't present in the symbol table (third-party / framework / RPC targets)
are materialized as `:PyExternal` ghost nodes, mirroring the analyzer's own ghost-node behaviour.

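The ghost-node rule reduces to a set difference between call-graph endpoints and symbol-table keys. This minimal Python sketch assumes calls are available as `(caller, callee)` signature pairs. The helpers `external_endpoints` and `batched`, the Cypher string, and its `signature` property are hypothetical and not taken from canpy or its schema.

```python
# Hypothetical Cypher for materializing ghost nodes; the `signature` property is an assumption.
MERGE_EXTERNALS = "UNWIND $batch AS sig MERGE (:PyExternal {signature: sig})"


def external_endpoints(symbols: set[str], calls: set[tuple[str, str]]) -> set[str]:
    """Endpoints of call edges that have no symbol-table entry (would become :PyExternal)."""
    endpoints = {sig for edge in calls for sig in edge}
    return endpoints - symbols


def batched(items: set[str], size: int) -> list[list[str]]:
    """Split items into sorted, fixed-size batches for an UNWIND ... MERGE statement."""
    ordered = sorted(items)
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]
```

With the official `neo4j` Python driver, each batch could be sent as `session.run(MERGE_EXTERNALS, batch=b)`. Because `MERGE` matches an existing node before creating one, re-running the load does not duplicate ghost nodes.
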
The connection options also read from the standard Neo4j environment variables — `NEO4J_URI`,
`NEO4J_USERNAME`, `NEO4J_PASSWORD`, `NEO4J_DATABASE` — when the corresponding flag is omitted (an
explicit flag wins). Prefer the env var for the password so it doesn't land in shell history or the
process list:

```sh
export NEO4J_URI=bolt://localhost:7687
# …
canpy -i ./my-project --emit neo4j # credentials picked up from the environment
```

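The precedence rule fits in a few lines: an explicit flag wins, otherwise the `NEO4J_*` variable is used. This is a minimal Python sketch; `resolve_option` is hypothetical and not canpy's code. It treats any value passed on the command line, even an empty string, as explicit.

```python
import os
from typing import Mapping, Optional


def resolve_option(flag_value: Optional[str], env_var: str,
                   environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the CLI flag if given, else the environment variable, else None."""
    if flag_value is not None:
        return flag_value  # an explicit flag always wins
    return environ.get(env_var)
```

For example, `resolve_option(uri_flag, "NEO4J_URI")` falls back to the exported `NEO4J_URI` only when the hypothetical `uri_flag` is `None`.
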
### Schema contract

`--emit schema` writes the machine-readable, version-stamped Neo4j schema (`schema.json`: node labels, relationships, properties, constraints, and indexes). It needs no project and is checked into the repo as `schema.neo4j.json` and bundled in every release as a GitHub Release asset, so a consumer can validate producer/consumer compatibility without invoking the tool. The shape of the contract matches the [`codeanalyzer-typescript`](https://github.com/codellm-devkit/codeanalyzer-typescript) backend.

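A consumer can compare the version it was built against with the `schema_version` stored on the `:PyApplication` node. This minimal Python sketch rests on two assumptions the README does not confirm:

- The schema file exposes a top-level `schema_version` key.
- Versions are dotted strings, and only a major-version bump breaks compatibility.

```python
import json


def major(version: str) -> int:
    """Major component of a dotted version such as "1.4.0" (format assumed)."""
    return int(version.split(".")[0])


def is_compatible(schema_text: str, graph_version: str) -> bool:
    """Check a graph's schema_version against the bundled schema file.

    Assumes a top-level "schema_version" key and that only a major bump breaks consumers.
    """
    expected = json.loads(schema_text)["schema_version"]
    return major(str(expected)) == major(str(graph_version))
```

The graph side of the comparison can be read with `MATCH (a:PyApplication) RETURN a.schema_version`.
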