Commit 1121fb2

feat(neo4j): add Neo4j graph output, schema contract, and Bolt writer
Port the codeanalyzer-typescript 0.4.0 Neo4j feature to Java with the same arg entrypoints:

  --emit json|neo4j|schema (default json)
  --app-name, --neo4j-uri, --neo4j-user, --neo4j-password, --neo4j-database

New com.ibm.cldk.neo4j package:
- GraphProjector: pure projection of the symbol table (+ level-2 call graph) to graph rows. Type/Callable share a :Symbol identity; call sites, fields, parameters, variables, enum constants, record components are first-class nodes; annotations/packages are shared; entrypoints are a marker label; every unit-owned node carries a _unit provenance prop.
- CypherWriter: self-contained graph.cypher snapshot (constraints, scoped wipe, batched UNWIND/MERGE).
- BoltWriter: live incremental push over Bolt. Diffs each compilation unit's content_hash, replaces only changed units (idempotent MERGE), prunes vanished units on a full run. Uses neo4j-java-driver 4.4.x (JDK 11/native).
- SchemaCatalog + Schema: the in-repo graph contract (labels, relationships, typed properties, DDL); --emit schema serializes it to schema.json.

Tests:
- Neo4jSchemaConformanceTest (no container): anti-drift guard asserting the projector never emits a label/rel/property the catalog doesn't declare, and that schema.neo4j.json is current.
- Neo4jBoltWriterTest (opt-in, Testcontainers Neo4j): full push, idempotent re-push, and orphan pruning against a real database. Runs only when RUN_CONTAINER_TESTS is set.

Docs/release/packaging:
- README: install one-liner + Neo4j graph output section + refreshed --help.
- release.yml: publish codeanalyzer.jar, schema.json and the installer as release assets, with cargo-dist-style release notes.
- packaging/install/codeanalyzer-installer.sh: curl/wget installer that fetches the jar and drops a `codeanalyzer` launcher on PATH.
- neo4j-schema.drawio: diagram of the emitted property-graph schema.
- schema.neo4j.json: checked-in graph contract.

Bump version to 2.4.0.
1 parent 756fc4e commit 1121fb2
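
The commit message says the BoltWriter diffs each compilation unit's content_hash, replaces only changed units, and prunes vanished units on a full run. The sketch below illustrates that classification step only. It is not the commit's code: the class `UnitDiff`, its method `compute`, and the path-to-hash map shape are hypothetical names chosen for illustration, and hashes are assumed to be non-null strings.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper illustrating the content_hash diff; not the commit's BoltWriter. */
final class UnitDiff {
    final Set<String> changed = new TreeSet<>();   // new or modified units whose subgraph is replaced
    final Set<String> unchanged = new TreeSet<>(); // identical hash, left untouched
    final Set<String> orphaned = new TreeSet<>();  // known to the database only; pruned on full runs

    /**
     * @param dbHashes      unit path to content_hash as currently stored in the database
     * @param currentHashes unit path to content_hash from the analysis that just ran
     * @param fullRun       false for a partial re-push, where orphan pruning is skipped
     */
    static UnitDiff compute(Map<String, String> dbHashes,
                            Map<String, String> currentHashes,
                            boolean fullRun) {
        UnitDiff diff = new UnitDiff();
        for (Map.Entry<String, String> e : currentHashes.entrySet()) {
            String stored = dbHashes.get(e.getKey()); // null when the unit is new
            if (e.getValue().equals(stored)) {
                diff.unchanged.add(e.getKey());
            } else {
                diff.changed.add(e.getKey());
            }
        }
        if (fullRun) {
            for (String path : dbHashes.keySet()) {
                if (!currentHashes.containsKey(path)) {
                    diff.orphaned.add(path);
                }
            }
        }
        return diff;
    }
}
```

Only the `changed` set would need its subgraph rewritten, which is what makes a re-push of an unmodified project cheap. Skipping the orphan pass when `fullRun` is false mirrors the README's note that orphan pruning is skipped with `--target-files`.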

18 files changed

Lines changed: 2527 additions & 24 deletions

.github/workflows/release.yml

Lines changed: 52 additions & 2 deletions
@@ -40,6 +40,22 @@ jobs:
           git push --delete origin ${GITHUB_REF#refs/tags/}
           exit 1 # Fail the workflow
 
+      - name: Resolve version
+        id: ver
+        run: echo "version=${GITHUB_REF#refs/tags/v}" >> "$GITHUB_OUTPUT"
+
+      # Stage release assets: a stable-named codeanalyzer.jar (what the installer fetches), the
+      # Neo4j schema contract (platform-independent, version-locked to this build), and the
+      # cargo-dist-style install script.
+      - name: Stage release assets (jar + Neo4j schema + installer)
+        run: |
+          mkdir -p release-assets
+          cp build/libs/codeanalyzer-${{ steps.ver.outputs.version }}.jar release-assets/codeanalyzer.jar
+          cp build/libs/codeanalyzer-${{ steps.ver.outputs.version }}.jar "release-assets/codeanalyzer-${{ steps.ver.outputs.version }}.jar"
+          java -jar build/libs/codeanalyzer-${{ steps.ver.outputs.version }}.jar --emit schema > release-assets/schema.json
+          cp packaging/install/codeanalyzer-installer.sh release-assets/codeanalyzer-installer.sh
+          ls -lh release-assets
+
       - name: Build Changelog
         id: gen_changelog
         uses: mikepenz/release-changelog-builder-action@v5
@@ -49,10 +65,44 @@ jobs:
         env:
           GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
 
+      # cargo-dist-style release notes: install one-liner + downloads, then the generated changelog.
+      - name: Compose release notes
+        id: notes
+        run: |
+          {
+            echo "## Install"
+            echo
+            echo '```sh'
+            echo "curl --proto '=https' --tlsv1.2 -LsSf https://github.com/${GITHUB_REPOSITORY}/releases/download/v${{ steps.ver.outputs.version }}/codeanalyzer-installer.sh | sh"
+            echo '```'
+            echo
+            echo "Or run the JAR directly (requires Java 11+):"
+            echo
+            echo '```sh'
+            echo "java -jar codeanalyzer.jar -i /path/to/project -a 2 --emit neo4j -o ./out # writes out/graph.cypher"
+            echo '```'
+            echo
+            echo "## Downloads"
+            echo
+            echo "| Asset | Description |"
+            echo "| --- | --- |"
+            echo "| \`codeanalyzer.jar\` | Self-contained analyzer (run with \`java -jar\`) |"
+            echo "| \`codeanalyzer-installer.sh\` | Installer that fetches the jar and adds a \`codeanalyzer\` launcher |"
+            echo "| \`schema.json\` | Neo4j graph schema contract (node labels, relationships, DDL) |"
+            echo
+            echo "---"
+            echo
+            echo "${{ steps.gen_changelog.outputs.changelog }}"
+          } > release-notes.md
+
       - name: Publish Release
         uses: softprops/action-gh-release@v1
         with:
-          files: build/libs/*.jar
-          body: ${{ steps.gen_changelog.outputs.changelog }}
+          files: |
+            release-assets/codeanalyzer.jar
+            release-assets/codeanalyzer-${{ steps.ver.outputs.version }}.jar
+            release-assets/schema.json
+            release-assets/codeanalyzer-installer.sh
+          body_path: release-notes.md
         env:
           GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
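
The "Resolve version" step in the workflow above relies on the shell expansion `${GITHUB_REF#refs/tags/v}`, which strips the `refs/tags/v` prefix when it is present and leaves the value unchanged otherwise. As a reading aid, here is the same rule written out in Java, together with the versioned jar name the staging step copies. The class and method names (`ReleaseNames`, `versionFromRef`, `versionedJar`) are hypothetical and do not appear in the commit.

```java
/** Hypothetical illustration of the workflow's tag-to-version rule; not part of the commit. */
final class ReleaseNames {
    private static final String TAG_PREFIX = "refs/tags/v";

    /** Mirrors ${GITHUB_REF#refs/tags/v}: drop the prefix if present, otherwise return the ref as-is. */
    static String versionFromRef(String ref) {
        return ref.startsWith(TAG_PREFIX) ? ref.substring(TAG_PREFIX.length()) : ref;
    }

    /** Name of the versioned jar that the staging step reads from build/libs. */
    static String versionedJar(String version) {
        return "codeanalyzer-" + version + ".jar";
    }
}
```

For a tag ref such as `refs/tags/v2.4.0` this yields `2.4.0`, so the staging step looks for `build/libs/codeanalyzer-2.4.0.jar`. A ref without the `v` tag prefix passes through unchanged, in which case the `cp` would fail on a missing file.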

README.md

Lines changed: 103 additions & 21 deletions
@@ -2,6 +2,24 @@
 
 Native WALA implementation of source code analysis tool for Enterprise Java Applications.
 
+`codeanalyzer` extracts a comprehensive **symbol table** and **call graph** from Java applications
+and emits them either as the canonical `analysis.json`, or as a **Neo4j property graph**
+(`--emit neo4j`): a `graph.cypher` snapshot or a live, incremental push over Bolt. See
+[§4. Neo4j graph output](#4-neo4j-graph-output).
+
+## Quick install
+
+Grab the latest release jar and a `codeanalyzer` launcher (requires a Java 11+ runtime):
+
+```sh
+curl --proto '=https' --tlsv1.2 -LsSf https://github.com/codellm-devkit/codeanalyzer-java/releases/latest/download/codeanalyzer-installer.sh | sh
+# or with wget:
+wget -qO- https://github.com/codellm-devkit/codeanalyzer-java/releases/latest/download/codeanalyzer-installer.sh | sh
+```
+
+Overrides: `CODEANALYZER_INSTALL_DIR` (default `~/.local/bin`), `CODEANALYZER_VERSION` (default `latest`).
+Prefer to build from source? See [§2. Building `codeanalyzer`](#2-building-codeanalyzer).
+
 ## 1. Prerequisites
 
 Before you begin, ensure you have met the following requirements:
@@ -68,30 +86,40 @@ Run the Gradle wrapper script to build the project. This will compile the projec
 
 ### 2.3. Using `codeanalyzer`
 
-The jar will be built at `build/libs/codeanalyzer-1.0.jar`. It may be used as follows:
+The jar will be built at `build/libs/codeanalyzer-<version>.jar`. It may be used as follows:
 
 ```help
-Usage: java -jar /path/to/codeanalyzer.jar [-hvV] [--no-build] [-a=<analysisLevel>] [-b=<build>]
+Usage: codeanalyzer [-hvV] [--no-build] [--no-clean-dependencies]
+                    [-a=<analysisLevel>] [-b=<build>] [-f=<projectRootPom>]
                     [-i=<input>] [-o=<output>] [-s=<sourceAnalysis>]
-Convert java binary into a comprehensive system dependency graph.
-  -i, --input=<input>       Path to the project root directory.
-  -s, --source-analysis=<sourceAnalysis>
-                            Analyze a single string of java source code instead
-                              the project.
-  -o, --output=<output>     Destination directory to save the output graphs. By
-                              default, the SDG formatted as a JSON will be
-                              printed to the console.
-  -b, --build-cmd=<build>   Custom build command. Defaults to auto build.
-      --no-build            Do not build your application. Use this option if
-                              you have already built your application.
-  -a, --analysis-level=<analysisLevel>
-                            Level of analysis to perform. Options: 1 (for just
-                              symbol table) or 2 (for call graph). Default: 1
-  -v, --verbose             Print logs to console.
-  -h, --help                Show this help message and exit.
-  -V, --version             Print version information and exit.
-  -t, --target-files        For each file user wants to perform source analysis on top of existing analysis.json
-
+                    [--emit=<emit>] [--app-name=<appName>]
+                    [--neo4j-uri=<uri>] [--neo4j-user=<user>]
+                    [--neo4j-password=<password>] [--neo4j-database=<db>]
+                    [-t=<targetFiles>]...
+Analyze java application.
+  -i, --input=<input>         Path to the project root directory.
+  -s, --source-analysis=<s>   Analyze a single string of java source code instead
+                                of the project.
+  -o, --output=<output>       Destination directory to save the output graphs. By
+                                default, the analysis JSON is printed to the console.
+  -b, --build-cmd=<build>     Custom build command. Defaults to auto build.
+      --no-build              Do not build your application (use if already built).
+  -a, --analysis-level=<n>    Level of analysis: 1 (symbol table) or 2 (call graph).
+                                Default: 1. Level 2 adds CALLS edges to the graph.
+  -t, --target-files=<f>...   Restrict analysis to specific files (incremental).
+      --emit=<emit>           Output target: json (analysis.json, default) |
+                                neo4j (graph.cypher or live Bolt push) |
+                                schema (the Neo4j schema.json contract).
+      --app-name=<name>       Logical application name for the graph :Application
+                                anchor (default: input dir name).
+      --neo4j-uri=<uri>       Push the graph to a live Neo4j over Bolt (incremental);
+                                omit to write graph.cypher.
+      --neo4j-user=<user>     Neo4j username (default: neo4j).
+      --neo4j-password=<pw>   Neo4j password (default: neo4j).
+      --neo4j-database=<db>   Neo4j database name (default: server default).
+  -v, --verbose               Print logs to console.
+  -h, --help                  Show this help message and exit.
+  -V, --version               Print version information and exit.
 ```
 
 
@@ -157,6 +185,60 @@ There is a sample application in `src/test/resources/sample_apps/daytrader8/bina
 
 This will print the SDG on the console. Explore other flags to save the output to a JSON file.
 
+## 4. Neo4j graph output
+
+`codeanalyzer` can project the analysis IR into a [Neo4j](https://neo4j.com/) property graph instead
+of `analysis.json`. The graph models the same information (compilation units, types, callables,
+fields, parameters, call sites, variables, enum constants, record components, annotations, packages)
+as first-class nodes and relationships, and (at `-a 2`) adds `CALLS` edges from the call graph.
+
+The full contract (node labels, their keys and typed properties, relationship types and endpoints,
+plus the constraint/index DDL) lives in [`schema.neo4j.json`](./schema.neo4j.json) and is visualized
+in [`neo4j-schema.drawio`](./neo4j-schema.drawio). `SCHEMA_VERSION` is stamped onto the
+`:Application` node of every emitted graph.
+
+### 4.1. Cypher snapshot (no database required)
+
+```sh
+codeanalyzer -i /path/to/project -a 2 --emit neo4j -o ./out
+# → writes ./out/graph.cypher (a self-contained, re-runnable script)
+cypher-shell -u neo4j -p <password> < ./out/graph.cypher
+```
+
+The snapshot is **not** incremental: it creates the constraints, wipes only this application's prior
+subgraph (a scoped wipe), then loads the full graph with batched `UNWIND … MERGE` statements.
+
+### 4.2. Live incremental push over Bolt
+
+```sh
+codeanalyzer -i /path/to/project -a 2 --emit neo4j \
+  --neo4j-uri bolt://localhost:7687 --neo4j-user neo4j --neo4j-password <password>
+```
+
+The Bolt writer reads the database's current state and updates **only what changed**: it diffs each
+compilation unit's `content_hash`, replaces just the changed units' subgraphs (idempotent
+`MERGE` upserts), and, on a full run, prunes units whose source file vanished. Combine with
+`--target-files` for a targeted, partial re-push (orphan pruning is then skipped).
+
+### 4.3. Schema contract
+
+```sh
+codeanalyzer --emit schema -o ./out # → ./out/schema.json (no project analysis needed)
+codeanalyzer --emit schema # → prints the contract to stdout
+```
+
+### 4.4. Verifying the writers
+
+A no-container conformance test (`Neo4jSchemaConformanceTest`) asserts the projector never emits a
+label/relationship/property the catalog doesn't declare, and that `schema.neo4j.json` is current. A
+Testcontainers-backed integration test (`Neo4jBoltWriterTest`) spins up a real Neo4j and exercises
+the Bolt writer (full push, idempotent re-push, orphan pruning). The container suite is **opt-in**
+(it needs Docker/Podman) and runs only when `RUN_CONTAINER_TESTS` is set:
+
+```sh
+RUN_CONTAINER_TESTS=1 ./gradlew test
+```
+
 ## FAQ
 
 1. After making a few code changes, my native binary gives random exceptions. But, my code works perfectly with `java -jar`.
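
README §4.1 in the diff above describes the snapshot as a batched `UNWIND … MERGE` load. The sketch below shows one plausible way such statements could be rendered as text; it is not the commit's CypherWriter. The class `CypherBatchSketch`, the `id` key property used in the usage note, and the restriction to string-valued properties are all assumptions made to keep the example small.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical sketch of batched UNWIND/MERGE rendering; not the commit's CypherWriter. */
final class CypherBatchSketch {

    /** Renders a Cypher string literal, escaping backslashes and single quotes. */
    static String quote(String s) {
        return "'" + s.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    /** Renders one row as a Cypher map literal, e.g. {id: 'a', name: 'b'}. */
    static String mapLiteral(Map<String, String> row) {
        StringJoiner fields = new StringJoiner(", ", "{", "}");
        for (Map.Entry<String, String> e : row.entrySet()) {
            fields.add(e.getKey() + ": " + quote(e.getValue()));
        }
        return fields.toString();
    }

    /** Splits rows into batches and emits one UNWIND ... MERGE statement per batch. */
    static List<String> mergeStatements(String label, String key,
                                        List<Map<String, String>> rows, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<String> statements = new ArrayList<>();
        for (int i = 0; i < rows.size(); i += batchSize) {
            StringJoiner batch = new StringJoiner(", ", "[", "]");
            for (Map<String, String> row : rows.subList(i, Math.min(i + batchSize, rows.size()))) {
                batch.add(mapLiteral(row));
            }
            // MERGE on the key makes re-running the script an upsert rather than a duplicate insert.
            statements.add("UNWIND " + batch + " AS row MERGE (n:" + label
                    + " {" + key + ": row." + key + "}) SET n += row;");
        }
        return statements;
    }
}
```

Calling `mergeStatements("Symbol", "id", rows, 2)` with three rows produces two statements, the first carrying two rows and the second carrying one. Because every statement merges on the key, replaying the script leaves the same nodes in place, which is consistent with the README calling the snapshot re-runnable.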

build.gradle

Lines changed: 5 additions & 0 deletions
@@ -124,9 +124,14 @@ dependencies {
     implementation('com.github.javaparser:javaparser-symbol-solver-core:3.26.3')
     implementation('com.github.javaparser:javaparser-core:3.26.3')
 
+    // Neo4j Bolt driver for `--emit neo4j --neo4j-uri ...` (live incremental graph push).
+    // 4.4.x retains Java 8/11 compatibility for the GraalVM native-image build.
+    implementation('org.neo4j.driver:neo4j-java-driver:4.4.12')
+
     // TestContainers
     testImplementation 'org.testcontainers:testcontainers:1.20.6'
     testImplementation 'org.testcontainers:junit-jupiter:1.20.6'
+    testImplementation 'org.testcontainers:neo4j:1.20.6'
 
     // JUnit 5
     testImplementation 'org.junit.jupiter:junit-jupiter-api:5.10.1'
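
The Testcontainers Neo4j dependency above backs the opt-in integration test; the commit message also describes a no-container anti-drift guard that asserts the projector never emits a label or property the catalog does not declare. A minimal sketch of that kind of check follows. It is an illustration only: `ConformanceSketch`, the label-to-properties map shape, and the `name` property in the usage note are hypothetical, while `:Symbol` and `_unit` are taken from the commit message.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical sketch of a schema conformance check; not the commit's test code. */
final class ConformanceSketch {

    /**
     * @param declared label to allowed property names, as a schema catalog might expose them
     * @param emitted  label to property names actually observed in projected rows
     * @return one message per undeclared label or property, in sorted order; empty when conformant
     */
    static List<String> violations(Map<String, Set<String>> declared,
                                   Map<String, Set<String>> emitted) {
        List<String> out = new ArrayList<>();
        // TreeMap/TreeSet give a stable order so failures read the same on every run.
        for (Map.Entry<String, Set<String>> e : new TreeMap<>(emitted).entrySet()) {
            Set<String> allowed = declared.get(e.getKey());
            if (allowed == null) {
                out.add("undeclared label: " + e.getKey());
                continue;
            }
            for (String prop : new TreeSet<>(e.getValue())) {
                if (!allowed.contains(prop)) {
                    out.add("undeclared property: " + e.getKey() + "." + prop);
                }
            }
        }
        return out;
    }
}
```

A test built on this would project a sample application, collect the labels and properties seen, and assert that `violations(...)` is empty. Since no database is involved, a check of this shape can run on every build, whereas the container suite stays opt-in.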

gradle.properties

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-version=2.3.8
+version=2.4.0
