Commit f32312a

Merge pull request #155 from codellm-devkit/feature/issue-154-neo4j-and-fix-153
feat/neo4j: J-namespaced, lossless Neo4j graph output (#154)
2 parents 756fc4e + 9fc85c4 commit f32312a

23 files changed

Lines changed: 6276 additions & 262 deletions

.github/workflows/release.yml

Lines changed: 52 additions & 2 deletions
@@ -40,6 +40,22 @@ jobs:
           git push --delete origin ${GITHUB_REF#refs/tags/}
           exit 1 # Fail the workflow

+      - name: Resolve version
+        id: ver
+        run: echo "version=${GITHUB_REF#refs/tags/v}" >> "$GITHUB_OUTPUT"
+
+      # Stage release assets: a stable-named codeanalyzer.jar (what the installer fetches), the
+      # Neo4j schema contract (platform-independent, version-locked to this build), and the
+      # cargo-dist-style install script.
+      - name: Stage release assets (jar + Neo4j schema + installer)
+        run: |
+          mkdir -p release-assets
+          cp build/libs/codeanalyzer-${{ steps.ver.outputs.version }}.jar release-assets/codeanalyzer.jar
+          cp build/libs/codeanalyzer-${{ steps.ver.outputs.version }}.jar "release-assets/codeanalyzer-${{ steps.ver.outputs.version }}.jar"
+          java -jar build/libs/codeanalyzer-${{ steps.ver.outputs.version }}.jar --emit schema > release-assets/schema.neo4j.json
+          cp packaging/install/codeanalyzer-installer.sh release-assets/codeanalyzer-installer.sh
+          ls -lh release-assets
+
       - name: Build Changelog
         id: gen_changelog
         uses: mikepenz/release-changelog-builder-action@v5
uses: mikepenz/release-changelog-builder-action@v5
@@ -49,10 +65,44 @@ jobs:
         env:
           GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

+      # cargo-dist-style release notes: install one-liner + downloads, then the generated changelog.
+      - name: Compose release notes
+        id: notes
+        run: |
+          {
+            echo "## Install"
+            echo
+            echo '```sh'
+            echo "curl --proto '=https' --tlsv1.2 -LsSf https://github.com/${GITHUB_REPOSITORY}/releases/download/v${{ steps.ver.outputs.version }}/codeanalyzer-installer.sh | sh"
+            echo '```'
+            echo
+            echo "Or run the JAR directly (requires Java 11+):"
+            echo
+            echo '```sh'
+            echo "java -jar codeanalyzer.jar -i /path/to/project -a 2 --emit neo4j -o ./out # writes out/graph.cypher"
+            echo '```'
+            echo
+            echo "## Downloads"
+            echo
+            echo "| Asset | Description |"
+            echo "| --- | --- |"
+            echo "| \`codeanalyzer.jar\` | Self-contained analyzer (run with \`java -jar\`) |"
+            echo "| \`codeanalyzer-installer.sh\` | Installer that fetches the jar and adds a \`codeanalyzer\` launcher |"
+            echo "| \`schema.neo4j.json\` | Neo4j graph schema contract (node labels, relationships, DDL) |"
+            echo
+            echo "---"
+            echo
+            echo "${{ steps.gen_changelog.outputs.changelog }}"
+          } > release-notes.md
+
       - name: Publish Release
         uses: softprops/action-gh-release@v1
         with:
-          files: build/libs/*.jar
-          body: ${{ steps.gen_changelog.outputs.changelog }}
+          files: |
+            release-assets/codeanalyzer.jar
+            release-assets/codeanalyzer-${{ steps.ver.outputs.version }}.jar
+            release-assets/schema.neo4j.json
+            release-assets/codeanalyzer-installer.sh
+          body_path: release-notes.md
         env:
           GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
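
Reviewer note: the `Resolve version` step in this workflow relies on POSIX prefix stripping (`${var#pattern}`) to turn the tag ref into a bare version number. The sketch below shows that expansion in isolation. The `GITHUB_REF` value is an example chosen for illustration (on a real tag push, GitHub Actions sets it), and the `tag` variable is a name introduced here only to contrast the two patterns used in the workflow.

```shell
#!/bin/sh
# Example value only; GitHub Actions sets GITHUB_REF itself on a tag push.
GITHUB_REF="refs/tags/v2.4.0"

# ${var#pattern} removes the shortest prefix of $var that matches pattern.
version="${GITHUB_REF#refs/tags/v}"   # strips "refs/tags/v", as in the Resolve version step
tag="${GITHUB_REF#refs/tags/}"        # strips "refs/tags/", as in the tag-deletion step

echo "version=$version"   # prints version=2.4.0
echo "tag=$tag"           # prints tag=v2.4.0
```

Because the pattern includes the leading `v`, the jar name and the release URL in the workflow can add or omit the `v` explicitly wherever they need it.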

README.md

Lines changed: 111 additions & 21 deletions
@@ -2,6 +2,24 @@

 Native WALA implementation of source code analysis tool for Enterprise Java Applications.

+`codeanalyzer` extracts a comprehensive **symbol table** and **call graph** from Java applications
+and emits them either as the canonical `analysis.json`, or as a **Neo4j property graph**
+(`--emit neo4j`) — a `graph.cypher` snapshot or a live, incremental push over Bolt. See
+[§4. Neo4j graph output](#4-neo4j-graph-output).
+
+## Quick install
+
+Grab the latest release jar and a `codeanalyzer` launcher (requires a Java 11+ runtime):
+
+```sh
+curl --proto '=https' --tlsv1.2 -LsSf https://github.com/codellm-devkit/codeanalyzer-java/releases/latest/download/codeanalyzer-installer.sh | sh
+# or with wget:
+wget -qO- https://github.com/codellm-devkit/codeanalyzer-java/releases/latest/download/codeanalyzer-installer.sh | sh
+```
+
+Overrides: `CODEANALYZER_INSTALL_DIR` (default `~/.local/bin`), `CODEANALYZER_VERSION` (default `latest`).
+Prefer to build from source? See [§2. Building `codeanalyzer`](#2-building-codeanalyzer).
+
 ## 1. Prerequisites

 Before you begin, ensure you have met the following requirements:
@@ -68,30 +86,42 @@ Run the Gradle wrapper script to build the project. This will compile the projec

 ### 2.3. Using `codeanalyzer`

-The jar will be built at `build/libs/codeanalyzer-1.0.jar`. It may be used as follows:
+The jar will be built at `build/libs/codeanalyzer-<version>.jar`. It may be used as follows:

 ```help
-Usage: java -jar /path/to/codeanalyzer.jar [-hvV] [--no-build] [-a=<analysisLevel>] [-b=<build>]
+Usage: codeanalyzer [-hvV] [--no-build] [--no-clean-dependencies]
+        [-a=<analysisLevel>] [-b=<build>] [-f=<projectRootPom>]
         [-i=<input>] [-o=<output>] [-s=<sourceAnalysis>]
-Convert java binary into a comprehensive system dependency graph.
-  -i, --input=<input> Path to the project root directory.
-  -s, --source-analysis=<sourceAnalysis>
-        Analyze a single string of java source code instead
-        the project.
-  -o, --output=<output> Destination directory to save the output graphs. By
-        default, the SDG formatted as a JSON will be
-        printed to the console.
-  -b, --build-cmd=<build> Custom build command. Defaults to auto build.
-  --no-build Do not build your application. Use this option if
-        you have already built your application.
-  -a, --analysis-level=<analysisLevel>
-        Level of analysis to perform. Options: 1 (for just
-        symbol table) or 2 (for call graph). Default: 1
-  -v, --verbose Print logs to console.
-  -h, --help Show this help message and exit.
-  -V, --version Print version information and exit.
-  -t, --target-files For each file user wants to perform source analysis on top of existing analysis.json
-
+        [--emit=<emit>] [--app-name=<appName>]
+        [--neo4j-uri=<uri>] [--neo4j-user=<user>]
+        [--neo4j-password=<password>] [--neo4j-database=<db>]
+        [-t=<targetFiles>]...
+Analyze java application.
+  -i, --input=<input> Path to the project root directory.
+  -s, --source-analysis=<s> Analyze a single string of java source code instead
+        of the project.
+  -o, --output=<output> Destination directory to save the output graphs. By
+        default, the analysis JSON is printed to the console.
+  -b, --build-cmd=<build> Custom build command. Defaults to auto build.
+  --no-build Do not build your application (use if already built).
+  -a, --analysis-level=<n> Level of analysis: 1 (symbol table) or 2 (call graph).
+        Default: 1. Level 2 adds J_CALLS edges to the graph.
+  -t, --target-files=<f>... Restrict analysis to specific files (incremental).
+  --emit=<emit> Output target: json (analysis.json, default) |
+        neo4j (graph.cypher or live Bolt push) |
+        schema (the Neo4j schema.neo4j.json contract).
+  --app-name=<name> Logical application name for the graph :JApplication
+        anchor (default: input dir name).
+  --neo4j-uri=<uri> Push the graph to a live Neo4j over Bolt (incremental);
+        omit to write graph.cypher. Falls back to the
+        NEO4J_URI environment variable.
+  --neo4j-user=<user> Neo4j username (env: NEO4J_USERNAME, default: neo4j).
+  --neo4j-password=<pw> Neo4j password (env: NEO4J_PASSWORD, default: neo4j).
+  --neo4j-database=<db> Neo4j database name (env: NEO4J_DATABASE, default:
+        server default).
+  -v, --verbose Print logs to console.
+  -h, --help Show this help message and exit.
+  -V, --version Print version information and exit.
 ```


@@ -157,6 +187,66 @@ There is a sample application in `src/test/resources/sample_apps/daytrader8/bina

 This will produce print the SDG on the console. Explore other flags to save the output to a JSON.

+## 4. Neo4j graph output
+
+`codeanalyzer` can project the analysis IR into a [Neo4j](https://neo4j.com/) property graph instead
+of `analysis.json`. The graph is a **lossless** projection of the IR: compilation units, types,
+callables, fields, parameters, call sites, variables, enum constants, record components,
+initialization blocks, CRUD operations/queries, comments, annotations and packages are all
+first-class nodes and relationships, and (at `-a 2`) it adds `J_CALLS` edges from the call graph.
+Every field of the Lombok entity model is represented (scalars as node properties — maps such as a
+field's per-variable initializers are kept as a `*_json` property since Neo4j has no map type;
+comments are `:JComment` nodes in addition to the convenience `docstring` property).
+
+The full contract (node labels, their keys and typed properties, relationship types and endpoints,
+plus the constraint/index DDL) lives in [`schema.neo4j.json`](./schema.neo4j.json) and is visualized
+in [`neo4j-schema.drawio`](./neo4j-schema.drawio). All node labels are `J`-prefixed and relationship
+types `J_`-prefixed (e.g. `:JType`, `:JCallable`, `J_CALLS`) so a Java graph can share a Neo4j
+database with the Python (`Py*`/`PY_*`) and TypeScript (`TS*`/`TS_*`) backends without colliding.
+`SCHEMA_VERSION` is stamped onto the `:JApplication` node of every emitted graph.
+
+### 4.1. Cypher snapshot (no database required)
+
+```sh
+codeanalyzer -i /path/to/project -a 2 --emit neo4j -o ./out
+# → writes ./out/graph.cypher (a self-contained, re-runnable script)
+cypher-shell -u neo4j -p <password> < ./out/graph.cypher
+```
+
+The snapshot is **not** incremental: it constraints, scopes-wipes this application's prior subgraph,
+then `UNWIND … MERGE`-loads the full truth.
+
+### 4.2. Live incremental push over Bolt
+
+```sh
+codeanalyzer -i /path/to/project -a 2 --emit neo4j \
+  --neo4j-uri bolt://localhost:7687 --neo4j-user neo4j --neo4j-password <password>
+```
+
+The Bolt writer reads the database's current state and updates **only what changed**: it diffs each
+compilation unit's `content_hash`, replaces just the changed units' subgraphs (idempotent
+`MERGE` upserts), and — on a full run — prunes units whose source file vanished. Combine with
+`--target-files` for a targeted, partial re-push (orphan pruning is then skipped).
+
+### 4.3. Schema contract
+
+```sh
+codeanalyzer --emit schema -o ./out # → ./out/schema.neo4j.json (no project analysis needed)
+codeanalyzer --emit schema # → prints the contract to stdout
+```
+
+### 4.4. Verifying the writers
+
+A no-container conformance test (`Neo4jSchemaConformanceTest`) asserts the projector never emits a
+label/relationship/property the catalog doesn't declare, and that `schema.neo4j.json` is current. A
+Testcontainers-backed integration test (`Neo4jBoltWriterTest`) spins up a real Neo4j and exercises
+the Bolt writer (full push, idempotent re-push, orphan pruning). The container suite is **opt-in**
+(it needs Docker/Podman) and runs only when `RUN_CONTAINER_TESTS` is set:
+
+```sh
+RUN_CONTAINER_TESTS=1 ./gradlew test
+```
+
 ## FAQ

 1. After making a few code changes, my native binary gives random exceptions. But, my code works perfectly with `java -jar`.
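
Reviewer note: the new help text in this README diff documents a fallback chain for the Neo4j connection options, for example `--neo4j-user` (env: `NEO4J_USERNAME`, default: `neo4j`). The sketch below illustrates that kind of chain in plain shell, assuming the precedence is flag value, then environment value, then default. `resolve_setting` and the sample values `alice` and `bob` are hypothetical names introduced for illustration; they are not part of `codeanalyzer`, and its actual option handling may differ.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only: returns the first non-empty value among
# an explicit flag value, an environment value, and a built-in default.
resolve_setting() {
    flag_value=$1
    env_value=$2
    default_value=$3
    if [ -n "$flag_value" ]; then
        printf '%s\n' "$flag_value"
    elif [ -n "$env_value" ]; then
        printf '%s\n' "$env_value"
    else
        printf '%s\n' "$default_value"
    fi
}

# Literal sample values are used instead of the real environment so the result is deterministic.
user_from_flag=$(resolve_setting "alice" "bob" "neo4j")   # flag given: the flag wins
user_from_env=$(resolve_setting "" "bob" "neo4j")         # no flag: the environment value is used
user_default=$(resolve_setting "" "" "neo4j")             # neither: the default applies

echo "$user_from_flag $user_from_env $user_default"       # prints: alice bob neo4j
```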

build.gradle

Lines changed: 8 additions & 0 deletions
@@ -124,9 +124,17 @@ dependencies {
     implementation('com.github.javaparser:javaparser-symbol-solver-core:3.26.3')
     implementation('com.github.javaparser:javaparser-core:3.26.3')

+    // Neo4j Bolt driver for `--emit neo4j --neo4j-uri ...` (live incremental graph push).
+    // Bundled into the fat jar so `java -jar` supports live push out of the box. It is reached ONLY
+    // reflectively, via the BoltSink seam (see Neo4jEmitter#loadBoltSink), so the GraalVM native image
+    // prunes BoltWriter and never compiles the driver + Netty into the binary — the native build stays
+    // small and Netty-metadata-free, and `--neo4j-uri` there falls back to a graph.cypher snapshot.
+    implementation('org.neo4j.driver:neo4j-java-driver:4.4.12')
+
     // TestContainers
     testImplementation 'org.testcontainers:testcontainers:1.20.6'
     testImplementation 'org.testcontainers:junit-jupiter:1.20.6'
+    testImplementation 'org.testcontainers:neo4j:1.20.6'

     // JUnit 5
     testImplementation 'org.junit.jupiter:junit-jupiter-api:5.10.1'

gradle.properties

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-version=2.3.8
+version=2.4.0
