Commit 2bda2d1

feat: add PyPI packaging that ships the native binary as a wheel (codajv)
Adds a hatchling-based Python distribution under pypi/ that wraps the prebuilt, JVM-free GraalVM native binary so codeanalyzer is installable with a plain `pip install codeanalyzer-java` (console command `codajv`), no JDK/GraalVM or build step required.

- pyproject.toml: dist name codeanalyzer-java; version read from ../gradle.properties via a custom metadata hook so the wheel stays in lockstep with the binary.
- hatch_build.py: build hook force-includes the native binary plus the JDK jmods the binary needs at runtime (WALA primordial scope + JmodTypeSolver), marks the wheel non-purelib, and stamps a concrete py3-none-<platform> tag so pip resolves the right artifact per OS/arch instead of a universal wheel.
- codeanalyzer_java/__main__.py: locates the bundled binary, points CODEANALYZER_JMODS_DIR at the bundled jmods, and execs the binary (subprocess on Windows); falls back to the gradle output + JAVA_HOME in a source checkout.
- Bundle 79/83 jmods to fit under PyPI's 100 MB limit, dropping only jdk.localedata/compiler/internal.vm.compiler/hotspot.agent (locale data + javac/Graal/SA internals that static analysis never needs). CODEANALYZER_BUNDLE_ALL_JMODS=1 bundles all 83.

Verified: installed wheel runs JVM-free and is byte-identical to `java -jar` (full jmods) across call-graph-test, record-class-test, init-blocks-test, plantsbywebsphere (CRUD 38=38), and daytrader8.
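
The jmod selection described in the commit message can be sketched in isolation. This is a minimal illustration rather than the hook itself: the four excluded names are taken from the commit, while `select_jmods`, its `bundle_all` parameter, and the sample file list are hypothetical stand-ins for the build hook's helper and a real JDK `jmods/` directory.

```python
from pathlib import Path

# Module names come from the commit message; everything else here is illustrative.
EXCLUDED_JMODS = frozenset(
    {
        "jdk.localedata.jmod",
        "jdk.compiler.jmod",
        "jdk.internal.vm.compiler.jmod",
        "jdk.hotspot.agent.jmod",
    }
)


def select_jmods(jmod_files, bundle_all=False):
    """Return the jmods to bundle, skipping the excluded ones unless bundle_all is set."""
    if bundle_all:
        return list(jmod_files)
    return [jmod for jmod in jmod_files if jmod.name not in EXCLUDED_JMODS]


# Hypothetical directory listing, not a real JDK.
sample = [
    Path("jmods") / name
    for name in ("java.base.jmod", "jdk.compiler.jmod", "java.sql.jmod", "jdk.localedata.jmod")
]
kept = select_jmods(sample)
print([jmod.name for jmod in kept])
```

Filtering by file name keeps the exclusion list independent of where the JDK is installed, so the same set works for `CODEANALYZER_JMODS_DIR` and `JAVA_HOME` sources alike.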
1 parent fee2161 commit 2bda2d1

6 files changed

Lines changed: 392 additions & 0 deletions

pypi/.gitignore

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+# Build outputs
+/dist/
+/build/
+*.egg-info/
+
+# Native binary + jmods are injected at build time, never committed.
+codeanalyzer_java/_vendor/
+
+# Python caches
+__pycache__/
+*.py[cod]

pypi/README.md

Lines changed: 66 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,66 @@
1+
![logo](https://raw.githubusercontent.com/codellm-devkit/codeanalyzer-java/main/docs/assets/logo.png)
2+
3+
Native WALA implementation of source code analysis tool for Enterprise Java Applications.
4+
5+
## 1. Installing `codeanalyzer`
6+
7+
`codeanalyzer` ships as a self-contained, JVM-free native binary. No JDK, no
8+
GraalVM, and no build step are required — just install from PyPI:
9+
10+
```bash
11+
pip install codeanalyzer-java
12+
```
13+
14+
This installs the `codajv` command, which runs the bundled native binary
15+
(`pip install` automatically selects the wheel matching your OS/architecture).
16+
17+
## 2. Using `codeanalyzer`
18+
19+
```help
20+
Usage: codajv [-hvV] [--no-build] [-a=<analysisLevel>] [-b=<build>]
21+
[-i=<input>] [-o=<output>] [-s=<sourceAnalysis>]
22+
Convert java binary into a comprehensive system dependency graph.
23+
-i, --input=<input> Path to the project root directory.
24+
-s, --source-analysis=<sourceAnalysis>
25+
Analyze a single string of java source code instead
26+
the project.
27+
-o, --output=<output> Destination directory to save the output graphs. By
28+
default, the SDG formatted as a JSON will be
29+
printed to the console.
30+
-b, --build-cmd=<build> Custom build command. Defaults to auto build.
31+
--no-build Do not build your application. Use this option if
32+
you have already built your application.
33+
-a, --analysis-level=<analysisLevel>
34+
Level of analysis to perform. Options: 1 (for just
35+
symbol table) or 2 (for call graph). Default: 1
36+
-v, --verbose Print logs to console.
37+
-h, --help Show this help message and exit.
38+
-V, --version Print version information and exit.
39+
-t, --target-files For each file user wants to perform source analysis on top of existing analysis.json
40+
41+
```
42+
43+
For example, to analyze a project and print the system dependency graph to the
44+
console:
45+
46+
```sh
47+
codajv -i /path/to/java/project
48+
```
49+
50+
Pass `-o <dir>` to save the output as JSON. Explore the other flags above to
51+
control the analysis level and build behavior.
52+
53+
## LICENSE
54+
55+
```LICENSE
56+
Copyright IBM Corporation 2023, 2024
57+
58+
Licensed under the Apache Public License 2.0, Version 2.0 (the "License");
59+
you may not use this file except in compliance with the License.
60+
61+
Unless required by applicable law or agreed to in writing, software
62+
distributed under the License is distributed on an "AS IS" BASIS,
63+
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
64+
See the License for the specific language governing permissions and
65+
limitations under the License.
66+
```
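
For callers that want to drive `codajv` from Python instead of a shell, the usage described in the README might be wrapped roughly as follows. This is a sketch under stated assumptions: `build_argv` and `analyze` are hypothetical helper names, only flags listed in the help text are used, and it assumes that stdout carries nothing but the JSON document when `-o` and `-v` are omitted, which the README implies but does not guarantee.

```python
import json
import shutil
import subprocess


def build_argv(project_dir, analysis_level=1, no_build=False):
    """Assemble a codajv command line using only flags shown in the help text."""
    argv = ["codajv", f"--input={project_dir}", f"--analysis-level={analysis_level}"]
    if no_build:
        argv.append("--no-build")
    return argv


def analyze(project_dir, **options):
    """Run codajv and parse its console output as JSON (assumed to be stdout-only JSON)."""
    if shutil.which("codajv") is None:
        raise FileNotFoundError("codajv not found on PATH; install codeanalyzer-java first")
    result = subprocess.run(
        build_argv(project_dir, **options),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(result.stdout)


print(build_argv("/path/to/java/project", analysis_level=2, no_build=True))
```

Keeping argument assembly separate from execution makes the command line easy to inspect or log before anything is run.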

pypi/codeanalyzer_java/__init__.py

Lines changed: 15 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,15 @@
1+
"""Python wrapper that ships and runs the codeanalyzer native binary."""
2+
3+
from __future__ import annotations
4+
5+
try:
6+
from importlib.metadata import PackageNotFoundError, version
7+
8+
try:
9+
__version__ = version("codeanalyzer-java")
10+
except PackageNotFoundError: # running from a source checkout
11+
__version__ = "0.0.0"
12+
except Exception: # pragma: no cover - importlib.metadata always present on 3.9+
13+
__version__ = "0.0.0"
14+
15+
__all__ = ["__version__"]

pypi/codeanalyzer_java/__main__.py

Lines changed: 86 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,86 @@
1+
"""Console entry point (``codajv``) for the bundled codeanalyzer native binary.
2+
3+
The wheel ships a prebuilt, JVM-free GraalVM native image together with the JDK
4+
``.jmod`` files it needs at runtime: both WALA's primordial scope (analysis
5+
level 2) and the JavaParser bytecode symbol solver read the jmods straight off
6+
disk, so they cannot be baked into the image. This module locates those bundled
7+
assets, points ``CODEANALYZER_JMODS_DIR`` at the bundled jmods, and hands off to
8+
the native binary.
9+
"""
10+
11+
from __future__ import annotations
12+
13+
import os
14+
import sys
15+
from pathlib import Path
16+
17+
_PKG_DIR = Path(__file__).resolve().parent
18+
_VENDOR = _PKG_DIR / "_vendor"
19+
# pypi/codeanalyzer_java/ -> pypi/ -> repo root (dev/source-checkout fallback).
20+
_REPO_ROOT = _PKG_DIR.parent.parent
21+
22+
23+
def _find_binary() -> Path | None:
24+
candidates: list[Path] = []
25+
override = os.environ.get("CODEANALYZER_NATIVE_BINARY")
26+
if override:
27+
candidates.append(Path(override))
28+
candidates += [
29+
_VENDOR / "bin" / "codeanalyzer",
30+
_VENDOR / "bin" / "codeanalyzer.exe",
31+
_REPO_ROOT / "build" / "native" / "nativeCompile" / "codeanalyzer",
32+
_REPO_ROOT / "build" / "native" / "nativeCompile" / "codeanalyzer.exe",
33+
]
34+
return next((c for c in candidates if c.is_file()), None)
35+
36+
37+
def _find_jmods() -> Path | None:
38+
bundled = _VENDOR / "jmods"
39+
if bundled.is_dir() and any(bundled.glob("*.jmod")):
40+
return bundled
41+
override = os.environ.get("CODEANALYZER_JMODS_DIR")
42+
if override and Path(override).is_dir():
43+
return Path(override)
44+
java_home = os.environ.get("JAVA_HOME")
45+
if java_home:
46+
jmods = Path(java_home) / "jmods"
47+
if jmods.is_dir():
48+
return jmods
49+
return None
50+
51+
52+
def main() -> int:
53+
binary = _find_binary()
54+
if binary is None:
55+
sys.stderr.write(
56+
"codajv: could not find a codeanalyzer native binary.\n"
57+
"This usually means the installed wheel does not match your "
58+
"platform/architecture, or you are running from a source checkout "
59+
"without a built binary (run `./gradlew nativeCompile`).\n"
60+
)
61+
return 1
62+
63+
env = dict(os.environ)
64+
jmods = _find_jmods()
65+
if jmods is not None:
66+
env["CODEANALYZER_JMODS_DIR"] = str(jmods)
67+
68+
# Wheel installation does not reliably preserve the executable bit.
69+
try:
70+
os.chmod(binary, binary.stat().st_mode | 0o111)
71+
except OSError:
72+
pass
73+
74+
argv = [str(binary), *sys.argv[1:]]
75+
if os.name == "posix":
76+
# Replace this process so signals and the exit code pass through cleanly.
77+
os.execve(str(binary), argv, env)
78+
return 127 # unreachable when execve succeeds
79+
80+
import subprocess
81+
82+
return subprocess.run(argv, env=env).returncode
83+
84+
85+
if __name__ == "__main__":
86+
raise SystemExit(main())
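
The lookup in `_find_binary` is a "first existing candidate wins" search, so an explicit override only takes precedence because it is listed first. A small self-contained sketch of that ordering, using a temporary directory in place of a real install; `first_existing` and the directory names are illustrative, not part of the package:

```python
import tempfile
from pathlib import Path


def first_existing(candidates):
    """Return the first candidate that is a regular file, or None if none exist."""
    return next((c for c in candidates if c.is_file()), None)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    vendored = root / "_vendor" / "bin" / "codeanalyzer"
    override = root / "custom" / "codeanalyzer"
    for path in (vendored, override):
        path.parent.mkdir(parents=True)
        path.write_bytes(b"")  # empty placeholder standing in for a binary

    # An override placed first in the list shadows the vendored copy.
    picked_with_override = first_existing([override, vendored])
    # A missing candidate is skipped and the next existing one is used.
    picked_without_override = first_existing([root / "missing", vendored])
    # No candidate exists at all.
    picked_nothing = first_existing([root / "missing"])

print(picked_with_override.parent.name, picked_without_override.parent.name, picked_nothing)
```

Because a missing path is skipped rather than rejected, an override pointing at a nonexistent file falls through to the bundled binary at runtime, whereas the build hook's `_resolve_binary` raises in the same situation.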

pypi/hatch_build.py

Lines changed: 158 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,158 @@
1+
"""Local hatchling plugins for the codeanalyzer-java wheel.
2+
3+
Two concerns live here:
4+
5+
1. **Version lockstep.** The wheel version is read from the Java project's
6+
``gradle.properties`` so the Python package and the native binary it ships
7+
can never drift apart.
8+
9+
2. **Impure, platform-tagged wheels.** The wheel carries a prebuilt native
10+
binary plus the JDK ``.jmod`` files it needs at runtime. The build hook
11+
force-includes those assets, marks the wheel non-purelib, and stamps a
12+
concrete ``py3-none-<platform>`` tag so pip resolves the correct artifact
13+
per OS/arch instead of a universal ``py3-none-any`` wheel.
14+
"""
15+
16+
from __future__ import annotations
17+
18+
import os
19+
import sysconfig
20+
from pathlib import Path
21+
22+
from hatchling.builders.hooks.plugin.interface import BuildHookInterface
23+
from hatchling.metadata.plugin.interface import MetadataHookInterface
24+
25+
_BINARY_NAMES = ("codeanalyzer", "codeanalyzer.exe")
26+
27+
# Bundling every JDK jmod (~104 MB) pushes the wheel over PyPI's default 100 MB
28+
# per-file limit, and the native binary does not compress further (~23.5 MB),
29+
# leaving room for ~80 MB of jmods. We therefore drop only the largest modules
30+
# that static type resolution never needs, keeping 79 of 83 (~90 MB wheel):
31+
# - jdk.localedata locale resource *data*, not API types
32+
# - jdk.compiler com.sun.tools.javac.* internals (annotation-
33+
# processor sources are the only mild compromise)
34+
# - jdk.internal.vm.compiler Graal compiler internals, never referenced
35+
# - jdk.hotspot.agent Serviceability Agent internals, never referenced
36+
# Set CODEANALYZER_BUNDLE_ALL_JMODS=1 to bundle all 83 (needs a PyPI size bump).
37+
_EXCLUDED_JMODS = frozenset(
38+
{
39+
"jdk.localedata.jmod",
40+
"jdk.compiler.jmod",
41+
"jdk.internal.vm.compiler.jmod",
42+
"jdk.hotspot.agent.jmod",
43+
}
44+
)
45+
46+
47+
def _bundle_all_jmods() -> bool:
48+
return os.environ.get("CODEANALYZER_BUNDLE_ALL_JMODS", "").lower() in {"1", "true", "yes"}
49+
50+
51+
def _select_jmods(jmod_files: list[Path]) -> list[Path]:
52+
if _bundle_all_jmods():
53+
return jmod_files
54+
return [jmod for jmod in jmod_files if jmod.name not in _EXCLUDED_JMODS]
55+
56+
57+
def read_gradle_version(repo_root: Path) -> str:
58+
"""Return the ``version=`` value from ``<repo_root>/gradle.properties``."""
59+
gradle_properties = repo_root / "gradle.properties"
60+
for line in gradle_properties.read_text(encoding="utf-8").splitlines():
61+
line = line.strip()
62+
if line.startswith("version="):
63+
return line.split("=", 1)[1].strip()
64+
raise RuntimeError(f"no 'version=' entry found in {gradle_properties}")
65+
66+
67+
def _wheel_platform_tag() -> str:
68+
"""Concrete wheel tag for the current platform, e.g. ``py3-none-linux_x86_64``.
69+
70+
Linux wheels are emitted with the plain ``linux_*`` platform; CI runs
71+
``auditwheel repair`` to relabel them to a manylinux/musllinux policy.
72+
"""
73+
platform = sysconfig.get_platform().replace("-", "_").replace(".", "_")
74+
return f"py3-none-{platform}"
75+
76+
77+
def _resolve_binary(repo_root: Path) -> Path:
78+
override = os.environ.get("CODEANALYZER_NATIVE_BINARY")
79+
if override:
80+
candidate = Path(override)
81+
if candidate.is_file():
82+
return candidate
83+
raise RuntimeError(
84+
f"CODEANALYZER_NATIVE_BINARY is set to '{override}' but no file exists there."
85+
)
86+
native_dir = repo_root / "build" / "native" / "nativeCompile"
87+
for name in _BINARY_NAMES:
88+
candidate = native_dir / name
89+
if candidate.is_file():
90+
return candidate
91+
raise RuntimeError(
92+
"no prebuilt codeanalyzer native binary found for this platform/arch.\n"
93+
f"Looked for {_BINARY_NAMES} under {native_dir}.\n"
94+
"Build it first with `./gradlew nativeCompile`, or point "
95+
"CODEANALYZER_NATIVE_BINARY at an existing binary. "
96+
"codeanalyzer-java ships only prebuilt wheels; there is no from-source "
97+
"build path for unsupported platforms."
98+
)
99+
100+
101+
def _resolve_jmods() -> Path:
102+
override = os.environ.get("CODEANALYZER_JMODS_DIR")
103+
if override:
104+
candidate = Path(override)
105+
if candidate.is_dir():
106+
return candidate
107+
raise RuntimeError(
108+
f"CODEANALYZER_JMODS_DIR is set to '{override}' but it is not a directory."
109+
)
110+
java_home = os.environ.get("JAVA_HOME")
111+
if java_home:
112+
candidate = Path(java_home) / "jmods"
113+
if candidate.is_dir():
114+
return candidate
115+
raise RuntimeError(
116+
"could not locate JDK .jmod files to bundle. Set CODEANALYZER_JMODS_DIR "
117+
"to a directory of .jmod files, or JAVA_HOME to a JDK that has a jmods/ "
118+
"directory (a JDK 9+ image, not a JRE)."
119+
)
120+
121+
122+
class CustomMetadataHook(MetadataHookInterface):
123+
"""Inject the version read from gradle.properties."""
124+
125+
def update(self, metadata: dict) -> None:
126+
metadata["version"] = read_gradle_version(Path(self.root).parent)
127+
128+
129+
class CustomBuildHook(BuildHookInterface):
130+
"""Bundle the native binary + jmods and force an impure platform wheel."""
131+
132+
def initialize(self, version: str, build_data: dict) -> None:
133+
if self.target_name != "wheel":
134+
return
135+
136+
repo_root = Path(self.root).parent
137+
binary = _resolve_binary(repo_root)
138+
jmods_dir = _resolve_jmods()
139+
jmod_files = sorted(jmods_dir.glob("*.jmod"))
140+
if not jmod_files:
141+
raise RuntimeError(f"no .jmod files found in {jmods_dir}")
142+
selected = _select_jmods(jmod_files)
143+
if not selected:
144+
raise RuntimeError(f"jmod selection is empty (from {jmods_dir})")
145+
146+
force_include = build_data["force_include"]
147+
force_include[str(binary)] = f"codeanalyzer_java/_vendor/bin/{binary.name}"
148+
for jmod in selected:
149+
force_include[str(jmod)] = f"codeanalyzer_java/_vendor/jmods/{jmod.name}"
150+
151+
build_data["pure_python"] = False
152+
build_data["infer_tag"] = False
153+
build_data["tag"] = _wheel_platform_tag()
154+
155+
self.app.display_info(
156+
f"codeanalyzer-java: bundling {binary.name} + {len(selected)}/"
157+
f"{len(jmod_files)} jmods as {build_data['tag']}"
158+
)
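
The tag normalization in `_wheel_platform_tag` is easiest to check against concrete inputs. The sketch below repeats the same two `replace` calls but accepts the platform string as a parameter so it can be exercised off-host; `wheel_tag` is a hypothetical name, and the sample strings are typical `sysconfig.get_platform()` values chosen for illustration rather than an exhaustive list.

```python
import sysconfig


def wheel_tag(platform=None):
    """Build a py3-none-<platform> tag; defaults to the running interpreter's platform."""
    platform = platform or sysconfig.get_platform()
    # Wheel platform tags use underscores where get_platform() uses '-' and '.'.
    return "py3-none-" + platform.replace("-", "_").replace(".", "_")


print(wheel_tag("linux-x86_64"))       # py3-none-linux_x86_64
print(wheel_tag("macosx-11.0-arm64"))  # py3-none-macosx_11_0_arm64
print(wheel_tag("win-amd64"))          # py3-none-win_amd64
print(wheel_tag())                     # depends on the host running this
```

Since the tag is derived from the interpreter running the build, each OS/architecture needs its own build host (or CI runner) to produce the matching wheel.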

pypi/pyproject.toml

Lines changed: 56 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,56 @@
1+
[build-system]
2+
requires = ["hatchling"]
3+
build-backend = "hatchling.build"
4+
5+
[project]
6+
name = "codeanalyzer-java"
7+
dynamic = ["version"]
8+
description = "Static analysis for Java, shipped as a self-contained native binary (no JVM required)."
9+
readme = "README.md"
10+
requires-python = ">=3.9"
11+
license = "Apache-2.0"
12+
authors = [{ name = "Rahul Krishna", email = "i.m.ralk@gmail.com" }]
13+
keywords = ["java", "static-analysis", "wala", "javaparser", "call-graph", "codeanalyzer"]
14+
classifiers = [
15+
"Development Status :: 4 - Beta",
16+
"Intended Audience :: Developers",
17+
"Programming Language :: Java",
18+
"Programming Language :: Python :: 3",
19+
"Topic :: Software Development :: Libraries",
20+
"Topic :: Software Development :: Quality Assurance",
21+
]
22+
23+
[project.urls]
24+
Homepage = "https://github.com/codellm-devkit/codeanalyzer-java"
25+
Issues = "https://github.com/codellm-devkit/codeanalyzer-java/issues"
26+
Source = "https://github.com/codellm-devkit/codeanalyzer-java"
27+
28+
[project.scripts]
29+
codajv = "codeanalyzer_java.__main__:main"
30+
31+
# Version is read from ../gradle.properties by CustomMetadataHook so the wheel
32+
# stays in lockstep with the native binary.
33+
[tool.hatch.metadata.hooks.custom]
34+
path = "hatch_build.py"
35+
36+
# Bundles the native binary + jmods and forces an impure, platform-tagged wheel.
37+
[tool.hatch.build.targets.wheel.hooks.custom]
38+
path = "hatch_build.py"
39+
40+
[tool.hatch.build.targets.wheel]
41+
packages = ["codeanalyzer_java"]
42+
# The native binary and jmods are injected at build time via force_include in
43+
# hatch_build.py; nothing under _vendor/ is tracked in source control.
44+
exclude = ["codeanalyzer_java/_vendor"]
45+
46+
[tool.hatch.build.targets.sdist]
47+
# The sdist carries only the wrapper sources. Building a wheel from it requires
48+
# a prebuilt native binary (CODEANALYZER_NATIVE_BINARY) and jmods; without them
49+
# the build hook fails with a clear message rather than silently shipping a
50+
# binary-less, non-functional wheel.
51+
include = [
52+
"codeanalyzer_java",
53+
"hatch_build.py",
54+
"pyproject.toml",
55+
"README.md",
56+
]
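
The version lockstep configured above depends on a plain line scan of `gradle.properties`. A minimal sketch of that scan against a throwaway file follows; `read_version` mirrors the logic of `read_gradle_version` but takes a file path, and the sample properties content is invented for the example.

```python
import tempfile
from pathlib import Path


def read_version(properties_path):
    """Return the value of the first line that starts with 'version='."""
    for line in Path(properties_path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line.startswith("version="):
            return line.split("=", 1)[1].strip()
    raise RuntimeError(f"no 'version=' entry found in {properties_path}")


with tempfile.TemporaryDirectory() as tmp:
    props = Path(tmp) / "gradle.properties"
    # Invented sample content; a real gradle.properties will differ.
    props.write_text(
        "# sample properties\norg.gradle.jvmargs=-Xmx2g\nversion=1.2.3\n",
        encoding="utf-8",
    )
    parsed = read_version(props)

print(parsed)
```

One caveat of the prefix match: a line written as `version = 1.2.3`, with spaces around the equals sign, is valid properties syntax but would not be picked up by this scan, so the file has to keep the compact `version=` form.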
