All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- README Cited By section highlighting papers that cite CLDK (SAINT, ASTER, RECON, PRAXIS, Phaedrus, and others), compiled from Semantic Scholar / OpenAlex citation data.
- Per-language factory methods on `CLDK`: `CLDK.java()`, `CLDK.python()`, `CLDK.typescript()`, and `CLDK.c()`, each with an honest signature exposing only the options that apply to that language. These are the preferred entry points, replacing the stringly-typed `CLDK(language).analysis(...)`.
- Typed backend-configuration objects in `cldk.analysis.commons.backend_config`. The backend is now selected by the type of the `backend=` config passed to a factory: `CodeAnalyzerConfig` (default; in-process analyzer) / `PyCodeAnalyzerConfig` (adds `use_codeql`, `use_ray`), or `Neo4jConnectionConfig` (read-only Neo4j). `Neo4jConnectionConfig` is hoisted here and re-exported from `cldk.analysis.{python,typescript}.neo4j` for backward compatibility.
- Unified, language-keyed cache directory. All backends now share a single `cache_dir` (default `<project>/.codeanalyzer`) and write their artifacts under a per-language subdirectory (`<cache_dir>/java`, `<cache_dir>/python`, `<cache_dir>/typescript`), so a polyglot project analyzed under more than one language no longer overwrites a shared `analysis.json`.
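As a rough illustration of selecting a backend by the type of the config object, the sketch below uses hypothetical stand-in classes. They only borrow the names `CodeAnalyzerConfig` and `Neo4jConnectionConfig`; their fields, and the `select_backend` helper, are assumptions made for this example and are not the SDK's actual definitions.

```python
from dataclasses import dataclass
from typing import Optional, Union


# Hypothetical stand-ins: the real classes live in
# cldk.analysis.commons.backend_config and their fields are not shown here.
@dataclass(frozen=True)
class CodeAnalyzerConfig:
    eager: bool = False  # assumed field, for illustration only


@dataclass(frozen=True)
class Neo4jConnectionConfig:
    uri: str = "bolt://localhost:7687"  # assumed field, for illustration only


BackendConfig = Union[CodeAnalyzerConfig, Neo4jConnectionConfig]


def select_backend(backend: Optional[BackendConfig] = None) -> str:
    """Return a label for the backend implied by the config's type."""
    if backend is None:
        # No config given: fall back to the default in-process analyzer.
        backend = CodeAnalyzerConfig()
    if isinstance(backend, Neo4jConnectionConfig):
        return "neo4j (read-only)"
    if isinstance(backend, CodeAnalyzerConfig):
        return "in-process analyzer"
    raise TypeError(f"unsupported backend config: {type(backend).__name__}")
```

Dispatching on the config type, rather than on a string flag, lets each config class carry only the options that make sense for its backend.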
- Caching is on by default for Java/TypeScript. The in-process backend now caches `analysis.json` to disk (under the language-keyed `cache_dir`) instead of streaming over a stdout pipe.
- `CLDK(language).analysis(...)` is deprecated and retained as a thin compatibility shim that forwards to the new factory methods (emits a `DeprecationWarning`).
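The compatibility-shim pattern described above can be sketched as follows. `Toolkit` is a hypothetical class standing in for `CLDK`; its method signatures and return values are assumptions for illustration, not the SDK's real interface.

```python
import warnings


class Toolkit:
    """Hypothetical facade used only to illustrate a deprecation shim."""

    def __init__(self, language: str):
        self.language = language

    @classmethod
    def java(cls, project_path: str):
        # Stand-in for a per-language factory method.
        return ("java", project_path)

    @classmethod
    def python(cls, project_path: str):
        return ("python", project_path)

    def analysis(self, project_path: str):
        """Deprecated entry point: warn, then forward to the matching factory."""
        warnings.warn(
            "Toolkit(language).analysis(...) is deprecated; "
            "call the per-language factory method instead",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
        factories = {"java": Toolkit.java, "python": Toolkit.python}
        try:
            factory = factories[self.language]
        except KeyError:
            raise ValueError(f"unsupported language: {self.language!r}") from None
        return factory(project_path)
```

Python hides `DeprecationWarning` by default outside of `__main__` and test runners, so callers may need `-W default` or `warnings.simplefilter("default")` to see it.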
- Java `source_code` (single-file) input; pass `project_path` instead.
- `analysis_backend_path` from the public interface. The backend binary ships with the packaged `codeanalyzer-*` dependency; for TypeScript, `$CODEANALYZER_TS_BIN` remains as the only out-of-band override.
- `analysis_json_path` from the public interface; it is folded into the unified `cache_dir`.
- The language-keyed cache relocates `analysis.json` from `<cache_dir>/analysis.json` to `<cache_dir>/<language>/analysis.json`; existing caches are not found at the new path, so the first run after upgrading recomputes the analysis.
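To make the path change concrete, here is a small sketch of the old and new cache locations. Both helper functions are hypothetical and written only for this note; the default of `<project>/.codeanalyzer` is taken from the entries above.

```python
from pathlib import Path
from typing import Optional


def _cache_root(project: Path, cache_dir: Optional[Path]) -> Path:
    # Default cache root, as described above: <project>/.codeanalyzer
    return cache_dir if cache_dir is not None else project / ".codeanalyzer"


def analysis_json_path(project: Path, language: str,
                       cache_dir: Optional[Path] = None) -> Path:
    """New layout: <cache_dir>/<language>/analysis.json."""
    return _cache_root(project, cache_dir) / language / "analysis.json"


def legacy_analysis_json_path(project: Path,
                              cache_dir: Optional[Path] = None) -> Path:
    """Old layout: <cache_dir>/analysis.json, shared by every language."""
    return _cache_root(project, cache_dir) / "analysis.json"
```

Because the two paths differ, a cache written under the old layout is simply never looked up under the new one, which is why the first run after upgrading recomputes.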
- Read-only Neo4j-backed TypeScript analysis backend (`cldk.analysis.typescript.neo4j.TSNeo4jBackend`). It is a drop-in alternative to the in-memory `TSCodeanalyzer`: it answers the same `get_*` query surface (call graph, callers/callees, class hierarchy, call sites, decorators, symbol lookups, ...) by running Cypher over a live Neo4j graph instead of walking the pydantic / NetworkX structures. The graph is the one `codeanalyzer-typescript` emits with `--emit neo4j` (schema `schema.neo4j.json`); it is always populated out of band, and the SDK only polls it (read-only: it never writes and needs no binary or project sources). `TypeScriptAnalysis` / `CLDK.analysis(language="typescript")` now accept an optional `neo4j_config` (`Neo4jConnectionConfig`) to select the Neo4j backend; without it the in-memory backend is used, unchanged.
- Read-only Neo4j-backed Python analysis backend (`cldk.analysis.python.neo4j.PyNeo4jBackend`), the analog of the TypeScript one. It answers all 21 `PythonAnalysisBackend` queries via Cypher over the graph `codeanalyzer-python` (>= 0.2.0) emits with `--emit neo4j`. Verified against a real 57-module project: every node/edge present in the graph reconstructs identically to the in-memory `PyCodeanalyzer` (3169/3200 checks; zero weight/provenance mismatches on shared call edges). Known gaps are not in the query layer: projection-lossy fields (comments → docstring, `PyVariableDeclaration.value`/columns, per-binding import detail), and an upstream emitter bug where calls to a bare module name that is also imported (e.g. `os`/`re`/`json`) are dropped from the emitted call graph. `PythonAnalysis` / `CLDK.analysis(language="python")` accept the same optional `neo4j_config`.
- Read-only Neo4j-backed Java analysis backend (`cldk.analysis.java.neo4j.JNeo4jBackend`), completing Neo4j parity across all three languages. It reconstructs the canonical `JApplication` from the graph `codeanalyzer-java` (>= 2.4.0) emits with `--emit neo4j` and answers all 36 `JavaAnalysisBackend` queries with the in-memory backend's logic. Verified against the daytrader8 sample (145 classes): everything the graph actually contains reconstructs identically to `JCodeanalyzer` (97% of checks). Three projection gaps in the `codeanalyzer-java` 2.4.0 emitter (fields collapsing to one node, imports reduced to packages, a truncated call graph) are fixed in 2.4.1 (codeanalyzer-java#156/#157/#158, verified on daytrader: `J_CALLS` went 287 → 1702), the version the SDK release now bundles. `JavaAnalysis` / `CLDK.java(...)` accept a `Neo4jConnectionConfig` as the `backend=` config to select it.
- Bumped `codeanalyzer-python` to `0.2.0` (adds the Neo4j graph emitter); the bundled `codeanalyzer-java` jar is now `2.4.1` (adds the Neo4j graph emitter + the field/import/call-graph projection fixes). The Java analyzer jar is no longer a pip dependency; the SDK release workflow downloads the latest `codeanalyzer-java` jar into the bundled `jar/` directory.
- Optional `neo4j` extra (`pip install cldk[neo4j]`) for the Neo4j Python driver.
- Bundled JDK download for the Java backend. `ensure_jdk` resolved the Temurin JVM via the Adoptium `/assets/version/{release}` endpoint, which now returns 404 for pinned releases (e.g. `jdk-21.0.5+11`), so the first Java analysis on a clean machine failed before it started. It now resolves via the `/binary/version/...` endpoint (following the redirect to the GitHub asset) and reads the checksum from the asset's `.sha256.txt`.
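The checksum step can be illustrated with a minimal sketch. It assumes the `.sha256.txt` content follows the common `<hex digest>  <file name>` layout; the helper names are hypothetical and this is not the SDK's `ensure_jdk` implementation.

```python
import hashlib


def parse_sha256_file(text: str) -> str:
    """Extract the hex digest from '<digest>  <filename>' style content (assumed layout)."""
    parts = text.split()
    if not parts:
        raise ValueError("empty checksum file")
    digest = parts[0].lower()
    if len(digest) != 64 or any(c not in "0123456789abcdef" for c in digest):
        raise ValueError("not a SHA-256 hex digest")
    return digest


def verify_archive(data: bytes, sha256_txt: str) -> bool:
    """Return True when the archive bytes match the published digest."""
    return hashlib.sha256(data).hexdigest() == parse_sha256_file(sha256_txt)
```

A caller would download the archive and its checksum file, then refuse to unpack the archive when `verify_archive` returns `False`.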
- Doctest-style Examples across the public API surface of JavaAnalysis, PythonAnalysis, CAnalysis, and core CLDK helpers. Coverage includes Java CRUD operations and comment/docstring query APIs, plus concise inline examples for Python and C where applicable.
- Examples documenting expected NotImplementedError behavior for placeholder APIs (PythonAnalysis and CAnalysis) using doctest flags.
- Converted and standardized docstrings to strict Google style (Args, Returns, Raises, Examples) across edited modules.
- Standardized Examples to use the CLDK facade (e.g., `CLDK(language="java").analysis(...)`) instead of raw constructor calls.
- Normalized all doctest Example inputs to single-line strings to ensure reliable mkdocstrings rendering.
- Clarified `CLDK.analysis` return type with a precise union: `JavaAnalysis | PythonAnalysis | CAnalysis`.
- Updated codeanalyzer version to v2.3.6.
- Fixed README.md logo display on PyPI by updating image URLs to use raw GitHub URLs and maintaining theme-based auto-switching with proper fallback
- mkdocstrings rendering issues caused by multi-line doctest strings and formatting inconsistencies.
- Replaced confusing examples like `JavaAnalysis(None, None, ...)` with clear CLDK-based initialization patterns.
- Packaging: ensured the built wheel includes the `cldk` package by adding `packages = [{ include = "cldk" }]` to Poetry configuration.
- Fixed #141
- Multi-line doctest strings in Examples that broke mkdocstrings rendering; all examples are now single-line.
- Removed pandas dependency (#145)
- Added `argument_expr` field to JCallSite model for capturing actual parameter expressions in method calls
- Added Star History section to README.md for tracking project popularity
- Updated codeanalyzer jar to version 2.3.5 with support for call argument expressions and fully qualified parameter types
- Modified codeanalyzer.py to preserve fully qualified parameter types in method signatures instead of simplifying them
- Updated method signature format to use fully qualified type names (e.g., `java.lang.String` instead of `String`)
- Updated test fixtures with new analysis.json data reflecting the signature format changes
- Fixed method signature handling to maintain fully qualified parameter types for better type resolution
- Updated test cases to use fully qualified method signatures for improved accuracy
v1.0.5 - 2025-06-24
- Fixed issue #135
- Analysis level compatibility checking for analysis.json with passed analysis level
- Updated treesitter analysis to use global declarations of parser and language
v1.0.4 - 2025-06-11
- Added missing callable fields field validator
- Updated test fixture setup to use codeanalyzer jar from cldk/analysis/java/codeanalyzer/jar instead of test resources directory
- Updated analysis.json fixtures (daytrader8 and plantsbywebsphere)
- Removed dangling codeanalyzer jars from test resources
- Removed obsolete analysis.json fixture
v1.0.3 - 2025-06-01
- Added code start line attribute to JCallable (corresponding to added attribute in the java code analyzer model)
v1.0.2 - 2025-05-24
- Added test case and fixture for source analysis
- Added missing attributes in compilation unit model
- Fixed handling of `source_code` option in Java codeanalyzer
- Updated core.py to match python analysis signature
v1.0.1 - 2025-05-07
- Updated treesitter analysis to use global declarations of parser and language
v1.0.0 - 2025-04-29
- First stable release
- Updated contributing guidelines
- Updated README.md
- Updated codeanalyzer jar
- Updated java version in release automation
v0.5.1 - 2025-03-13
- Updated Java model to comply with codeanalyzer v2.3.1
- Updated codeanalyzer jar to the latest from codeanalyzer-java
- Updated get_all_docstrings to return dict
v0.5.0 - 2025-02-21
- Added release automation github actions
- Added Java 11 support in github actions
- Added release_config.json
- Added Comment parsing APIs at file, class, method, and docstring level
- Added support for parsing callable parameters and their location information
- Added Dev container instructions with Python, Java, C, and Rust support
- Added C/C++ analysis support
- Added CRUD operations support for Java JPA applications
- Consolidated analysis_level enums in `__init__.py`
- Updated codeanalyzer jar to the latest version
- Changed coverage minimum to 70%
- Updated documentation with mkdocs
- Updated badges and logos in README
- Added Discord community support
- Removed CodeQL dependency and refactored treesitter
- Removed ABCs from analysis
- Removed logic to find LLVM on Linux (the lookup now only applies on Darwin)
- Removed redundant is_entry_point fields from JCallable and JType
- Removed unused parameters and code cleanup
- Fixed various test cases and compatibility issues
- Fixed treesitter superclass identification issues
- Fixed entry point detection code
- Fixed recursive error issues
v0.4.0 - 2024-11-13
- Fixed issue 67 - symbol table is none
- Updated poetry build rules to include codeanalyzer-*.jar
- Added test case to verify jar file exists
v0.3.0 - 2024-11-12
- Support for reading slim JSON from codeanalyzer v1.1.0
- Added more test tools (pylint, flake8, black, pspec, coverage)
- Added test coverage reporting
- Updated README.md to include the arXiv paper
- Removed obsolete test cases for unsupported languages
v0.2.0 - 2024-10-11
- Added GitHub Action to publish manual releases
- Added PyPI badge to README.md
v0.1.4 - 2024-10-21
- Fixed codeanalyzer.jar not being a PosixPath
v0.1.3 - 2024-10-21
- Fixed calling the correct codeanalyzer jar on version 0.1.3
- Removed auto-download of codeanalyzer jar
v0.1.2 - 2024-10-17
- Fixed tree-sitter bug
- Defined self.captures explicitly
0.1.0-dev - 2024-10-07
- Initial development version
- Set version to über json support
- Support for slim JSONs from codeanalyzer
- IBM Copyright added to all source files
- Added code parsing support
- Added support for symbol table call graph
- Added notebook examples for code summarization and test generation
- Basic CLDK framework implementation
- Updated dependencies in pyproject.toml
- Added metadata for PyPI distribution
- Updated README with installation instructions
- Fixed caller method implementation
- Fixed incremental analysis support
- Fixed download jar issues