Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[Unreleased]

Added

README Cited By section highlighting papers that cite CLDK (SAINT, ASTER, RECON, PRAXIS, Phaedrus, and others), compiled from Semantic Scholar / OpenAlex citation data.

[v1.2.0] - 2026-06-22

Added

Per-language factory methods on CLDK — CLDK.java(), CLDK.python(), CLDK.typescript(), and CLDK.c() — each with an honest signature exposing only the options that apply to that language. These are the preferred entry points, replacing the stringly-typed CLDK(language).analysis(...).
Typed backend-configuration objects in cldk.analysis.commons.backend_config. The backend is now selected by the type of the backend= config passed to a factory: CodeAnalyzerConfig (default; in-process analyzer) / PyCodeAnalyzerConfig (adds use_codeql, use_ray), or Neo4jConnectionConfig (read-only Neo4j). Neo4jConnectionConfig is hoisted here and re-exported from cldk.analysis.{python,typescript}.neo4j for backward compatibility.
Unified, language-keyed cache directory. All backends now share a single cache_dir (default <project>/.codeanalyzer) and write their artifacts under a per-language subdirectory (<cache_dir>/java, <cache_dir>/python, <cache_dir>/typescript), so a polyglot project analyzed under more than one language no longer overwrites a shared analysis.json.
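The "backend is selected by the type of the backend= config" idea can be sketched in isolation. This is a minimal illustration, not the SDK's code: InProcessConfig and GraphConfig are stand-in classes invented here to play the roles of CodeAnalyzerConfig and Neo4jConnectionConfig, their fields are hypothetical, and select_backend is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Optional, Union


# Stand-ins for illustration only. The real classes live in
# cldk.analysis.commons.backend_config; their actual fields are not shown here.
@dataclass(frozen=True)
class InProcessConfig:
    """Plays the role of CodeAnalyzerConfig (the default backend)."""
    cache_dir: str = ".codeanalyzer"  # hypothetical field


@dataclass(frozen=True)
class GraphConfig:
    """Plays the role of Neo4jConnectionConfig (read-only graph backend)."""
    uri: str = "bolt://localhost:7687"  # hypothetical field


def select_backend(backend: Optional[Union[InProcessConfig, GraphConfig]] = None) -> str:
    """Choose a backend from the type of the config object, not a string flag."""
    if backend is None:
        # No config passed: fall back to the default in-process backend.
        backend = InProcessConfig()
    if isinstance(backend, GraphConfig):
        return "neo4j"
    if isinstance(backend, InProcessConfig):
        return "in-process"
    raise TypeError(f"unsupported backend config: {type(backend).__name__}")


print(select_backend())               # default config
print(select_backend(GraphConfig()))  # graph config
```

Dispatching on the config type means each backend carries only the options that apply to it, and a misspelled backend name fails as a TypeError instead of being silently accepted as a string.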

Changed

Caching is on by default for Java/TypeScript. The in-process backend now caches analysis.json to disk (under the language-keyed cache_dir) instead of streaming over a stdout pipe.
CLDK(language).analysis(...) is deprecated and retained as a thin compatibility shim that forwards to the new factory methods (emits a DeprecationWarning).
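A thin compatibility shim of the kind described above usually warns and then forwards. The sketch below shows that general pattern with the standard warnings module; analyze_java and legacy_analysis are hypothetical stand-ins, not the SDK's functions, and the real shim's internals are not described in this changelog.

```python
import warnings


def analyze_java(project_path: str) -> dict:
    """Hypothetical stand-in for a per-language factory such as CLDK.java()."""
    return {"language": "java", "project_path": project_path}


# Hypothetical registry mapping a language string to its factory.
_FACTORIES = {"java": analyze_java}


def legacy_analysis(language: str, **kwargs) -> dict:
    """Hypothetical stand-in for the deprecated string-keyed entry point."""
    warnings.warn(
        "legacy_analysis() is deprecated; call the per-language factory instead.",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not to the shim
    )
    # Forward unchanged to the new entry point so behavior stays identical.
    return _FACTORIES[language](**kwargs)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = legacy_analysis("java", project_path="/tmp/app")

print(result["language"], caught[0].category.__name__)
```

Running a test suite with python -W error::DeprecationWarning turns remaining uses of a deprecated call path into hard failures, which makes them easy to find before upgrading.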

Deprecated

Java source_code (single-file) input — pass project_path instead.

Removed

analysis_backend_path from the public interface. The backend binary ships with the packaged codeanalyzer-* dependency; for TypeScript, $CODEANALYZER_TS_BIN remains as the only out-of-band override.
analysis_json_path from the public interface — folded into the unified cache_dir.

Migration

The language-keyed cache relocates analysis.json from <cache_dir>/analysis.json to <cache_dir>/<language>/analysis.json; existing caches are not found at the new path, so the first run after upgrading recomputes the analysis.
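The relocation can be expressed as two path computations. This is a sketch under stated assumptions: the helper names are hypothetical, and it only reports whether a run would recompute. Whether an old analysis.json could be reused by moving it is not stated here, so the sketch does not attempt that.

```python
from pathlib import Path


def legacy_cache_file(cache_dir: Path) -> Path:
    """Location used before the language-keyed layout."""
    return cache_dir / "analysis.json"


def language_cache_file(cache_dir: Path, language: str) -> Path:
    """Location used by the language-keyed layout."""
    return cache_dir / language / "analysis.json"


def will_recompute(cache_dir: Path, language: str) -> bool:
    """True when nothing is cached at the new location, even if a legacy file exists."""
    return not language_cache_file(cache_dir, language).is_file()


cache_dir = Path("my-project") / ".codeanalyzer"
print(legacy_cache_file(cache_dir))
print(language_cache_file(cache_dir, "java"))
```

A legacy analysis.json left at the top of cache_dir is simply ignored by the new layout, so it can be deleted once the first post-upgrade run has completed.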

Added (Neo4j)

Read-only Neo4j-backed TypeScript analysis backend (cldk.analysis.typescript.neo4j.TSNeo4jBackend). It is a drop-in alternative to the in-memory TSCodeanalyzer: it answers the same get_* query surface (call graph, callers/callees, class hierarchy, call sites, decorators, symbol lookups, ...) by running Cypher over a live Neo4j graph instead of walking the pydantic / NetworkX structures. The graph is the one codeanalyzer-typescript emits with --emit neo4j (schema: schema.neo4j.json). It is always populated out of band, and the SDK only reads from it: it never writes to the graph and needs neither the analyzer binary nor the project sources.
TypeScriptAnalysis / CLDK.analysis(language="typescript") now accept an optional neo4j_config (Neo4jConnectionConfig) to select the Neo4j backend; without it the in-memory backend is used, unchanged.
Read-only Neo4j-backed Python analysis backend (cldk.analysis.python.neo4j.PyNeo4jBackend), the analog of the TypeScript one. It answers all 21 PythonAnalysisBackend queries via Cypher over the graph codeanalyzer-python (>= 0.2.0) emits with --emit neo4j. Verified against a real 57-module project: every node/edge present in the graph reconstructs identically to the in-memory PyCodeanalyzer (3169/3200 checks; zero weight/provenance mismatches on shared call edges). Known gaps are not in the query layer: projection-lossy fields (comments → docstring, PyVariableDeclaration.value/columns, per-binding import detail), and an upstream emitter bug where calls to a bare module name that is also imported (e.g. os/re/json) are dropped from the emitted call graph. PythonAnalysis / CLDK.analysis(language="python") accept the same optional neo4j_config.
Read-only Neo4j-backed Java analysis backend (cldk.analysis.java.neo4j.JNeo4jBackend), completing Neo4j parity across all three languages. It reconstructs the canonical JApplication from the graph codeanalyzer-java (>= 2.4.0) emits with --emit neo4j and answers all 36 JavaAnalysisBackend queries with the in-memory backend's logic. Verified against the daytrader8 sample (145 classes): everything the graph actually contains reconstructs identically to JCodeanalyzer (97% of checks). Three projection gaps in the codeanalyzer-java 2.4.0 emitter (fields collapsing to one node, imports reduced to packages, a truncated call graph) are fixed in 2.4.1 (codeanalyzer-java#156/#157/#158, verified on daytrader — J_CALLS went 287 → 1702), the version the SDK release now bundles. JavaAnalysis / CLDK.java(...) accept a Neo4jConnectionConfig as the backend= config to select it.
Bumped codeanalyzer-python to 0.2.0 (adds the Neo4j graph emitter); the bundled codeanalyzer-java jar is now 2.4.1 (adds the Neo4j graph emitter + the field/import/call-graph projection fixes). The Java analyzer jar is no longer a pip dependency — the SDK release workflow downloads the latest codeanalyzer-java jar into the bundled jar/ directory.
Optional neo4j extra (pip install cldk[neo4j]) for the Neo4j Python driver.
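Because the Neo4j driver is an optional extra, code that may run without it can check for the driver before selecting the Neo4j backend. The sketch below uses only the standard library; the two helper names are hypothetical and not part of the SDK. It relies on the Neo4j Python driver being importable as neo4j.

```python
from importlib.util import find_spec


def neo4j_driver_available() -> bool:
    """Report whether the optional Neo4j Python driver can be imported."""
    return find_spec("neo4j") is not None


def require_neo4j_driver() -> None:
    """Fail early with an actionable message when the extra is not installed."""
    if not neo4j_driver_available():
        raise ImportError(
            "The Neo4j backend needs the optional driver: pip install cldk[neo4j]"
        )


print("neo4j driver installed:", neo4j_driver_available())
```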

Fixed

Bundled JDK download for the Java backend. ensure_jdk resolved the Temurin JVM via the Adoptium /assets/version/{release} endpoint, which now returns 404 for pinned releases (e.g. jdk-21.0.5+11) — so the first Java analysis on a clean machine failed before it started. It now resolves via the /binary/version/... endpoint (following the redirect to the GitHub asset) and reads the checksum from the asset's .sha256.txt.
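Checking a downloaded archive against a .sha256.txt file can be sketched as follows. This is an illustration rather than the code in ensure_jdk: the function names are hypothetical, and the file layout (hex digest first, then whitespace and a file name) is an assumption about the checksum file's format.

```python
import hashlib
import hmac


def parse_sha256_file(text: str) -> str:
    """Return the hex digest from '<digest>  <filename>' style content (assumed layout)."""
    return text.split()[0].lower()


def verify_sha256(payload: bytes, sha256_txt: str) -> bool:
    """Compare the payload's SHA-256 digest with the one recorded in the checksum file."""
    actual = hashlib.sha256(payload).hexdigest()
    expected = parse_sha256_file(sha256_txt)
    # compare_digest avoids leaking where the two strings first differ.
    return hmac.compare_digest(actual, expected)


# Stand-in bytes instead of a real JDK archive.
archive = b"stand-in archive bytes"
checksum_file = f"{hashlib.sha256(archive).hexdigest()}  archive.tar.gz\n"

print(verify_sha256(archive, checksum_file))
print(verify_sha256(b"tampered bytes", checksum_file))
```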

[v1.0.7] - 2026-02-14

Added

Doctest-style Examples across the public API surface of JavaAnalysis, PythonAnalysis, CAnalysis, and core CLDK helpers. Coverage includes Java CRUD operations and comment/docstring query APIs, plus concise inline examples for Python and C where applicable.
Examples documenting expected NotImplementedError behavior for placeholder APIs (PythonAnalysis and CAnalysis) using doctest flags.
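A single-line doctest that documents an expected NotImplementedError might look like the sketch below. placeholder_query is a hypothetical function, not an SDK API, and IGNORE_EXCEPTION_DETAIL is one standard doctest flag that fits this use; which flags the project's docstrings use is not stated here.

```python
import doctest


def placeholder_query(name: str) -> list:
    """Hypothetical placeholder API that is not implemented yet.

    Examples:
        >>> placeholder_query("foo")  # doctest: +IGNORE_EXCEPTION_DETAIL
        Traceback (most recent call last):
        ...
        NotImplementedError: details are ignored by the flag
    """
    raise NotImplementedError(f"placeholder_query({name!r}) is not supported yet.")


def run_examples() -> doctest.TestResults:
    """Run the docstring example above and return (failed, attempted) counts."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(
        placeholder_query.__doc__,
        {"placeholder_query": placeholder_query},  # names visible to the example
        "placeholder_query",
        None,
        0,
    )
    runner = doctest.DocTestRunner(verbose=False)
    runner.run(test)
    return runner.summarize(verbose=False)


results = run_examples()
print(results.failed, results.attempted)
```

With IGNORE_EXCEPTION_DETAIL only the exception type is compared, so the example keeps passing when the error message is reworded.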

Changed

Converted and standardized docstrings to strict Google style (Args, Returns, Raises, Examples) across edited modules.
Standardized Examples to use the CLDK facade (e.g., CLDK(language="java").analysis(...)) instead of raw constructor calls.
Normalized all doctest Example inputs to single-line strings to ensure reliable mkdocstrings rendering.
Clarified CLDK.analysis return type with a precise union: JavaAnalysis | PythonAnalysis | CAnalysis.
Updated codeanalyzer version to v2.3.6.

Fixed

Fixed README.md logo display on PyPI by updating image URLs to use raw GitHub URLs and maintaining theme-based auto-switching with proper fallback
mkdocstrings rendering issues caused by multi-line doctest strings and formatting inconsistencies.
Replaced confusing examples like JavaAnalysis(None, None, ...) with clear CLDK-based initialization patterns.
Packaging: ensured the built wheel includes the cldk package by adding packages = [{ include = "cldk" }] to Poetry configuration.
Fixed #141

Removed

Multi-line doctest strings in Examples that broke mkdocstrings rendering; all examples are now single-line.
Removed pandas dependency (#145)

[v1.0.6] - 2025-07-23

Added

Added argument_expr field to JCallSite model for capturing actual parameter expressions in method calls
Added Star History section to README.md for tracking project popularity

Changed

Updated codeanalyzer jar to version 2.3.5 with support for call argument expressions and fully qualified parameter types
Modified codeanalyzer.py to preserve fully qualified parameter types in method signatures instead of simplifying them
Updated method signature format to use fully qualified type names (e.g., java.lang.String instead of String)
Updated test fixtures with new analysis.json data reflecting the signature format changes
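Callers that still display or compare short signatures can derive them from the fully qualified form. The helper below is a hypothetical caller-side sketch, not part of the SDK. It assumes the usual Java naming convention of lowercase package segments and capitalized type names.

```python
import re

# One or more lowercase-leading package segments followed by a capitalized type name.
_PACKAGE_PREFIX = re.compile(r"\b(?:[a-z_]\w*\.)+(?=[A-Z])")


def simple_signature(signature: str) -> str:
    """Strip package qualifiers, e.g. java.lang.String becomes String."""
    return _PACKAGE_PREFIX.sub("", signature)


print(simple_signature("put(java.lang.String, java.util.List<java.lang.String>)"))
print(simple_signature("add(int, int)"))
```

Stripping qualifiers loses information: two types with the same simple name in different packages collapse to one string, which is the ambiguity the fully qualified format avoids.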

Fixed

Fixed method signature handling to maintain fully qualified parameter types for better type resolution
Updated test cases to use fully qualified method signatures for improved accuracy

v1.0.5 - 2025-06-24

Fixed

Fixed issue #135
Fixed analysis-level compatibility checking between an existing analysis.json and the analysis level passed by the caller

Changed

Updated treesitter analysis to use global declarations of parser and language

v1.0.4 - 2025-06-11

Added

Added missing callable fields field validator

Changed

Updated test fixture setup to use codeanalyzer jar from cldk/analysis/java/codeanalyzer/jar instead of test resources directory
Updated analysis.json fixtures (daytrader8 and plantsbywebsphere)

Removed

Removed dangling codeanalyzer jars from test resources
Removed obsolete analysis.json fixture

v1.0.3 - 2025-06-01

Added

Added code start line attribute to JCallable (corresponding to added attribute in the java code analyzer model)

v1.0.2 - 2025-05-24

Added

Added test case and fixture for source analysis
Added missing attributes in compilation unit model

Fixed

Fixed handling of source_code option in Java codeanalyzer
Updated core.py to match python analysis signature

v1.0.1 - 2025-05-07

Changed

Updated treesitter analysis to use global declarations of parser and language

v1.0.0 - 2025-04-29

Added

First stable release
Updated contributing guidelines

Changed

Updated README.md
Updated codeanalyzer jar
Updated java version in release automation

v0.5.1 - 2025-03-13

Changed

Updated Java model to comply with codeanalyzer v2.3.1
Updated codeanalyzer jar to the latest from codeanalyzer-java
Updated get_all_docstrings to return dict

v0.5.0 - 2025-02-21

Added

Added release automation GitHub Actions
Added Java 11 support in GitHub Actions
Added release_config.json
Added Comment parsing APIs at file, class, method, and docstring level
Added support for parsing callable parameters and their location information
Added Dev container instructions with Python, Java, C, and Rust support
Added C/C++ analysis support
Added CRUD operations support for Java JPA applications

Changed

Consolidated analysis_level enums in __init__.py
Updated codeanalyzer jar to the latest version
Changed coverage minimum to 70%
Updated documentation with mkdocs
Updated badges and logos in README
Added Discord community support

Removed

Removed CodeQL dependency and refactored treesitter
Removed ABCs from analysis
Removed logic to find LLVM in linux OSes (only appears in Darwin)
Removed redundant is_entry_point fields from JCallable and JType
Removed unused parameters and code cleanup

Fixed

Fixed various test cases and compatibility issues
Fixed treesitter superclass identification issues
Fixed entry point detection code
Fixed recursive error issues

v0.4.0 - 2024-11-13

Fixed

Fixed issue 67 - symbol table is none

Changed

Updated poetry build rules to include codeanalyzer-*.jar
Added test case to verify jar file exists

v0.3.0 - 2024-11-12

Added

Support for reading slim JSON from codeanalyzer v1.1.0
Added more test tools (pylint, flake8, black, pspec, coverage)
Added test coverage reporting

Changed

Updated README.md to include the arXiv paper
Removed obsolete test cases for unsupported languages

v0.2.0 - 2024-10-11

Added

Added GitHub Action to publish manual releases
Added PyPI badge to README.md

v0.1.4 - 2024-10-21

Fixed

Fixed codeanalyzer.jar not being a PosixPath

v0.1.3 - 2024-10-21

Fixed

Fixed calling the correct codeanalyzer jar on version 0.1.3
Removed auto-download of codeanalyzer jar

v0.1.2 - 2024-10-17

Fixed

Fixed tree-sitter bug
Defined self.captures explicitly

0.1.0-dev - 2024-10-07

Added

Initial development version
Set version to über json support
Support for slim JSONs from codeanalyzer
IBM Copyright added to all source files
Added code parsing support
Added support for symbol table call graph
Added notebook examples for code summarization and test generation
Basic CLDK framework implementation
