AI-powered SAST finding verification. Takes SAST scanner results and a codebase, uses an LLM agent to verify each finding, and produces enriched results with verdicts, severity ratings, and visual explanations.
```bash
# Install
uv sync
uv pip install -e .

# Run
codeassure \
  --codebase ./my-project \
  --findings results.json \
  --output verified.json

# With benchmarking
codeassure \
  --codebase ./my-project \
  --findings results.json \
  --output verified.json \
  --verify ground_truth.json
```

CodeAssure runs a three-stage agent pipeline:
- Analyzer (Generator) — tool-using agent reads the flagged code, gathers context via `read_file` and `grep_code`, produces a structured analysis
- Formatter — extracts a JSON verdict from the analysis, with a repair loop for malformed output
- Evaluator — reviews the verdict for internal consistency, assigns severity, and can reject it for retry
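As a rough sketch, the per-finding control flow might look like this (the function names and retry limits are illustrative assumptions, not the actual `sast_verify` API):

```python
import json
from typing import Callable, Optional

def verify_finding(
    finding: dict,
    analyze: Callable[[dict], str],        # Analyzer: tool-using agent
    format_verdict: Callable[[str], str],  # Formatter: JSON extraction
    evaluate: Callable[[dict, str], bool], # Evaluator: consistency check
    max_repairs: int = 2,
    max_retries: int = 2,
) -> Optional[dict]:
    """Hypothetical control flow for one finding, not the real API."""
    for _ in range(max_retries + 1):
        analysis = analyze(finding)
        verdict = None
        for _ in range(max_repairs + 1):   # repair loop for malformed output
            try:
                verdict = json.loads(format_verdict(analysis))
                break
            except json.JSONDecodeError:
                continue                   # re-prompt the formatter
        if verdict is not None and evaluate(verdict, analysis):
            return verdict                 # accepted by the Evaluator
    return None                            # caller records the finding as uncertain
```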
Related findings are grouped before analysis — co-located findings on the same code get analyzed together with coherence constraints, so verdicts don't contradict each other.
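A rough sketch of co-location grouping, assuming each finding carries `path`, `start_line`, and `end_line` fields (the real logic lives in `grouping.py` and builds a Finding Relationship Graph):

```python
from collections import defaultdict

def group_colocated(findings: list[dict], overlap_lines: int = 3) -> list[list[dict]]:
    """Illustrative: bucket findings by file, then merge findings whose
    line ranges fall within `overlap_lines` of each other."""
    by_file: dict[str, list[dict]] = defaultdict(list)
    for f in findings:
        by_file[f["path"]].append(f)

    groups: list[list[dict]] = []
    for path_findings in by_file.values():
        path_findings.sort(key=lambda f: f["start_line"])
        current = [path_findings[0]]
        for f in path_findings[1:]:
            if f["start_line"] - current[-1]["end_line"] <= overlap_lines:
                current.append(f)          # co-located: analyze together
            else:
                groups.append(current)
                current = [f]
        groups.append(current)
    return groups
```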
A finding policy config tells the model what counts as a true positive for each customer (security-only vs. detection semantics).
```
codeassure --codebase DIR --findings FILE --output FILE [OPTIONS]
```
| Option | Description |
|---|---|
| `--codebase DIR` | Root directory that finding paths are relative to |
| `--findings FILE` | SAST findings JSON (e.g., Semgrep `results.json`) |
| `--output, -o FILE` | Output path for verified findings |
| `--config, -c PATH` | Path to `codeassure.json` (default: `./codeassure.json`) |
| `--jobs, -j N` | Max concurrent LLM requests (overrides config) |
| `--no-grouping` | Disable finding grouping (analyze each finding independently) |
| `--verify FILE` | Compare output against ground-truth JSON and write a CSV report |
`codeassure.json`:

```json
{
  "model": {
    "provider": "openai-compatible",
    "name": "your-model-name",
    "api_base": "http://localhost:5000",
    "api_key": "$YOUR_API_KEY_ENV_VAR",
    "tool_calling": true
  },
  "concurrency": 4,
  "stage_timeout": 120,
  "finding_timeout": 300,
  "finding_policy": {
    "best_practice_is_tp": true,
    "informational_detection_is_tp": true,
    "audit_rule_is_tp": true
  }
}
```

| Field | Required | Description |
|---|---|---|
| `provider` | yes | One of `openai`, `openai-compatible`, `anthropic`, `google`, `gemini` |
| `name` | yes | Model name as known by the provider |
| `api_base` | no | Root host URL — always provide without `/v1` (see table below) |
| `api_key` | no | API key literal or `$ENV_VAR` reference (e.g. `"$OPENAI_API_KEY"`) |
| `tool_calling` | no | `true` (default) — set to `false` for models that don't support tool/function calling |
| Provider | You set `api_base` | Actual endpoint called |
|---|---|---|
| `openai` / `openai-compatible` | `http://localhost:5000` | `http://localhost:5000/v1/chat/completions` |
| `anthropic` | `https://your-proxy.example.com` | `https://your-proxy.example.com/v1/messages` |
| `google` / `gemini` | `https://your-proxy.example.com` | `https://your-proxy.example.com/v1beta/models/{model}:generateContent` |
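A minimal sketch of how a client could derive the endpoint and resolve `$ENV_VAR` keys from this config (helper names are assumptions, not CodeAssure's actual internals):

```python
import os

# Per-provider path suffixes, matching the table above.
SUFFIXES = {
    "openai": "/v1/chat/completions",
    "openai-compatible": "/v1/chat/completions",
    "anthropic": "/v1/messages",
}

def resolve_api_key(value: str) -> str:
    """Treat values starting with '$' as environment-variable references."""
    if value.startswith("$"):
        return os.environ[value[1:]]
    return value

def build_endpoint(provider: str, api_base: str, model: str) -> str:
    base = api_base.rstrip("/")  # api_base is the bare host, without /v1
    if provider in ("google", "gemini"):
        return f"{base}/v1beta/models/{model}:generateContent"
    return base + SUFFIXES[provider]
```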
| Field | Default | Description |
|---|---|---|
| `concurrency` | `4` | Max concurrent LLM requests |
| `stage_timeout` | `120` | Seconds per LLM stage (analyzer or formatter) |
| `finding_timeout` | `300` | Seconds for the entire finding (both stages + repair) |
| `request_limit` | `200` | Max requests per `agent.run()` call |
| `voting_rounds` | `1` | Run each finding N times and take the majority verdict |
| `max_tokens` | `4096` | Max completion tokens per LLM call |
| `thinking_map` | `null` | Severity → thinking effort (`full`/`low`/`off`); `null` = disabled |
| `finding_policy` | all `true` | What counts as `true_positive` for this customer |
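With `voting_rounds` > 1, the majority step could look like this sketch (falling back to `uncertain` on a tie is an assumption; the config only specifies a majority verdict):

```python
from collections import Counter

def majority_verdict(verdicts: list[str]) -> str:
    """Pick the most common verdict across voting rounds;
    treat exact ties as 'uncertain' (assumed tie-break rule)."""
    counts = Counter(verdicts).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return "uncertain"
    return counts[0][0]

# e.g. majority_verdict(["true_positive", "false_positive", "true_positive"])
# -> "true_positive"
```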
Each finding gets a verification block:
```json
{
  "verification": {
    "verdict": "true_positive",
    "is_security_vulnerability": true,
    "severity": "high",
    "confidence": "high",
    "reason": "subprocess.run called with dynamic user input and shell=True.",
    "evidence": [{"location": "app/utils.py:42"}],
    "graph": {
      "summary": "Taint flow: os.environ → subprocess.run",
      "mermaid": "graph TD\n ...",
      "nodes": [...],
      "edges": [...]
    }
  }
}
```

| Field | Values | Description |
|---|---|---|
| `verdict` | `true_positive`, `false_positive`, `uncertain` | Did the scanner correctly detect the pattern? |
| `is_security_vulnerability` | `true`, `false` | Is this exploitable? Assessed from code context, independent of verdict |
| `confidence` | `high`, `medium`, `low` | Confidence level |
| `severity` | `critical`, `high`, `medium`, `low` | Assessed severity for `true_positive`; always `low` for `false_positive`/`uncertain` |
The output also includes a `codebase_tree` for visualization:
```json
{
  "results": [...],
  "codebase_tree": [
    {"path": "src/app.py", "type": "file", "size": 1234},
    {"path": "src/utils", "type": "dir", "size": 0}
  ]
}
```

To visualize the results:

```bash
cd ui
pnpm install
pnpm dev --port 3333
```

Open http://localhost:3333 and drop the output JSON. The UI renders a D3 force graph with findings overlaid, severity shading, and a detail panel per finding.
If the run crashes mid-way, re-run the same command. CodeAssure saves progress to `<output>.checkpoint.json` every 5 findings and resumes from where it left off. The checkpoint is deleted on successful completion.
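A sketch of what that resume logic implies (the checkpoint's internal field names are assumptions):

```python
import json
from pathlib import Path

def load_done_ids(output_path: str) -> set[str]:
    """Resume support: read IDs of findings already verified.
    The 'done' field name is an assumption for illustration."""
    ckpt = Path(f"{output_path}.checkpoint.json")
    if not ckpt.exists():
        return set()
    return set(json.loads(ckpt.read_text()).get("done", []))

def save_checkpoint(output_path: str, done_ids: set[str], every: int = 5) -> None:
    """Persist progress every `every` findings; the pipeline deletes
    the file on successful completion."""
    if len(done_ids) % every == 0:
        Path(f"{output_path}.checkpoint.json").write_text(
            json.dumps({"done": sorted(done_ids)})
        )
```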
```bash
codeassure \
  --codebase samples/sample-9/k8s_jobs \
  --findings samples/sample-9/results.json \
  --output samples/sample-9/output.json \
  --verify samples/sample-9/final_results.json
```

Prints a confusion matrix comparing the effective verdict against ground truth (the `is_false_positive` field). The collapse rule: `verdict=true_positive` + `is_security_vulnerability=false` → effective false positive.
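The collapse rule as a sketch, using the field names from the output schema above:

```python
def effective_verdict(verification: dict) -> str:
    """Collapse for benchmarking: a true_positive that is not an actual
    security vulnerability counts as an effective false_positive."""
    if (verification["verdict"] == "true_positive"
            and not verification["is_security_vulnerability"]):
        return "false_positive"
    return verification["verdict"]
```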
```
sast_verify/
  cli.py            # CLI entry point
  config.py         # Config model + FindingPolicy
  pipeline.py       # Orchestration, checkpointing, codebase tree walker
  preprocess.py     # Normalizes raw SAST JSON into Finding objects
  retrieval.py      # Anchors findings to source code evidence
  schema.py         # Pydantic models: Finding, Evidence, Verdict
  grouping.py       # Finding Relationship Graph: groups co-located findings
  graph.py          # Mermaid flow diagram generator per finding
  agents/
    analyzer.py     # Builds analyzer, formatter, evaluator agents
    runner.py       # Async runner: generator/evaluator pipeline, group analysis
    tools.py        # read_file, grep_code (sandboxed to codebase)
    deps.py         # AnalyzerDeps (tool access scope)
  prompts/
    __init__.py     # Message builders (single, group, evaluator)
    analyzer.py     # System prompts (analyzer, formatter, evaluator, group variants)
    rule_policies.py  # Deterministic verdict policies for known rule families
ui/                 # Next.js visualization app
  src/
    app/page.tsx    # Upload + force graph view
    components/
      ForceGraph.tsx    # D3 force graph with finding overlay
      FileUpload.tsx    # JSON file upload
    lib/
      types.ts          # TypeScript types matching output schema
      theme.ts          # AccuKnox brand colors
      graph-builder.ts  # Finding flow graph for detail view
```