AgentIR

A compiler infrastructure for agentic trajectories.

AgentIR turns heterogeneous traces from agent frameworks, coding assistants, GUI/browser agents, tool-use agents, evaluation sandboxes, and research datasets into a canonical intermediate representation that can be verified, transformed, analyzed, and lowered into training, evaluation, replay, and observability targets.

Core thesis: make agentic trajectories compilable -- the way LLVM made programs compilable and MLIR made machine-learning graphs compilable.

Architecture

Source Traces
  AgentTrove / Codex SWE-bench / Claude Code / OpenHands / Hermes /
  LangGraph / AutoGen / MCP logs / custom JSONL / custom Parquet
       |
Frontends
  Python frontends or user-defined *.agentir.yaml DSL frontends
       |
RawIR  -->  ParsedIR  -->  Canonical AgentIR
       |
Pass Pipeline
  parse -> canonicalize -> pair -> redact -> slice -> verify -> ...
       |
Backends (lowering targets)
  SFT / RL / DPO / tool-use / observability / replay / framework-native

Key Features

Compiler-style architecture -- frontends, multi-level IR, pass manager, backends, and diagnostics, modeled after LLVM/MLIR
5 built-in format frontends -- AgentTrove, Codex SWE-bench Pro, Claude Code, OpenHands, Hermes Agent
User-extensible DSL -- define new trajectory formats declaratively with *.agentir.yaml files; no Python required
8 CLI subcommands -- dsl validate, probe, preview, convert, bench, diff, init, formats
174 tests with 100% pass rate and ~12K lines of Python
Battle-tested at scale -- 1.7M AgentTrove records (28M+ events) processed with 0 failures
Streaming JSONL, batched processing, error quarantine, and compiled selectors
Pass-based processing -- parse, canonicalize, pair, redact, slice, and verify are separate, composable passes
Compiler-style diagnostics -- diagnostic codes, severity levels, source references, and suggested fixes
Event-graph model -- events carry action, observation, artifact, state, control, and provenance
Loss-aware backend lowering -- every backend emits an explicit loss report detailing what was preserved, degraded, or dropped

Installation

git clone https://github.com/ravenSanstete/agentir.git
cd agentir

# using uv (recommended)
uv sync

# or standard pip
pip install -e .

Requirements: Python >= 3.11

Quickstart

The following 6 steps take you from raw heterogeneous traces to verified, canonical AgentIR in under 5 minutes.

# 1. Validate a DSL format specification
agentir dsl validate dsl/formats/agenttrove.agentir.yaml

# 2. Probe your data to see its structure
agentir dsl probe --input data/my_data.jsonl --dsl dsl/formats/agenttrove.agentir.yaml

# 3. Preview the first 3 converted records
agentir dsl preview --dsl dsl/formats/agenttrove.agentir.yaml \
  --input data/my_data.jsonl --limit 3

# 4. Convert to AgentIR format
agentir dsl convert --dsl dsl/formats/agenttrove.agentir.yaml \
  --input data/my_data.jsonl --output out.air.jsonl

# 5. Run the pass pipeline end-to-end
agentir compile --frontend agenttrove --input data/my_data.jsonl \
  --passes parse-sharegpt,canonicalize-tools,pair-tool-results,normalize-outcome,verify \
  --output out.canonical.air.jsonl

# 6. Benchmark throughput on 10K records
agentir dsl bench dsl/formats/agenttrove.agentir.yaml \
  --input data/my_data.jsonl --limit 10000

Core Concepts

Concept	Description
AgentIR Record	Top-level container: one row of source data plus its canonical AgentIR representation
Episode	A sequence of events that forms a complete agent interaction session
Event	The fundamental unit: an `action`, `observation`, `artifact`, `state`, `control`, or `outcome` step with provenance
Pass	A single, named transformation that operates on AgentIR records (parse, canonicalize, pair, verify, etc.)
Frontend	Parses one specific trajectory format and emits `RawIR` records
DSL	Declarative YAML-based format definition language (`*.agentir.yaml`) for user-defined frontends
Backend	Lowers canonical AgentIR into a target format (SFT training, RL replay, observability span, etc.)
IR Levels	`RawIR` (source-preserving) -> `ParsedIR` (structured) -> `Canonical AgentIR` (pass-applied, verified)

Project Structure

src/agentir/
  ir/              AgentIR schema models (event, action, observation, record)
  dsl/             DSL models, loader, compiler (YAML -> runtime frontend)
  frontends/       Base frontend + 5 built-in format frontends
  passes/          Pass base, registry, manager, and 12 pass implementations
  backends/        Training, evaluation, and observability backends
  cli/             Typer CLI main entry with subcommands
  io/              Streaming JSONL, Parquet, batched processing
  diagnostics/     Diagnostic model and reporter

dsl/formats/       5 built-in *.agentir.yaml DSL format specifications
tests/             174 tests (pytest)
examples/          End-to-end workflow examples
docs/              Architecture, SPEC, DSL, and developer documentation

Performance

Verified on the full 1.7M-record AgentTrove dataset (28M+ events):

Metric	Value
Records processed	1,711,738
Events processed	28,206,633
Throughput (records/sec)	1,811
Throughput (events/sec)	30,136
Failures	0

HuggingFace Datasets

Auto-converted trajectory datasets are available on HuggingFace under the WhitzardAgent organization as part of the AgentIR Collection:

Dataset	Format	Records
AgentTrove-AgentIR	AgentIR Canonical	50,000
AgentTrove-OpenAI	OpenAI Chat Messages	50,000
AgentTrove-Anthropic	Anthropic Tools API	50,000
AgentTrove-OpenHands	OpenHands Trajectory	50,000
AgentTrove-Hermes	Hermes XML	50,000
ClaudeCode-AgentIR	AgentIR Canonical	32,133
ClaudeCode-OpenAI	OpenAI Chat Messages	32,133
ClaudeCode-Anthropic	Anthropic Tools API	32,133
ClaudeCode-OpenHands	OpenHands Trajectory	32,133
ClaudeCode-Hermes	Hermes XML	32,133

from datasets import load_dataset

# Load in your preferred format
ds = load_dataset("WhitzardAgent/AgentTrove-OpenAI", split="train")

All datasets were auto-converted by AgentIR with 100% success rate (0 failures). AgentTrove: 50,000 records at 1,281 rec/sec. ClaudeCode: 32,133 records at 317 rec/sec (full dataset, 100% parse rate).

Contributing

Contributions are welcome. See CONTRIBUTING.md for guidelines on development setup, coding standards, testing requirements, and the pull-request process.

All contributions must pass the existing test suite (174 tests, 100% pass rate) and conform to ruff + mypy style rules.

License

This project is licensed under the Apache License 2.0. See LICENSE for the full text.

Acknowledgments

AgentIR draws inspiration from and builds upon:

LLVM / MLIR -- compiler infrastructure design, multi-level IR, and pass-manager architecture
Hugging Face Datasets -- data loading patterns and Parquet/Arrow ecosystem
AgentTrove (open-thoughts/AgentTrove) -- ShareGPT-style agent traces at scale
Codex SWE-bench Pro (Inferact/codex_swebenchpro_traces) -- coding-agent trajectories
Claude Code (nlile/misc-merged-claude-code-traces-v1) -- tool-use and multi-turn traces
OpenHands (nvidia/SWE-Hero-openhands-trajectories) -- structured trajectory format
Hermes Agent (lambda/hermes-agent-reasoning-traces) -- XML-based tool-call traces

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
.github		.github
docs		docs
dsl		dsl
examples		examples
hf_datasets		hf_datasets
prompts		prompts
samples		samples
schema		schema
scripts		scripts
src/agentir		src/agentir
tests		tests
.gitignore		.gitignore
APPLY_OVERWRITE.md		APPLY_OVERWRITE.md
CHANGELOG.md		CHANGELOG.md
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
MANIFEST.md		MANIFEST.md
README.md		README.md
README_DSL_EXTENSION.md		README_DSL_EXTENSION.md
SECURITY.md		SECURITY.md
pyproject.template.toml		pyproject.template.toml
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

AgentIR

Architecture

Key Features

Installation

Quickstart

Core Concepts

Project Structure

Performance

HuggingFace Datasets

Contributing

License

Acknowledgments

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

AgentIR

Architecture

Key Features

Installation

Quickstart

Core Concepts

Project Structure

Performance

HuggingFace Datasets

Contributing

License

Acknowledgments

About

Topics

Resources

License

Code of conduct

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages