Open-source replay/eval framework for LiveKit voice agents. One Docker image, one SQLite file, one Python SDK.
Alpha. The wire and SDK API can break between minor versions. Upgrading from a previous release wipes your data: delete
/data/xray.db before starting the new container. Issues and feedback are the most useful contribution right now.
- Author a Conversation in Python — an ordered list of user-side turns, per-turn assertion predicates, and an optional per-replay LLM judge.
- Run it against your LiveKit voice agent. The SDK joins your room as a user-side participant, plays the user audio, captures the agent's audio + transcript.
- xray records the run as a Replay. The dev's agent emits OpenTelemetry spans during the run; xray's OTLP receiver routes them by xray.replay.id and surfaces tool calls, model usage, and timings in the inspector. Spans of recognized vocabularies (xray.*, OTel GenAI semconv gen_ai.*, Langfuse) light up automatically.
- Compare runs side-by-side. Pick 2–8 Replays of one Conversation to grid-compare; pick two Conversations to align by per-turn key and see what diverged.
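To make "per-turn assertion predicates" concrete, here is a minimal sketch of the idea in plain Python. It is not the xray SDK: `AgentTurn`, `contains`, `max_latency_ms`, and `check_turn` are hypothetical names chosen for illustration, and the case-insensitive match is an assumption. The real authoring API is `Assertion.*` from the `xray` package.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class AgentTurn:
    """Hypothetical record of one captured agent turn (not an xray type)."""
    transcript: str
    latency_ms: int


# A predicate takes a captured turn and returns True when the turn passes.
Predicate = Callable[[AgentTurn], bool]


def contains(needle: str) -> Predicate:
    """Pass when the transcript contains `needle` (case-insensitive, by assumption)."""
    return lambda turn: needle.lower() in turn.transcript.lower()


def max_latency_ms(limit: int) -> Predicate:
    """Pass when the agent responded within `limit` milliseconds."""
    return lambda turn: turn.latency_ms <= limit


def check_turn(turn: AgentTurn, predicates: dict[str, Predicate]) -> list[str]:
    """Return the names of the predicates that failed for this turn."""
    return [name for name, pred in predicates.items() if not pred(turn)]
```

A turn passes when `check_turn` returns an empty list; any returned names identify which predicates failed.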
The image is published to GHCR:
docker pull ghcr.io/xray-eval/xray:latest

Tagged releases are signed with cosign keyless (OIDC). To verify:

cosign verify ghcr.io/xray-eval/xray:<tag> \
  --certificate-identity-regexp 'https://github.com/xray-eval/xray/' \
  --certificate-oidc-issuer https://token.actions.githubusercontent.com

Or build from source:
git clone https://github.com/xray-eval/xray.git
cd xray
docker build -t xray:local .

The Python SDK:

pip install "xray-py[livekit]"

Drop xray into your existing compose stack alongside your LiveKit agent:
# compose.yaml
services:
  xray:
    image: ghcr.io/xray-eval/xray:latest
    ports:
      - "127.0.0.1:8080:8080"   # bind to localhost only; see Security below
    volumes:
      - xray-data:/data         # SQLite + audio survive container restarts
  my-voice-agent:
    build: .
    environment:
      OTEL_EXPORTER_OTLP_TRACES_ENDPOINT: http://xray:8080/v1/otlp/v1/traces
      OTEL_EXPORTER_OTLP_PROTOCOL: http/json
    depends_on:
      - xray
volumes:
  xray-data:

Run docker compose up, then open http://localhost:8080. The API reference is at http://localhost:8080/docs.
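The compose file points the agent's exporter at xray's OTLP endpoint using the http/json protocol. As a rough illustration of that encoding, the sketch below builds a one-span OTLP/JSON export request by hand. The function name, the span name, and the choice to carry `xray.replay.id` as a span attribute are assumptions made for this example; docs/WIRE.md is the authority on the attribute contract, and a real agent would normally use an OpenTelemetry SDK exporter rather than building payloads manually.

```python
import json
import secrets
import time


def build_otlp_json_span(replay_id: str, name: str, duration_ms: int) -> dict:
    """Build a one-span OTLP/JSON export request tagged with a replay id."""
    end_ns = time.time_ns()
    start_ns = end_ns - duration_ms * 1_000_000
    return {
        "resourceSpans": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": "my-voice-agent"}},
            ]},
            "scopeSpans": [{
                "scope": {"name": "example-instrumentation"},  # hypothetical scope name
                "spans": [{
                    "traceId": secrets.token_hex(16),  # 32 hex characters
                    "spanId": secrets.token_hex(8),    # 16 hex characters
                    "name": name,
                    "kind": 1,  # SPAN_KIND_INTERNAL
                    # OTLP/JSON encodes 64-bit integers as decimal strings.
                    "startTimeUnixNano": str(start_ns),
                    "endTimeUnixNano": str(end_ns),
                    "attributes": [
                        # Assumed placement of the routing key; see docs/WIRE.md.
                        {"key": "xray.replay.id", "value": {"stringValue": replay_id}},
                    ],
                }],
            }],
        }]
    }


if __name__ == "__main__":
    payload = build_otlp_json_span("example-replay-id", "tool.reserve_table", 40)
    print(json.dumps(payload, indent=2))
```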
import asyncio
import os

from xray import Assertion, Conversation, Judge, RunConfig, Turn, format_failures, run
from xray.runtime.livekit import LiveKitRuntime


async def main() -> None:
    conv = Conversation(
        name="booking-happy-path",
        turns=[
            Turn.user("Hi, I'd like to book a table for two at 7pm.", key="u0"),
            Turn.agent(
                key="a0",
                assertions=(
                    Assertion.contains("confirmed"),
                    Assertion.tool_called("reserve_table"),
                    Assertion.max_latency_ms(2_000),
                ),
            ),
        ],
        judges=(Judge.text_match("agent confirms a reservation for two", pass_score=80),),
    )
    runtime = LiveKitRuntime(
        url=os.environ["LIVEKIT_URL"],
        api_key=os.environ["LIVEKIT_API_KEY"],
        api_secret=os.environ["LIVEKIT_API_SECRET"],
        room="booking-test-room",
    )
    result = await run(
        conversation=conv,
        runtime=runtime,
        xray_url="http://localhost:8080",
        run_config=RunConfig(model="gpt-4o", temperature=0.5),
    )
    assert result.passed, format_failures(result)


asyncio.run(main())

The dev's agent reads xray.replay.id (plus conversation.id / version / modality) from LiveKit room metadata and propagates them as OTEL baggage so every span (xray.*, gen_ai.*, Langfuse) gets routed to the right Replay. See docs/SDK.md.
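As a rough sketch of the propagation step, the function below turns room metadata into a W3C `baggage` header value using only the standard library. It assumes the metadata is a JSON object keyed by attribute name. That shape, the `xray.conversation.id` key, and the function name are hypothetical; docs/SDK.md describes the actual contract, and an agent would typically set baggage through its OpenTelemetry SDK rather than formatting the header itself.

```python
import json
from urllib.parse import quote

# Keys to copy into baggage. "xray.replay.id" comes from the text above;
# "xray.conversation.id" is a hypothetical name used only for illustration.
BAGGAGE_KEYS = ("xray.replay.id", "xray.conversation.id")


def baggage_from_room_metadata(metadata: str) -> str:
    """Turn JSON room metadata into a W3C `baggage` header value."""
    try:
        fields = json.loads(metadata) if metadata else {}
    except json.JSONDecodeError:
        fields = {}
    if not isinstance(fields, dict):
        fields = {}
    # Baggage is a comma-separated list of key=value members with
    # percent-encoded values.
    members = [
        f"{key}={quote(str(fields[key]), safe='')}"
        for key in BAGGAGE_KEYS
        if key in fields
    ]
    return ",".join(members)
```

Missing or malformed metadata yields an empty string, so an agent running outside a replay would simply emit spans without the routing keys.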
- Replays of the same Conversation: select 2–8 from the Conversation detail page → grid view with per-column run_config headers.
- Two Conversations: pick from the Conversations index → side-by-side aligned by per-turn key. Unmatched turns render as labeled "no matching turn" placeholders.
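The key-based alignment can be pictured with a small sketch. This is not xray's implementation: `align_by_key` and the `(key, text)` tuple shape are hypothetical, keys are assumed unique within a Conversation, and the row order (left order first, then right-only turns) is an assumption. `None` stands in for the "no matching turn" placeholder.

```python
from typing import Optional

KeyedTurn = tuple[str, str]  # (key, text)
Row = tuple[str, Optional[str], Optional[str]]  # (key, left text, right text)


def align_by_key(left: list[KeyedTurn], right: list[KeyedTurn]) -> list[Row]:
    """Pair turns that share a key; None marks a side with no matching turn."""
    right_by_key = dict(right)
    left_keys = {key for key, _ in left}
    # Every left turn gets a row, matched against the right side when possible.
    rows: list[Row] = [(key, text, right_by_key.get(key)) for key, text in left]
    # Turns that exist only on the right are appended in their original order.
    rows += [(key, None, text) for key, text in right if key not in left_keys]
    return rows
```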
One Bun process serves both the SPA and the API. One SQLite file at /data/xray.db on a mounted volume. No external database, no second container, no managed service. See .claude/rules/single-image-distribution.md.
┌─ xray-py SDK on dev's machine ───────────────────────────────────────┐
│ POST /v1/conversations (multipart spec; server hashes for hash) │
│ POST /v1/replays → returns replay_id │
│ LiveKitRuntime joins room, plays user audio, captures the mixdown │
│ POST /v1/replays/:id/audio + /analyze; SSE → evaluation_complete │
└─────────────────────────────────────────────────────────────────────┘
│
▼
┌─ dev's agent ─────────────────────────────────────────────────────────┐
│ reads replay.id from room metadata → OTEL baggage │
│ emits xray.* / gen_ai.* / langfuse spans │
└─────────────────────────────────────────────────────────────────────┘
│ OTLP/JSON
▼
┌────────────────────────┐
│ SQLite /data/xray.db │ single file, mounted volume
└────────────────────────┘
│
▼
┌────────────────────────┐
│ UI │ Conversations · Replays · Compare
└────────────────────────┘
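The last SDK step in the diagram waits on a server-sent events stream until an `evaluation_complete` event arrives. A minimal sketch of parsing such a stream is shown below. The event name comes from the diagram; the payload shape, the function names, and the decision to return the raw data string are assumptions, and the real SDK handles this for you.

```python
from typing import Iterable, Iterator, Optional


def iter_sse_events(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (event, data) pairs from the lines of a text/event-stream body."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches whatever has been buffered so far.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            field, _, value = line.partition(":")
            value = value.removeprefix(" ")
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)


def wait_for_evaluation(lines: Iterable[str]) -> Optional[str]:
    """Return the data of the first evaluation_complete event, if any."""
    for event, data in iter_sse_events(lines):
        if event == "evaluation_complete":
            return data
    return None
```

Any iterable of decoded lines works as input, for example the lines of a streaming HTTP response body.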
- The SDK→xray surface has no auth. xray and your agent are expected to live in the same Docker network. Do not expose port 8080 publicly. The default compose snippet above binds to 127.0.0.1.
- The server-side analyze chain calls OpenAI on every /analyze request. Whisper runs once per VAD-detected turn for transcription; the judge LLM runs once per declared Judge. xray ships no rate limit or per-replay cost ceiling. That is deliberate for a single-tenant local dev tool, but it means a wider bind (HOST=0.0.0.0, port-forwarded to the internet, etc.) without a fronting auth proxy lets anyone who can reach /v1/replays/:id/analyze drain your OpenAI budget. If you must bind beyond loopback, put an auth proxy in front.
- Secrets (LiveKit) live in the SDK's process. OPENAI_API_KEY lives in xray's environment (used by both transcription and the judge). Pass it at runtime via compose env_file:, environment:, or docker run -e; never bake it into the image.
- Secrets are runtime-only: pass them at run time (compose environment: / env_file:, or docker run -e) and never bake them into the image.
- 7-day cooldown on npm releases, deny-by-default lifecycle scripts, every GitHub Action pinned to a 40-char SHA. See .claude/rules/supply-chain.md.
- Releases are signed with cosign keyless (OIDC) and carry build-provenance attestations.
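To see why an unauthenticated /analyze endpoint matters for cost, the arithmetic from the analyze-chain bullet can be written out. The helper below is a hypothetical illustration, not part of xray: it only counts upstream calls per request from the two rules stated above (one transcription per VAD-detected turn, one judge call per declared Judge) and says nothing about token usage or pricing.

```python
def openai_calls_per_analyze(vad_turns: int, judges: int) -> dict[str, int]:
    """Count upstream calls for one /analyze request."""
    if vad_turns < 0 or judges < 0:
        raise ValueError("counts must be non-negative")
    return {
        "transcription": vad_turns,  # one Whisper call per VAD-detected turn
        "judge": judges,             # one judge LLM call per declared Judge
        "total": vad_turns + judges,
    }


def calls_for_requests(requests: int, vad_turns: int, judges: int) -> int:
    """Total upstream calls if /analyze is hit `requests` times for one replay."""
    return requests * openai_calls_per_analyze(vad_turns, judges)["total"]
```

Because there is no rate limit, the total grows linearly with the number of requests that reach the endpoint.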
- docs/SDK.md: Python authoring + runtime + how to propagate baggage from LiveKit room metadata.
- docs/WIRE.md: OTLP attribute contract + recognized vocabularies and what fields are extracted from each.
- /docs on your running instance: generated OpenAPI 3.1 reference rendered by Scalar.
corepack enable # picks up the pinned pnpm
pnpm install # frozen-lockfile-safe; respects 7-day cooldown
pnpm dev # single Bun process via compose.dev.yaml (HMR for SPA + API)
pnpm docker:smoke # build image, run it, curl /healthz, kill; same check CI runs

Every CI step runs locally with one pnpm script. See CONTRIBUTING.md and CLAUDE.md.
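If you want the same kind of readiness check from a Python test harness, a minimal sketch might poll /healthz until it answers 200. The function names, retry counts, and the injectable `fetch` and `sleep` parameters are assumptions for this example; it does not reproduce what `pnpm docker:smoke` actually runs.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 2.0) -> int:
    """Return the HTTP status of a GET, or 0 when the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, just not with a 2xx
    except OSError:
        return 0  # connection refused, DNS failure, timeout, ...


def wait_healthy(
    url: str,
    attempts: int = 30,
    delay_s: float = 1.0,
    fetch: Callable[[str], int] = http_status,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `url` until it returns 200 or the attempts run out."""
    for attempt in range(attempts):
        if fetch(url) == 200:
            return True
        if attempt < attempts - 1:
            sleep(delay_s)
    return False
```

With the compose snippet from earlier, the call would be `wait_healthy("http://localhost:8080/healthz")`.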
Elastic License 2.0. Free to use, copy, modify, and self-host, including commercially inside your own organization. You may not offer xray to third parties as a hosted or managed service.