Commit 190a448

EightRice and claude committed
Add TraceEncoder module for agent trace → VL-JEPA embedding conversion
Implements the trace encoding pipeline described in AGENT_TRACE_TRAINING.md: converts structured ATN agent traces (JSON) into embedding sequences suitable for VL-JEPA next-turn prediction training.

New file: nodes/common/trace_encoder.py

Architecture:
- TraceEncoderConfig: configuration dataclass (embed_dim=384 matches VLJEPAConfig)
- _SequenceEncoder: shared byte-level transformer backbone with mean-pooling
- TextEncoder: encodes turn.content to (embed_dim,) via _SequenceEncoder
- ActionEncoder: serialises tool calls to text, encodes to (embed_dim,)
- ResultEncoder: serialises tool results to text, encodes to (embed_dim,)
- TurnFuser: self-attention over (text, action, result) modality slots → single vector
- OutcomeEncoder: structured (success, task_completed) + error text → (embed_dim,)
- TraceEncoder: orchestrates encode_trace() → {embeddings, turn_mask, outcome_embedding}
- TraceDataset: torch Dataset with quality filtering, deduplication, JSONL/directory loaders

Quality filtering:
- Skip traces with fewer than min_turns (default: 2)
- Skip errored sessions unless include_errored=True
- Optional embedding-based deduplication via cosine similarity threshold

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
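The quality-filtering rules listed in the commit message can be illustrated with a minimal, dependency-free sketch. This is not the code from nodes/common/trace_encoder.py: the trace keys `turns` and `errored`, the helper names `filter_traces` and `deduplicate`, and the 0.95 threshold are all assumptions made for illustration. Only `min_turns` (default 2) and `include_errored` come from the message itself.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_traces(
    traces: List[Dict],
    min_turns: int = 2,
    include_errored: bool = False,
) -> List[Dict]:
    """Drop traces that are too short or that errored.

    Assumed (hypothetical) schema: each trace is a dict with a "turns" list
    and an optional boolean "errored" flag.
    """
    kept = []
    for trace in traces:
        if len(trace.get("turns", [])) < min_turns:
            continue  # fewer than min_turns turns
        if trace.get("errored", False) and not include_errored:
            continue  # errored session, not explicitly included
        kept.append(trace)
    return kept


def deduplicate(
    embeddings: List[Sequence[float]],
    threshold: float = 0.95,  # hypothetical default
) -> List[int]:
    """Return indices of embeddings to keep.

    Greedy pass: an embedding is kept only if its cosine similarity to
    every previously kept embedding is below the threshold.
    """
    kept_idx: List[int] = []
    for i, emb in enumerate(embeddings):
        if all(cosine_similarity(emb, embeddings[j]) < threshold for j in kept_idx):
            kept_idx.append(i)
    return kept_idx
```

The greedy pass compares each candidate against every kept embedding, so it is quadratic in the number of traces; that is acceptable for a sketch but may need batching or an approximate index on large trace collections.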
1 parent f03dc20 commit 190a448

1 file changed

Lines changed: 885 additions & 0 deletions
