Commit 0c2a1d4

EightRice and claude committed
Add TwoSpeedEngine: fast/slow inference routing with embedding cache
Implements the two-speed inference pattern from TWO_SPEED_ARCHITECTURE.md for Phase B (single-node mode):

- TwoSpeedConfig: tunable thresholds for routing, caching, and decoding
- ComplexityEstimator: scores queries via length + embedding-space novelty
- EmbeddingCache: LRU cache of latent plans (OrderedDict, cosine-sim lookup)
- FastPath: decodes directly from the nearest cached latent plan
- SlowPath: rolls out VLJEPA dynamics for deep reasoning, then decodes
- NetworkStub: Phase B shim that routes slow-path calls to local rollout(); interface designed for drop-in replacement with a real network client in Phase C
- TwoSpeedEngine: orchestrates routing, fallback (fast→slow on low confidence), cache population, and decoding; exposes both async infer() and sync infer_sync()

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
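The commit's source is not shown here, so the following is only a minimal sketch of how an OrderedDict-backed LRU cache with cosine-similarity lookup and a fast→slow fallback might fit together. The names `EmbeddingCacheSketch`, `route`, `slow_path`, and the `capacity` and `threshold` values are hypothetical and are not taken from the committed file.

```python
from collections import OrderedDict
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class EmbeddingCacheSketch:
    """Hypothetical LRU cache mapping a key to (embedding, latent plan)."""

    def __init__(self, capacity=2):
        self.capacity = capacity
        self._entries = OrderedDict()  # oldest entry first, newest last

    def put(self, key, embedding, plan):
        if key in self._entries:
            self._entries.move_to_end(key)
        self._entries[key] = (embedding, plan)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used

    def nearest(self, query):
        """Return (plan, similarity) of the closest entry, or (None, 0.0) if empty."""
        best_key, best_sim = None, float("-inf")
        for key, (embedding, _plan) in self._entries.items():
            sim = cosine_similarity(query, embedding)
            if sim > best_sim:
                best_key, best_sim = key, sim
        if best_key is None:
            return None, 0.0
        self._entries.move_to_end(best_key)  # a lookup counts as a use
        return self._entries[best_key][1], best_sim


def route(cache, query, slow_path, threshold=0.9):
    """Use the cached plan when similarity is high enough, else fall back."""
    plan, sim = cache.nearest(query)
    if plan is not None and sim >= threshold:
        return "fast", plan
    plan = slow_path(query)  # assumed stand-in for the slow rollout
    cache.put(tuple(query), query, plan)  # populate the cache for next time
    return "slow", plan


cache = EmbeddingCacheSketch(capacity=2)
first = route(cache, [1.0, 0.0], lambda q: "plan-A")
second = route(cache, [0.99, 0.01], lambda q: "plan-X")
```

In this sketch the first call misses the empty cache and takes the slow path, while the second query is close enough to the stored embedding to reuse its plan. A linear scan over entries is used for simplicity; the real implementation may organize lookup differently.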
1 parent c7eaf29 commit 0c2a1d4

1 file changed

Lines changed: 841 additions & 0 deletions
