Commit 0c2a1d4
Add TwoSpeedEngine: fast/slow inference routing with embedding cache

Implements the two-speed inference pattern from TWO_SPEED_ARCHITECTURE.md
for Phase B (single-node mode):
- TwoSpeedConfig: tunable thresholds for routing, caching, and decoding
- ComplexityEstimator: scores queries via length + embedding-space novelty
- EmbeddingCache: LRU cache of latent plans (OrderedDict, cosine-sim lookup)
- FastPath: decodes directly from the nearest cached latent plan
- SlowPath: rolls out VLJEPA dynamics for deep reasoning then decodes
- NetworkStub: Phase B shim that routes slow-path calls to local rollout();
  interface designed for drop-in replacement with a real network client in Phase C
- TwoSpeedEngine: orchestrates routing, fallback (fast→slow on low confidence),
  cache population, and decoding; exposes both async infer() and sync infer_sync()
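
A minimal sketch of how the cache lookup and the fast→slow fallback might fit
together. It is illustrative only and is not the code added in this commit:
EmbeddingCacheSketch, infer_sketch, sim_threshold, and the slow_path callable
are hypothetical names, and cosine similarity stands in for the confidence
signal, which is an assumption. The real classes may differ.

```python
import math
from collections import OrderedDict
from typing import Any, Callable, Optional, Sequence, Tuple

Vector = Sequence[float]


def cosine_sim(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class EmbeddingCacheSketch:
    """Hypothetical LRU cache of (embedding, plan) pairs, linear-scan lookup."""

    def __init__(self, capacity: int = 128) -> None:
        if capacity < 1:
            raise ValueError("capacity must be >= 1")
        self.capacity = capacity
        self._entries: "OrderedDict[str, Tuple[Tuple[float, ...], Any]]" = OrderedDict()

    def put(self, key: str, embedding: Vector, plan: Any) -> None:
        # Refresh recency for an existing key, then store the new value.
        if key in self._entries:
            self._entries.move_to_end(key)
        self._entries[key] = (tuple(embedding), plan)
        # Evict least recently used entries (front of the OrderedDict).
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)

    def nearest(self, query: Vector) -> Optional[Tuple[Any, float]]:
        """Return (plan, similarity) for the most similar entry, or None if empty."""
        best_key, best_sim = None, -2.0
        for key, (emb, _plan) in self._entries.items():
            sim = cosine_sim(query, emb)
            if sim > best_sim:
                best_key, best_sim = key, sim
        if best_key is None:
            return None
        # Any lookup marks the best match as most recently used.
        self._entries.move_to_end(best_key)
        return self._entries[best_key][1], best_sim


def infer_sketch(
    cache: EmbeddingCacheSketch,
    key: str,
    query_emb: Vector,
    slow_path: Callable[[Vector], Any],
    sim_threshold: float = 0.9,  # assumed threshold, not from the commit
) -> Tuple[str, Any]:
    """Try the cached plan first; fall back to slow_path on low similarity."""
    hit = cache.nearest(query_emb)
    if hit is not None and hit[1] >= sim_threshold:
        return "fast", hit[0]
    plan = slow_path(query_emb)  # stand-in for the dynamics rollout
    cache.put(key, query_emb, plan)  # populate the cache for later queries
    return "slow", plan
```

With a capacity of 2, a third distinct slow-path result evicts the least
recently used entry, and a query close to a cached embedding is served from
the cache without calling slow_path.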
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent c7eaf29 commit 0c2a1d4
1 file changed
Lines changed: 841 additions & 0 deletions