Commit 0c2a1d4

Add TwoSpeedEngine: fast/slow inference routing with embedding cache

Implements the two-speed inference pattern from TWO_SPEED_ARCHITECTURE.md for Phase B (single-node mode):

- TwoSpeedConfig: tunable thresholds for routing, caching, and decoding
- ComplexityEstimator: scores queries via length + embedding-space novelty
- EmbeddingCache: LRU cache of latent plans (OrderedDict, cosine-sim lookup)
- FastPath: decodes directly from the nearest cached latent plan
- SlowPath: rolls out VLJEPA dynamics for deep reasoning, then decodes
- NetworkStub: Phase B shim that routes slow-path calls to local rollout(); interface designed for drop-in replacement with a real network client in Phase C
- TwoSpeedEngine: orchestrates routing, fallback (fast→slow on low confidence), cache population, and decoding; exposes both async infer() and sync infer_sync()

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
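
The EmbeddingCache item above (an LRU cache on an OrderedDict with cosine-similarity lookup) can be illustrated with a minimal sketch. This is not the code from two_speed.py: the class name EmbeddingCacheSketch, the put() and nearest() methods, and the integer keys are all assumptions made for illustration, and plain Python lists stand in for real embeddings.

```python
from collections import OrderedDict
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class EmbeddingCacheSketch:
    """Hypothetical LRU cache of latent plans, looked up by cosine similarity."""

    def __init__(self, capacity=2):
        self.capacity = capacity
        self._entries = OrderedDict()  # key -> (embedding, plan), oldest first
        self._next_key = 0

    def put(self, embedding, plan):
        """Store a plan; evict the least recently used entry when over capacity."""
        self._entries[self._next_key] = (tuple(embedding), plan)
        self._next_key += 1
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)

    def nearest(self, query):
        """Return (plan, similarity) for the closest entry, or (None, 0.0) if empty."""
        best_key, best_sim = None, -1.0
        for key, (emb, _plan) in self._entries.items():
            sim = cosine(query, emb)
            if sim > best_sim:
                best_key, best_sim = key, sim
        if best_key is None:
            return None, 0.0
        self._entries.move_to_end(best_key)  # a hit counts as a recent use
        return self._entries[best_key][1], best_sim
```

The linear scan in nearest() keeps the sketch short; a fast path would then compare the returned similarity against a configured threshold before decoding from the cached plan.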

1 parent c7eaf29 commit 0c2a1d4

1 file changed

nodes/common
- two_speed.py
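
For orientation, the routing that the commit message attributes to TwoSpeedEngine (fast path first, fallback to the slow path on low confidence, with both async and sync entry points) might look roughly like the sketch below. Only the method names infer() and infer_sync() come from the commit message; the class name TwoSpeedRouterSketch, the constructor parameters, the thresholds, and the return shape are hypothetical.

```python
import asyncio


class TwoSpeedRouterSketch:
    """Hypothetical router: try the fast path, fall back to the slow path."""

    def __init__(self, fast_path, slow_path, complexity,
                 complexity_threshold=0.5, min_confidence=0.8):
        self.fast_path = fast_path    # query -> (result or None, confidence)
        self.slow_path = slow_path    # query -> result (blocking call)
        self.complexity = complexity  # query -> score in [0, 1]
        self.complexity_threshold = complexity_threshold
        self.min_confidence = min_confidence

    async def infer(self, query):
        """Return (route, result), where route is "fast" or "slow"."""
        if self.complexity(query) < self.complexity_threshold:
            result, confidence = self.fast_path(query)
            if result is not None and confidence >= self.min_confidence:
                return "fast", result
        # Low confidence or a complex query: run the blocking slow path
        # in a worker thread so the event loop stays responsive.
        result = await asyncio.to_thread(self.slow_path, query)
        return "slow", result

    def infer_sync(self, query):
        """Blocking wrapper; must not be called from inside a running event loop."""
        return asyncio.run(self.infer(query))
```

With stand-in callables, `TwoSpeedRouterSketch(fast, slow, complexity).infer_sync(query)` returns a `("fast", ...)` pair when the fast path is confident and a `("slow", ...)` pair otherwise.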
