feat: NVIDIA Cosmos integration — Reason verbalizer (path A) + C4 foundation#1
feat: NVIDIA Cosmos integration — Reason verbalizer (path A) + C4 foundation#1wikieden wants to merge 13 commits into
Conversation
Add CosmosReasonVerbalizer (spatialmem.cosmos) — wraps an OpenAI-compatible NVIDIA Cosmos Reason NIM (hosted build.nvidia.com or self-hosted) as an answer() backend. stdlib-urllib only, core stays numpy-only; configurable model=/base_url=, NVIDIA_API_KEY, injectable transport for offline tests. strip_reasoning() handles Cosmos <think>/<answer> tags incl. truncated and nested cases; HTTP/JSON failures surface as QueryError. - export + __all__ wiring in the facade - 11 unit tests (payload shape, tag stripping edge cases, error paths, end-to-end via SpatialMemory.answer()) - examples/04_cosmos_answer.py (offline-runnable) - README + spec/API.md reference Path A of the Cosmos integration: Cosmos = reasoning brain, SpatialMem = persistent 3D memory. Reviewed (code-reviewer); HIGH/MED/LOW findings fixed. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
NVIDIA's June-2026 Cosmos 3 technical report names "temporally persistent state, spatial grounding tied to objects … a maintained, actionable scene estimate" as the open Physical-AI problem, while shipping no persistent scene store (bounded 74K context). The biggest Physical-AI player defines our thesis as unsolved — record it in VISION (Bet) and POSITIONING (NVIDIA row). Add docs/design/cosmos3-perception-adapter.md (path B): Cosmos 3 Reasoner emits structured camera-frame 3D boxes (JSON) + metric ego-pose, so a Cosmos3PerceptionAdapter can feed world-frame Detections (+ CLIP-crop feature). Notes the Encoder.encode_image gap, coordinate unknowns, offline test plan. Linked from DEV-PLAN Phase C as C4. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…dependent) Build the GPU-free, schema-independent parts of the Cosmos3PerceptionAdapter (path B) so only the Cosmos call + JSON parse remain weight-gated: - spatialmem.geometry: transform_points / oriented_box_corners / world_aabb_from_obb — lift a camera-frame oriented 3D box into the world-frame AABB a Detection carries. Pure numpy, 10 unit tests (identity/translation/ yaw-rotation/AABB growth). - spatialmem.ImageEncoder protocol + OpenClipEncoder.encode_image — per-object feature path (Cosmos emits no embedding); kept in the same space as encode_text so semantic query still aligns. - design doc updated: marks these built, Cosmos call/parser still weight-gated. 118 tests green, ruff clean. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 82b24473bb
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| present. Falls back to the de-tagged text so the result is never empty when | ||
| the model did produce content. | ||
| """ | ||
| cleaned = _OPEN_THINK_RE.sub("", _THINK_RE.sub("", text)) |
There was a problem hiding this comment.
Handle nested
<think> blocks before parsing answers
When Cosmos emits or echoes a nested <think> block, this regex removes only through the first </think>, so content that is still inside the outer reasoning block remains in cleaned; if that leftover contains an <answer> tag before the real final answer, strip_reasoning() returns the reasoning's answer instead of the final one (e.g. <think>...<think>...</think><answer>wrong</answer></think><answer>right</answer> returns wrong). This contradicts the intended reasoning stripping and can make answer() surface hidden/incorrect reasoning text for nested-tag outputs.
Useful? React with 👍 / 👎.
Add docs/design/cosmos3-spatialmem-llm-brain.md — full system design for composing Cosmos 3 (perceive), SpatialMem (remember), and an LLM (reason) into one embodied cognitive loop. Covers the 4 interface contracts (C1 perception adapter, C2 serialize, C3 memory-as-tools with JSON schemas, C4 bounded active perception), two deployment topologies (Cosmos-as-brain vs BYO-LLM, B recommended), the cognitive tick, ablation, and a worked example. Hardened by an adversarial critic panel (architecture / API-accuracy / completeness); corrections folded: - API aligned to shipped code (changes() positional; serialize(format=, max_tokens=); verbalizer defaults to Cosmos Reason 2, not Cosmos 3) - Cosmos 3 facts corrected (RGB-only at inference — depth/pose used by the adapter, not the Reasoner; 74K is the generator context) - honest concurrency contract (v0 single-threaded sqlite, callers serialize; async queue is future) - new sections: Conventions, Security (prompt injection / egress / keys), store compatibility across sessions, error/degradation, episodic, cost model - v0 scoped single-camera/single-agent; multi-agent primitives listed as future Linked from DEV-PLAN Phase C. (A real code bug the review surfaced — maintenance commit() flushing un-fused pending observations — is tracked separately.) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Expose SpatialMemory as framework-agnostic LLM tools (contract C3). Hand
SpatialMemTools(mem).schemas() to a function-calling model; route its tool calls
to .call(name, args). Returns JSON envelopes whose hits carry node_id so the LLM
can cite what it used.
Tools (map to shipped methods): semantic_search, spatial_query, whats_in,
whats_on, recent_changes, serialize_scene. New ToolError.
Hardened per adversarial review (correctness / security / consistency panel):
- all numeric args validated/bounded -> ToolError, never a raw exception
(k in [1,1000], radius_m > 0 finite, max_tokens in [1,1e5], near finite);
bool rejected where int/number expected
- call() funnels any unexpected downstream error to ToolError (no interpreter
internals leak to the model)
- getattr("_t_"+name) dispatch confirmed sandboxed; whats_in SQL parameterized
- labels control-char-stripped + length-capped; module SECURITY note that
scene/label text is untrusted (prompt-injection) and must be delimited
- whats_on echoes resolved anchor via meta; max_tokens=0 no longer means "no
budget"; _hit adds retrieval score
19 unit tests (schema shape, each dispatch, envelope fields, malformed-arg
matrix, unknown tool). examples/05_memory_tools.py (offline). 141 tests green,
ruff clean. Design doc C3 marked shipped.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Scope decision: SpatialMem core does NOT do perception. It ships only the BYO seam (PerceptionAdapter protocol) and stays numpy-only. All concrete perception moves to a separate companion repo, spatialmem-perception. Removed from core (relocating to spatialmem-perception): - src/spatialmem/geometry.py + tests (camera->world OBB lift) - ImageEncoder protocol + OpenClipEncoder.encode_image (image-crop encoding); OpenClipEncoder reverts to text-only (query encoder, the memory-side use) - docs/design/cosmos3-perception-adapter.md (adapter design) Kept in core: PerceptionAdapter protocol (the seam), Encoder/OpenClipEncoder (text, for query), CosmosReasonVerbalizer (LLM answer brain, not perception), SpatialMemTools (C3), the brain design doc. Docs: DEV-PLAN Phase C reframed as the companion-repo backlog; brain doc points perception pieces at spatialmem-perception. 132 tests green, ruff clean. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…panion) Follow-up to the perception split (b838f55, which only carried the file deletions). Apply the code/doc edits: - encoders.py: remove ImageEncoder protocol + OpenClipEncoder.encode_image; OpenClipEncoder reverts to text-only (query encoder) - __init__.py: drop ImageEncoder export + import - DEV-PLAN.md: Phase C reframed as companion repo spatialmem-perception backlog - brain design doc: perception pieces point at spatialmem-perception 132 tests green, ruff clean (core has no perception code left). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Consistent with the perception split: core ships only protocol seams + memory; concrete external-model integrations live in companion repos. Removed from core (relocated to spatialmem-brain): - src/spatialmem/cosmos.py (CosmosReasonVerbalizer) + tests + examples/04 - docs/design/cosmos3-spatialmem-llm-brain.md (the system design) Kept in core: the Verbalizer protocol + answer() seam (BYO LLM), and SpatialMemTools (C3 memory-as-tools — a memory API surface, not brain). Docs (README / spec/API / DEV-PLAN / tools / example05) point the brain pieces at the spatialmem-brain companion repo. 121 tests green, ruff clean (core has no LLM-integration code left). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Follow-up to a0b9d0f (which carried only the file deletions). Apply the edits: - __init__.py: drop CosmosReasonVerbalizer export + import - README / spec/API / DEV-PLAN: brain layer now in spatialmem-brain - tools.py / example05: design-doc reference points to spatialmem-brain Verbalizer protocol + answer() seam + SpatialMemTools stay in core. 121 tests green, ruff clean. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Guard against maintenance commits flushing un-fused observations. add_detections() stages observation rows in _pending (fusion deferred to commit()); a maintenance method's own conn.commit() between add and commit used to flush those rows to disk unfused, leaving orphan observations on a crash and a half-ingested store to any interleaved op. Covered by _flush_pending() guards on decay/consolidate/resplit/ forget/define_region/relate/update/close. Tests: - decay/consolidate between add and commit -> node count + history() linkage correct - crash-sim: second read-only connection sees 0 committed orphans on disk - close() without commit fuses pending, no orphans
- _opt_int: @overload-typed (int default -> int) and falls back to default on an explicit JSON null instead of leaking None downstream (e.g. tool arg "k": null now uses the default k) - store.insert_*: assert the always-present cursor.lastrowid (was int(None?)) - persist.connect: accept str | os.PathLike[str] (was str | Path) - encoders / vec: guarded reportMissingImports ignores on the optional [clip]/[vec] imports (open_clip, torch, sqlite_vec) - ruff format: tidy a pre-existing unformatted test - CHANGELOG: Fixed entry
Summary
Integrates NVIDIA Cosmos with SpatialMem. Cosmos = reasoning/perception brain; SpatialMem = the persistent 3D memory NVIDIA's own Cosmos 3 report names as the open problem (and ships without). Complementary, not competing.
Three commits:
feat: Cosmos Reason verbalizer (path A) —CosmosReasonVerbalizer(spatialmem.cosmos) wraps an OpenAI-compatible Cosmos Reason NIM as ananswer()backend. stdlib-urllib only (core stays numpy-only), configurablemodel=/base_url=,NVIDIA_API_KEY, injectable transport for offline tests.strip_reasoning()handles<think>/<answer>incl. truncated/nested. HTTP/JSON errors →QueryError. Reviewed (code-reviewer); HIGH/MED/LOW fixed.docs: Cosmos 3 endorsement + adapter design — VISION/POSITIONING cite the Cosmos 3 technical report defining our thesis as unsolved;docs/design/cosmos3-perception-adapter.mdspecs path B.feat: C4 foundation (CPU, schema-independent) —spatialmem.geometry(camera-frame OBB → world AABB) +ImageEncoderprotocol +OpenClipEncoder.encode_image. Only the Cosmos call + JSON parse remain weight-gated.Test plan
pytest), incl. 11 cosmos verbalizer + 10 geometryruff check+ruff format --checkclean (src + tests + examples)examples/04_cosmos_answer.pyruns offline (injected transport)NVIDIA_API_KEY) — not run🤖 Generated with Claude Code