feat: NVIDIA Cosmos integration — Reason verbalizer (path A) + C4 foundation by wikieden · Pull Request #1 · wikieden/spatialmem

wikieden · 2026-06-05T15:43:40Z

Summary

Integrates NVIDIA Cosmos with SpatialMem. Cosmos = reasoning/perception brain; SpatialMem = the persistent 3D memory NVIDIA's own Cosmos 3 report names as the open problem (and ships without). Complementary, not competing.

Three commits:

feat: Cosmos Reason verbalizer (path A) — CosmosReasonVerbalizer (spatialmem.cosmos) wraps an OpenAI-compatible Cosmos Reason NIM as an answer() backend. stdlib-urllib only (core stays numpy-only), configurable model=/base_url=, NVIDIA_API_KEY, injectable transport for offline tests. strip_reasoning() handles <think>/<answer> incl. truncated/nested. HTTP/JSON errors → QueryError. Reviewed (code-reviewer); HIGH/MED/LOW fixed.
docs: Cosmos 3 endorsement + adapter design — VISION/POSITIONING cite the Cosmos 3 technical report defining our thesis as unsolved; docs/design/cosmos3-perception-adapter.md specs path B.
feat: C4 foundation (CPU, schema-independent) — spatialmem.geometry (camera-frame OBB → world AABB) + ImageEncoder protocol + OpenClipEncoder.encode_image. Only the Cosmos call + JSON parse remain weight-gated.

Test plan

118 tests green (pytest), incl. 11 cosmos verbalizer + 10 geometry
ruff check + ruff format --check clean (src + tests + examples)
examples/04_cosmos_answer.py runs offline (injected transport)
core stays numpy-only (cosmos.py stdlib urllib; clip/torch lazy-imported)
real NIM smoke test (needs NVIDIA_API_KEY) — not run
path-B Cosmos call + JSON parser — weight-gated, deferred

🤖 Generated with Claude Code

Add CosmosReasonVerbalizer (spatialmem.cosmos) — wraps an OpenAI-compatible NVIDIA Cosmos Reason NIM (hosted build.nvidia.com or self-hosted) as an answer() backend. stdlib-urllib only, core stays numpy-only; configurable model=/base_url=, NVIDIA_API_KEY, injectable transport for offline tests. strip_reasoning() handles Cosmos <think>/<answer> tags incl. truncated and nested cases; HTTP/JSON failures surface as QueryError. - export + __all__ wiring in the facade - 11 unit tests (payload shape, tag stripping edge cases, error paths, end-to-end via SpatialMemory.answer()) - examples/04_cosmos_answer.py (offline-runnable) - README + spec/API.md reference Path A of the Cosmos integration: Cosmos = reasoning brain, SpatialMem = persistent 3D memory. Reviewed (code-reviewer); HIGH/MED/LOW findings fixed. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

NVIDIA's June-2026 Cosmos 3 technical report names "temporally persistent state, spatial grounding tied to objects … a maintained, actionable scene estimate" as the open Physical-AI problem, while shipping no persistent scene store (bounded 74K context). The biggest Physical-AI player defines our thesis as unsolved — record it in VISION (Bet) and POSITIONING (NVIDIA row). Add docs/design/cosmos3-perception-adapter.md (path B): Cosmos 3 Reasoner emits structured camera-frame 3D boxes (JSON) + metric ego-pose, so a Cosmos3PerceptionAdapter can feed world-frame Detections (+ CLIP-crop feature). Notes the Encoder.encode_image gap, coordinate unknowns, offline test plan. Linked from DEV-PLAN Phase C as C4. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…dependent) Build the GPU-free, schema-independent parts of the Cosmos3PerceptionAdapter (path B) so only the Cosmos call + JSON parse remain weight-gated: - spatialmem.geometry: transform_points / oriented_box_corners / world_aabb_from_obb — lift a camera-frame oriented 3D box into the world-frame AABB a Detection carries. Pure numpy, 10 unit tests (identity/translation/ yaw-rotation/AABB growth). - spatialmem.ImageEncoder protocol + OpenClipEncoder.encode_image — per-object feature path (Cosmos emits no embedding); kept in the same space as encode_text so semantic query still aligns. - design doc updated: marks these built, Cosmos call/parser still weight-gated. 118 tests green, ruff clean. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 82b24473bb

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-05T15:46:12Z

+    present. Falls back to the de-tagged text so the result is never empty when
+    the model did produce content.
+    """
+    cleaned = _OPEN_THINK_RE.sub("", _THINK_RE.sub("", text))


Handle nested <think> blocks before parsing answers

When Cosmos emits or echoes a nested <think> block, this regex removes only through the first </think>, so content that is still inside the outer reasoning block remains in cleaned; if that leftover contains an <answer> tag before the real final answer, strip_reasoning() returns the reasoning's answer instead of the final one (e.g. <think>...<think>...</think><answer>wrong</answer></think><answer>right</answer> returns wrong). This contradicts the intended reasoning stripping and can make answer() surface hidden/incorrect reasoning text for nested-tag outputs.

Useful? React with 👍 / 👎.

Add docs/design/cosmos3-spatialmem-llm-brain.md — full system design for composing Cosmos 3 (perceive), SpatialMem (remember), and an LLM (reason) into one embodied cognitive loop. Covers the 4 interface contracts (C1 perception adapter, C2 serialize, C3 memory-as-tools with JSON schemas, C4 bounded active perception), two deployment topologies (Cosmos-as-brain vs BYO-LLM, B recommended), the cognitive tick, ablation, and a worked example. Hardened by an adversarial critic panel (architecture / API-accuracy / completeness); corrections folded: - API aligned to shipped code (changes() positional; serialize(format=, max_tokens=); verbalizer defaults to Cosmos Reason 2, not Cosmos 3) - Cosmos 3 facts corrected (RGB-only at inference — depth/pose used by the adapter, not the Reasoner; 74K is the generator context) - honest concurrency contract (v0 single-threaded sqlite, callers serialize; async queue is future) - new sections: Conventions, Security (prompt injection / egress / keys), store compatibility across sessions, error/degradation, episodic, cost model - v0 scoped single-camera/single-agent; multi-agent primitives listed as future Linked from DEV-PLAN Phase C. (A real code bug the review surfaced — maintenance commit() flushing un-fused pending observations — is tracked separately.) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Expose SpatialMemory as framework-agnostic LLM tools (contract C3). Hand SpatialMemTools(mem).schemas() to a function-calling model; route its tool calls to .call(name, args). Returns JSON envelopes whose hits carry node_id so the LLM can cite what it used. Tools (map to shipped methods): semantic_search, spatial_query, whats_in, whats_on, recent_changes, serialize_scene. New ToolError. Hardened per adversarial review (correctness / security / consistency panel): - all numeric args validated/bounded -> ToolError, never a raw exception (k in [1,1000], radius_m > 0 finite, max_tokens in [1,1e5], near finite); bool rejected where int/number expected - call() funnels any unexpected downstream error to ToolError (no interpreter internals leak to the model) - getattr("_t_"+name) dispatch confirmed sandboxed; whats_in SQL parameterized - labels control-char-stripped + length-capped; module SECURITY note that scene/label text is untrusted (prompt-injection) and must be delimited - whats_on echoes resolved anchor via meta; max_tokens=0 no longer means "no budget"; _hit adds retrieval score 19 unit tests (schema shape, each dispatch, envelope fields, malformed-arg matrix, unknown tool). examples/05_memory_tools.py (offline). 141 tests green, ruff clean. Design doc C3 marked shipped. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Scope decision: SpatialMem core does NOT do perception. It ships only the BYO seam (PerceptionAdapter protocol) and stays numpy-only. All concrete perception moves to a separate companion repo, spatialmem-perception. Removed from core (relocating to spatialmem-perception): - src/spatialmem/geometry.py + tests (camera->world OBB lift) - ImageEncoder protocol + OpenClipEncoder.encode_image (image-crop encoding); OpenClipEncoder reverts to text-only (query encoder, the memory-side use) - docs/design/cosmos3-perception-adapter.md (adapter design) Kept in core: PerceptionAdapter protocol (the seam), Encoder/OpenClipEncoder (text, for query), CosmosReasonVerbalizer (LLM answer brain, not perception), SpatialMemTools (C3), the brain design doc. Docs: DEV-PLAN Phase C reframed as the companion-repo backlog; brain doc points perception pieces at spatialmem-perception. 132 tests green, ruff clean. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…panion) Follow-up to the perception split (b838f55, which only carried the file deletions). Apply the code/doc edits: - encoders.py: remove ImageEncoder protocol + OpenClipEncoder.encode_image; OpenClipEncoder reverts to text-only (query encoder) - __init__.py: drop ImageEncoder export + import - DEV-PLAN.md: Phase C reframed as companion repo spatialmem-perception backlog - brain design doc: perception pieces point at spatialmem-perception 132 tests green, ruff clean (core has no perception code left). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Consistent with the perception split: core ships only protocol seams + memory; concrete external-model integrations live in companion repos. Removed from core (relocated to spatialmem-brain): - src/spatialmem/cosmos.py (CosmosReasonVerbalizer) + tests + examples/04 - docs/design/cosmos3-spatialmem-llm-brain.md (the system design) Kept in core: the Verbalizer protocol + answer() seam (BYO LLM), and SpatialMemTools (C3 memory-as-tools — a memory API surface, not brain). Docs (README / spec/API / DEV-PLAN / tools / example05) point the brain pieces at the spatialmem-brain companion repo. 121 tests green, ruff clean (core has no LLM-integration code left). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Follow-up to a0b9d0f (which carried only the file deletions). Apply the edits: - __init__.py: drop CosmosReasonVerbalizer export + import - README / spec/API / DEV-PLAN: brain layer now in spatialmem-brain - tools.py / example05: design-doc reference points to spatialmem-brain Verbalizer protocol + answer() seam + SpatialMemTools stay in core. 121 tests green, ruff clean. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Guard against maintenance commits flushing un-fused observations. add_detections() stages observation rows in _pending (fusion deferred to commit()); a maintenance method's own conn.commit() between add and commit used to flush those rows to disk unfused, leaving orphan observations on a crash and a half-ingested store to any interleaved op. Covered by _flush_pending() guards on decay/consolidate/resplit/ forget/define_region/relate/update/close. Tests: - decay/consolidate between add and commit -> node count + history() linkage correct - crash-sim: second read-only connection sees 0 committed orphans on disk - close() without commit fuses pending, no orphans

- _opt_int: @overload-typed (int default -> int) and falls back to default on an explicit JSON null instead of leaking None downstream (e.g. tool arg "k": null now uses the default k) - store.insert_*: assert the always-present cursor.lastrowid (was int(None?)) - persist.connect: accept str | os.PathLike[str] (was str | Path) - encoders / vec: guarded reportMissingImports ignores on the optional [clip]/[vec] imports (open_clip, torch, sqlite_vec) - ruff format: tidy a pre-existing unformatted test - CHANGELOG: Fixed entry

wikieden and others added 4 commits June 5, 2026 23:34

style: ruff format geometry.py (CI lint fix)

a039ed6

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

chatgpt-codex-connector Bot reviewed Jun 5, 2026

View reviewed changes

wikieden and others added 9 commits June 6, 2026 10:45

style: ruff format tools.py + test_tools.py (CI lint fix)

27b87b1

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: NVIDIA Cosmos integration — Reason verbalizer (path A) + C4 foundation#1

feat: NVIDIA Cosmos integration — Reason verbalizer (path A) + C4 foundation#1
wikieden wants to merge 13 commits into
mainfrom
feat/cosmos-integration

wikieden commented Jun 5, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Jun 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

wikieden commented Jun 5, 2026

Summary

Test plan

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant