Skip to content

feat: NVIDIA Cosmos integration — Reason verbalizer (path A) + C4 foundation#1

Open
wikieden wants to merge 13 commits into
mainfrom
feat/cosmos-integration
Open

feat: NVIDIA Cosmos integration — Reason verbalizer (path A) + C4 foundation#1
wikieden wants to merge 13 commits into
mainfrom
feat/cosmos-integration

Conversation

@wikieden
Copy link
Copy Markdown
Owner

@wikieden wikieden commented Jun 5, 2026

Summary

Integrates NVIDIA Cosmos with SpatialMem. Cosmos = reasoning/perception brain; SpatialMem = the persistent 3D memory NVIDIA's own Cosmos 3 report names as the open problem (and ships without). Complementary, not competing.

Three commits:

  1. feat: Cosmos Reason verbalizer (path A)CosmosReasonVerbalizer (spatialmem.cosmos) wraps an OpenAI-compatible Cosmos Reason NIM as an answer() backend. stdlib-urllib only (core stays numpy-only), configurable model=/base_url=, NVIDIA_API_KEY, injectable transport for offline tests. strip_reasoning() handles <think>/<answer> incl. truncated/nested. HTTP/JSON errors → QueryError. Reviewed (code-reviewer); HIGH/MED/LOW fixed.
  2. docs: Cosmos 3 endorsement + adapter design — VISION/POSITIONING cite the Cosmos 3 technical report defining our thesis as unsolved; docs/design/cosmos3-perception-adapter.md specs path B.
  3. feat: C4 foundation (CPU, schema-independent)spatialmem.geometry (camera-frame OBB → world AABB) + ImageEncoder protocol + OpenClipEncoder.encode_image. Only the Cosmos call + JSON parse remain weight-gated.

Test plan

  • 118 tests green (pytest), incl. 11 cosmos verbalizer + 10 geometry
  • ruff check + ruff format --check clean (src + tests + examples)
  • examples/04_cosmos_answer.py runs offline (injected transport)
  • core stays numpy-only (cosmos.py stdlib urllib; clip/torch lazy-imported)
  • real NIM smoke test (needs NVIDIA_API_KEY) — not run
  • path-B Cosmos call + JSON parser — weight-gated, deferred

🤖 Generated with Claude Code

wikieden and others added 4 commits June 5, 2026 23:34
Add CosmosReasonVerbalizer (spatialmem.cosmos) — wraps an OpenAI-compatible
NVIDIA Cosmos Reason NIM (hosted build.nvidia.com or self-hosted) as an
answer() backend. stdlib-urllib only, core stays numpy-only; configurable
model=/base_url=, NVIDIA_API_KEY, injectable transport for offline tests.
strip_reasoning() handles Cosmos <think>/<answer> tags incl. truncated and
nested cases; HTTP/JSON failures surface as QueryError.

- export + __all__ wiring in the facade
- 11 unit tests (payload shape, tag stripping edge cases, error paths,
  end-to-end via SpatialMemory.answer())
- examples/04_cosmos_answer.py (offline-runnable)
- README + spec/API.md reference

Path A of the Cosmos integration: Cosmos = reasoning brain, SpatialMem =
persistent 3D memory. Reviewed (code-reviewer); HIGH/MED/LOW findings fixed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
NVIDIA's June-2026 Cosmos 3 technical report names "temporally persistent
state, spatial grounding tied to objects … a maintained, actionable scene
estimate" as the open Physical-AI problem, while shipping no persistent scene
store (bounded 74K context). The biggest Physical-AI player defines our thesis
as unsolved — record it in VISION (Bet) and POSITIONING (NVIDIA row).

Add docs/design/cosmos3-perception-adapter.md (path B): Cosmos 3 Reasoner
emits structured camera-frame 3D boxes (JSON) + metric ego-pose, so a
Cosmos3PerceptionAdapter can feed world-frame Detections (+ CLIP-crop feature).
Notes the Encoder.encode_image gap, coordinate unknowns, offline test plan.
Linked from DEV-PLAN Phase C as C4.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…dependent)

Build the GPU-free, schema-independent parts of the Cosmos3PerceptionAdapter
(path B) so only the Cosmos call + JSON parse remain weight-gated:

- spatialmem.geometry: transform_points / oriented_box_corners /
  world_aabb_from_obb — lift a camera-frame oriented 3D box into the world-frame
  AABB a Detection carries. Pure numpy, 10 unit tests (identity/translation/
  yaw-rotation/AABB growth).
- spatialmem.ImageEncoder protocol + OpenClipEncoder.encode_image — per-object
  feature path (Cosmos emits no embedding); kept in the same space as
  encode_text so semantic query still aligns.
- design doc updated: marks these built, Cosmos call/parser still weight-gated.

118 tests green, ruff clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 82b24473bb

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/spatialmem/cosmos.py Outdated
present. Falls back to the de-tagged text so the result is never empty when
the model did produce content.
"""
cleaned = _OPEN_THINK_RE.sub("", _THINK_RE.sub("", text))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Handle nested <think> blocks before parsing answers

When Cosmos emits or echoes a nested <think> block, this regex removes only through the first </think>, so content that is still inside the outer reasoning block remains in cleaned; if that leftover contains an <answer> tag before the real final answer, strip_reasoning() returns the reasoning's answer instead of the final one (e.g. <think>...<think>...</think><answer>wrong</answer></think><answer>right</answer> returns wrong). This contradicts the intended reasoning stripping and can make answer() surface hidden/incorrect reasoning text for nested-tag outputs.

Useful? React with 👍 / 👎.

wikieden and others added 9 commits June 6, 2026 10:45
Add docs/design/cosmos3-spatialmem-llm-brain.md — full system design for
composing Cosmos 3 (perceive), SpatialMem (remember), and an LLM (reason) into
one embodied cognitive loop. Covers the 4 interface contracts (C1 perception
adapter, C2 serialize, C3 memory-as-tools with JSON schemas, C4 bounded active
perception), two deployment topologies (Cosmos-as-brain vs BYO-LLM, B
recommended), the cognitive tick, ablation, and a worked example.

Hardened by an adversarial critic panel (architecture / API-accuracy /
completeness); corrections folded:
- API aligned to shipped code (changes() positional; serialize(format=,
  max_tokens=); verbalizer defaults to Cosmos Reason 2, not Cosmos 3)
- Cosmos 3 facts corrected (RGB-only at inference — depth/pose used by the
  adapter, not the Reasoner; 74K is the generator context)
- honest concurrency contract (v0 single-threaded sqlite, callers serialize;
  async queue is future)
- new sections: Conventions, Security (prompt injection / egress / keys),
  store compatibility across sessions, error/degradation, episodic, cost model
- v0 scoped single-camera/single-agent; multi-agent primitives listed as future

Linked from DEV-PLAN Phase C. (A real code bug the review surfaced — maintenance
commit() flushing un-fused pending observations — is tracked separately.)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Expose SpatialMemory as framework-agnostic LLM tools (contract C3). Hand
SpatialMemTools(mem).schemas() to a function-calling model; route its tool calls
to .call(name, args). Returns JSON envelopes whose hits carry node_id so the LLM
can cite what it used.

Tools (map to shipped methods): semantic_search, spatial_query, whats_in,
whats_on, recent_changes, serialize_scene. New ToolError.

Hardened per adversarial review (correctness / security / consistency panel):
- all numeric args validated/bounded -> ToolError, never a raw exception
  (k in [1,1000], radius_m > 0 finite, max_tokens in [1,1e5], near finite);
  bool rejected where int/number expected
- call() funnels any unexpected downstream error to ToolError (no interpreter
  internals leak to the model)
- getattr("_t_"+name) dispatch confirmed sandboxed; whats_in SQL parameterized
- labels control-char-stripped + length-capped; module SECURITY note that
  scene/label text is untrusted (prompt-injection) and must be delimited
- whats_on echoes resolved anchor via meta; max_tokens=0 no longer means "no
  budget"; _hit adds retrieval score

19 unit tests (schema shape, each dispatch, envelope fields, malformed-arg
matrix, unknown tool). examples/05_memory_tools.py (offline). 141 tests green,
ruff clean. Design doc C3 marked shipped.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Scope decision: SpatialMem core does NOT do perception. It ships only the BYO
seam (PerceptionAdapter protocol) and stays numpy-only. All concrete perception
moves to a separate companion repo, spatialmem-perception.

Removed from core (relocating to spatialmem-perception):
- src/spatialmem/geometry.py + tests (camera->world OBB lift)
- ImageEncoder protocol + OpenClipEncoder.encode_image (image-crop encoding);
  OpenClipEncoder reverts to text-only (query encoder, the memory-side use)
- docs/design/cosmos3-perception-adapter.md (adapter design)

Kept in core: PerceptionAdapter protocol (the seam), Encoder/OpenClipEncoder
(text, for query), CosmosReasonVerbalizer (LLM answer brain, not perception),
SpatialMemTools (C3), the brain design doc.

Docs: DEV-PLAN Phase C reframed as the companion-repo backlog; brain doc points
perception pieces at spatialmem-perception.

132 tests green, ruff clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…panion)

Follow-up to the perception split (b838f55, which only carried the file
deletions). Apply the code/doc edits:
- encoders.py: remove ImageEncoder protocol + OpenClipEncoder.encode_image;
  OpenClipEncoder reverts to text-only (query encoder)
- __init__.py: drop ImageEncoder export + import
- DEV-PLAN.md: Phase C reframed as companion repo spatialmem-perception backlog
- brain design doc: perception pieces point at spatialmem-perception

132 tests green, ruff clean (core has no perception code left).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Consistent with the perception split: core ships only protocol seams + memory;
concrete external-model integrations live in companion repos.

Removed from core (relocated to spatialmem-brain):
- src/spatialmem/cosmos.py (CosmosReasonVerbalizer) + tests + examples/04
- docs/design/cosmos3-spatialmem-llm-brain.md (the system design)

Kept in core: the Verbalizer protocol + answer() seam (BYO LLM), and
SpatialMemTools (C3 memory-as-tools — a memory API surface, not brain).

Docs (README / spec/API / DEV-PLAN / tools / example05) point the brain pieces
at the spatialmem-brain companion repo.

121 tests green, ruff clean (core has no LLM-integration code left).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Follow-up to a0b9d0f (which carried only the file deletions). Apply the edits:
- __init__.py: drop CosmosReasonVerbalizer export + import
- README / spec/API / DEV-PLAN: brain layer now in spatialmem-brain
- tools.py / example05: design-doc reference points to spatialmem-brain

Verbalizer protocol + answer() seam + SpatialMemTools stay in core. 121 tests
green, ruff clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Guard against maintenance commits flushing un-fused observations. add_detections()
stages observation rows in _pending (fusion deferred to commit()); a maintenance
method's own conn.commit() between add and commit used to flush those rows to disk
unfused, leaving orphan observations on a crash and a half-ingested store to any
interleaved op. Covered by _flush_pending() guards on decay/consolidate/resplit/
forget/define_region/relate/update/close.

Tests:
- decay/consolidate between add and commit -> node count + history() linkage correct
- crash-sim: second read-only connection sees 0 committed orphans on disk
- close() without commit fuses pending, no orphans
- _opt_int: @overload-typed (int default -> int) and falls back to default on
  an explicit JSON null instead of leaking None downstream (e.g. tool arg
  "k": null now uses the default k)
- store.insert_*: assert the always-present cursor.lastrowid (was int(None?))
- persist.connect: accept str | os.PathLike[str] (was str | Path)
- encoders / vec: guarded reportMissingImports ignores on the optional
  [clip]/[vec] imports (open_clip, torch, sqlite_vec)
- ruff format: tidy a pre-existing unformatted test
- CHANGELOG: Fixed entry
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant