feat(addons/objects3d): reusable 3D object detection (2D detect + depth into oriented boxes) #417

Open
salmanmkc wants to merge 11 commits into google:main from salmanmkc:feat/objects3d

@salmanmkc

Contributor

adds an objects3d addon that turns 2D detection plus the depth mesh into oriented 3D bounding boxes you can query in world space. it's the pipeline that lived inline in the objects_3d demo, pulled out into a reusable Script so apps and other addons can just ask for detected 3D objects.

Object3DDetector runs the whole thing: snap the camera and depth mesh, run 2D detection, get a per-object segmentation mask, raycast the masked depth samples into world space, fit a yaw-aligned oriented box, fuse across views, and optionally draw debug wireframe boxes. each result is a Detected3DObject with a nearest-surface query, so you can ask for the closest point on an object, which is handy for pointing at it or placing something against it.
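for reviewers who want the geometry spelled out, here is a minimal sketch of two of the steps above: fitting a yaw-aligned box to world-space points, and a closest-point query against that box. it is not the addon's code. `Vec3`, `YawBox`, `fitYawBox` and `closestPointOnBox` are hypothetical names, the yaw is estimated with a simple PCA on the XZ plane (an assumption, the addon may fit differently), and yaw is measured from +X toward +Z, which may not match the sign convention of your scene graph. the real nearest-surface query may also work on depth samples rather than on the box.

```typescript
// Hypothetical types for illustration; not the addon's actual API.
interface Vec3 { x: number; y: number; z: number }
interface YawBox { center: Vec3; halfSize: Vec3; yaw: number } // yaw in radians, +X toward +Z

const clamp = (v: number, lo: number, hi: number): number =>
  Math.min(hi, Math.max(lo, v));

// Fit a box that is upright (aligned with world Y) and rotated about Y
// to follow the principal direction of the points in the XZ plane.
function fitYawBox(points: Vec3[]): YawBox {
  if (points.length === 0) throw new Error('need at least one point');
  let mx = 0, mz = 0;
  for (const p of points) { mx += p.x; mz += p.z; }
  mx /= points.length; mz /= points.length;

  // Unnormalized 2x2 covariance of the XZ coordinates.
  let cxx = 0, czz = 0, cxz = 0;
  for (const p of points) {
    const dx = p.x - mx, dz = p.z - mz;
    cxx += dx * dx; czz += dz * dz; cxz += dx * dz;
  }
  // Angle of the principal axis of a 2x2 symmetric matrix.
  const yaw = 0.5 * Math.atan2(2 * cxz, cxx - czz);
  const c = Math.cos(yaw), s = Math.sin(yaw);

  // Project every point onto the box axes and take the extents.
  let minU = Infinity, maxU = -Infinity;
  let minV = Infinity, maxV = -Infinity;
  let minY = Infinity, maxY = -Infinity;
  for (const p of points) {
    const u = c * p.x + s * p.z;
    const v = -s * p.x + c * p.z;
    minU = Math.min(minU, u); maxU = Math.max(maxU, u);
    minV = Math.min(minV, v); maxV = Math.max(maxV, v);
    minY = Math.min(minY, p.y); maxY = Math.max(maxY, p.y);
  }
  const cu = (minU + maxU) / 2, cv = (minV + maxV) / 2;
  return {
    // Rotate the box-frame center back into world space.
    center: { x: c * cu - s * cv, y: (minY + maxY) / 2, z: s * cu + c * cv },
    halfSize: { x: (maxU - minU) / 2, y: (maxY - minY) / 2, z: (maxV - minV) / 2 },
    yaw,
  };
}

// Closest point on (or inside) the box to a world-space query point:
// move into the box frame, clamp to the half extents, move back.
function closestPointOnBox(box: YawBox, p: Vec3): Vec3 {
  const c = Math.cos(box.yaw), s = Math.sin(box.yaw);
  const dx = p.x - box.center.x, dy = p.y - box.center.y, dz = p.z - box.center.z;
  const lx = clamp(c * dx + s * dz, -box.halfSize.x, box.halfSize.x);
  const ly = clamp(dy, -box.halfSize.y, box.halfSize.y);
  const lz = clamp(-s * dx + c * dz, -box.halfSize.z, box.halfSize.z);
  return {
    x: box.center.x + c * lx - s * lz,
    y: box.center.y + ly,
    z: box.center.z + s * lx + c * lz,
  };
}
```

a query point that is already inside the box comes back unchanged, so a caller that needs a point on the surface would have to handle that case separately.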

the backends are pluggable:

  • 2D detection: gemini (open-vocabulary, needs a key) or mediapipe (on-device COCO, no key, fixed class set).
  • masks: SAM / slimsam (via @huggingface/transformers) or the mediapipe segmenter.
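one way to picture the pluggable 2D detection side, as a rough sketch only: a small interface that each backend implements, plus a selector. every name here (`Frame`, `Detection2D`, `Detector2D`, `selectDetector`, the `geminiApiKey` option) is hypothetical and the stub backends return nothing; the fallback policy (use gemini when a key is present, otherwise mediapipe) is an assumption for illustration, not something this PR specifies.

```typescript
// Hypothetical shapes for illustration; not the addon's actual API.
interface Frame { width: number; height: number; data: Uint8ClampedArray }
interface Detection2D {
  label: string;
  score: number;
  box: { xMin: number; yMin: number; xMax: number; yMax: number };
}

interface Detector2D {
  readonly name: 'gemini' | 'mediapipe';
  // Open-vocabulary backends accept free-form prompts; fixed-class ones ignore them.
  readonly openVocabulary: boolean;
  detect(frame: Frame, prompts?: string[]): Promise<Detection2D[]>;
}

// Stub backends: a real one would call the remote model or the on-device detector.
const geminiStub: Detector2D = {
  name: 'gemini',
  openVocabulary: true,
  detect: (_frame, _prompts) => Promise.resolve([]),
};

const mediapipeStub: Detector2D = {
  name: 'mediapipe',
  openVocabulary: false,
  detect: (_frame) => Promise.resolve([]),
};

// Assumed policy: prefer the keyed open-vocabulary backend when a key exists,
// otherwise fall back to the on-device detector that needs no key.
function selectDetector(options: { geminiApiKey?: string }): Detector2D {
  return options.geminiApiKey ? geminiStub : mediapipeStub;
}
```

a mask backend could follow the same pattern, with one interface and one implementation per segmenter.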

it also exports the pure helpers that are useful on their own: uvToNdc (the snapshot-vs-camera aspect correction), box2dIoU / unionDetections / snapBoxToFloor (2D fusion and floor snapping), and the label categorize helpers (flat / surface / tiny-flat / light buckets).
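to make the 2D fusion idea concrete, here is a sketch of what helpers in the spirit of box2dIoU and unionDetections typically compute. it is not the exported code: the box layout (`xMin`/`yMin`/`xMax`/`yMax`), the names `iou2d` and `mergeDetections`, the same-label requirement and the 0.5 threshold are all assumptions.

```typescript
// Hypothetical shapes for illustration; the addon's box format may differ.
interface Box2D { xMin: number; yMin: number; xMax: number; yMax: number }
interface Detection2D { label: string; score: number; box: Box2D }

const area = (b: Box2D): number =>
  Math.max(0, b.xMax - b.xMin) * Math.max(0, b.yMax - b.yMin);

// Intersection over union of two axis-aligned 2D boxes, in [0, 1].
function iou2d(a: Box2D, b: Box2D): number {
  const iw = Math.max(0, Math.min(a.xMax, b.xMax) - Math.max(a.xMin, b.xMin));
  const ih = Math.max(0, Math.min(a.yMax, b.yMax) - Math.max(a.yMin, b.yMin));
  const inter = iw * ih;
  const union = area(a) + area(b) - inter;
  return union > 0 ? inter / union : 0;
}

// Greedy fusion of two detection lists: a detection from `b` that overlaps a
// same-label detection already in the result is folded into it (bounding union
// of the boxes, higher score kept); everything else is appended.
function mergeDetections(
  a: Detection2D[],
  b: Detection2D[],
  threshold = 0.5, // assumed default
): Detection2D[] {
  const merged = a.map((d) => ({ ...d, box: { ...d.box } }));
  for (const d of b) {
    const match = merged.find(
      (m) => m.label === d.label && iou2d(m.box, d.box) >= threshold,
    );
    if (!match) {
      merged.push({ ...d, box: { ...d.box } });
      continue;
    }
    match.box = {
      xMin: Math.min(match.box.xMin, d.box.xMin),
      yMin: Math.min(match.box.yMin, d.box.yMin),
      xMax: Math.max(match.box.xMax, d.box.xMax),
      yMax: Math.max(match.box.yMax, d.box.yMax),
    };
    match.score = Math.max(match.score, d.score);
  }
  return merged;
}
```

the inputs are copied before merging, so callers can keep using the original lists.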

the heavy deps (@huggingface/transformers, mediapipe) stay external; the one rollup change just marks transformers external so it isn't bundled.
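for context on what "marks transformers external" means, rollup's `external` option accepts either a list of module ids or a predicate. the sketch below uses the predicate form so subpath imports are also left unbundled. the `input` and `output` paths are placeholders and `isExternal` is a hypothetical helper; the actual change in this PR may simply add the package name to an existing list.

```typescript
// Packages that should be resolved by the consuming app instead of being bundled.
const EXTERNAL_PACKAGES = ['@huggingface/transformers'];

// Matches the bare package id and any subpath import under it,
// without matching unrelated packages that merely share the prefix.
function isExternal(id: string): boolean {
  return EXTERNAL_PACKAGES.some(
    (pkg) => id === pkg || id.startsWith(pkg + '/'),
  );
}

// Shape of a rollup config entry; a real rollup.config would export this.
// The paths are placeholders, not the repository's actual layout.
const config = {
  input: 'path/to/objects3d/index.ts',
  external: isExternal,
  output: { dir: 'path/to/build', format: 'es' as const },
};
```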

this is split out of the agenthands branch (#416), which originally grounded its pointing through this pipeline before I switched it to a lighter depth raycast. the addon ships standalone here with colocated vitest specs (label categorization, depth sampling, fusion, and the nearest-surface query); the objects_3d demo can move onto it as a follow-up. lint, tests and build are clean.

ruofeidu requested a review from nsalminen on June 26, 2026 22:16

2 participants