feat(addons/objects3d): reusable 3D object detection (2D detect + depth into oriented boxes) #417
salmanmkc wants to merge 11 commits into
adds an `objects3d` addon that turns 2D detection plus the depth mesh into oriented 3D bounding boxes you can query in world space. it's the pipeline that lived inline in the `objects_3d` demo, pulled out into a reusable `Script` so apps and other addons can just ask for detected 3D objects.

`Object3DDetector` runs the whole thing: snap the camera and depth mesh, run 2D detection, get a per-object segmentation mask, raycast the masked depth samples into world space, fit a yaw-aligned oriented box, fuse across views, and optionally draw debug wireframe boxes. each result is a `Detected3DObject` with a nearest-surface query, so you can ask for the closest point on an object, which is handy for pointing at it or placing something against it.

the backends are pluggable:
- 2D detection: `gemini` (open-vocabulary, needs a key) or `mediapipe` (on-device COCO, no key, fixed class set).
- segmentation: … (`@huggingface/transformers`) or the mediapipe segmenter.

it also exports the pure helpers that are useful on their own:
`uvToNdc` (the snapshot-vs-camera aspect correction), `box2dIoU` / `unionDetections` / `snapBoxToFloor` (2D fusion and floor snapping), and the label `categorize` helpers (flat / surface / tiny-flat / light buckets).

the heavy deps (`@huggingface/transformers`, mediapipe) stay external; the one rollup change just marks transformers external so it isn't bundled.

this is split out of the agenthands branch (#416), which originally grounded its pointing through this pipeline before I switched it to a lighter depth raycast. the addon ships standalone here with colocated vitest specs (label categorization, depth sampling, fusion, and the nearest-surface query); the `objects_3d` demo can move onto it as a follow-up. lint, tests and build are clean.
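here's a rough sketch of the "fit a yaw-aligned oriented box" step in `Object3DDetector`. it assumes the masked depth samples have already been raycast into world-space points with +Y up. `fitYawBox`, `Vec3` and `YawBox` are hypothetical names, not the addon's API. taking the heading from the principal axis of the points' floor-plane (XZ) footprint is also an assumed technique, and the real fit may do it differently. the vertical axis stays world up, so only the heading and the three extents are estimated.

```typescript
// hypothetical shapes for illustration; the addon's real types may differ
interface Vec3 { x: number; y: number; z: number; }

interface YawBox {
  center: Vec3;   // world-space center
  halfSize: Vec3; // half extents along the box's local x, world y, local z
  yaw: number;    // angle of local x from world +X toward +Z, in radians
}

// fit a box whose vertical axis stays world up and whose heading follows the
// major principal axis of the points' footprint on the XZ (floor) plane
function fitYawBox(points: Vec3[]): YawBox {
  if (points.length === 0) throw new Error("fitYawBox: no points");

  // centroid of the horizontal footprint
  let mx = 0, mz = 0;
  for (const p of points) { mx += p.x; mz += p.z; }
  mx /= points.length;
  mz /= points.length;

  // 2x2 covariance of (x, z), unnormalized; only its orientation matters
  let cxx = 0, czz = 0, cxz = 0;
  for (const p of points) {
    const dx = p.x - mx, dz = p.z - mz;
    cxx += dx * dx; czz += dz * dz; cxz += dx * dz;
  }

  // closed-form angle of the major eigenvector of a symmetric 2x2 matrix
  const yaw = 0.5 * Math.atan2(2 * cxz, cxx - czz);
  const c = Math.cos(yaw), s = Math.sin(yaw);

  // extents along the rotated axes u = (c, s) and v = (-s, c), plus world y
  let uMin = Infinity, uMax = -Infinity, vMin = Infinity, vMax = -Infinity;
  let yMin = Infinity, yMax = -Infinity;
  for (const p of points) {
    const dx = p.x - mx, dz = p.z - mz;
    const u = dx * c + dz * s;
    const v = -dx * s + dz * c;
    uMin = Math.min(uMin, u); uMax = Math.max(uMax, u);
    vMin = Math.min(vMin, v); vMax = Math.max(vMax, v);
    yMin = Math.min(yMin, p.y); yMax = Math.max(yMax, p.y);
  }

  // the extent midpoint need not be the centroid; rotate that offset back to world XZ
  const uc = (uMin + uMax) / 2, vc = (vMin + vMax) / 2;
  return {
    center: { x: mx + uc * c - vc * s, y: (yMin + yMax) / 2, z: mz + uc * s + vc * c },
    halfSize: { x: (uMax - uMin) / 2, y: (yMax - yMin) / 2, z: (vMax - vMin) / 2 },
    yaw,
  };
}
```

min/max extents follow every sample, so a few stray depth hits can inflate the box. trimming to percentiles before taking the extents is a common alternative.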
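the nearest-surface query on a `Detected3DObject` can be sketched as a closest-point-on-box computation. this assumes the object is represented by its oriented box: a center, half extents, and a yaw measured from world +X toward +Z. `nearestSurfacePoint` and the box shape are hypothetical, and the real query may work on the depth samples rather than the box.

```typescript
// hypothetical shapes for illustration; the addon's real types may differ
interface Vec3 { x: number; y: number; z: number; }
interface YawBox { center: Vec3; halfSize: Vec3; yaw: number; }

// closest point on the *surface* of a yaw-rotated box to a world-space query point
function nearestSurfacePoint(box: YawBox, q: Vec3): Vec3 {
  const c = Math.cos(box.yaw), s = Math.sin(box.yaw);

  // world -> box-local: translate, then undo the yaw in the XZ plane
  const dx = q.x - box.center.x, dy = q.y - box.center.y, dz = q.z - box.center.z;
  const local = [dx * c + dz * s, dy, -dx * s + dz * c];
  const half = [box.halfSize.x, box.halfSize.y, box.halfSize.z];

  // clamp each local axis into [-half, half]
  const clamped = local.map((v, i) => Math.min(Math.max(v, -half[i]), half[i]));
  const inside = clamped.every((v, i) => v === local[i]);

  if (inside) {
    // the query is inside (or on) the box: push it out through the nearest face
    let axis = 0, best = Infinity;
    for (let i = 0; i < 3; i++) {
      const gap = half[i] - Math.abs(local[i]);
      if (gap < best) { best = gap; axis = i; }
    }
    clamped[axis] = local[axis] >= 0 ? half[axis] : -half[axis];
  }

  // box-local -> world: reapply the yaw, then translate back
  const u = clamped[0], y = clamped[1], v = clamped[2];
  return {
    x: box.center.x + u * c - v * s,
    y: box.center.y + y,
    z: box.center.z + u * s + v * c,
  };
}
```

the inside case matters for placement. a plain clamp returns the query point unchanged when it is already inside the box, while pushing it out through the closest face always yields a point on the surface.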
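`box2dIoU` and `unionDetections` are standard 2D fusion building blocks. here's a minimal sketch of both ideas with hypothetical names (`iou`, `mergeDetections`) and an assumed box format of normalized `{x0, y0, x1, y1}` corners. the greedy same-label merge and the 0.5 default threshold are assumptions, not the addon's actual rule.

```typescript
// hypothetical box format: normalized image coordinates with x0 < x1 and y0 < y1
interface Box2D { x0: number; y0: number; x1: number; y1: number; }
interface Detection2D { label: string; box: Box2D; }

function area(b: Box2D): number {
  // clamp to zero so an empty intersection contributes no area
  return Math.max(0, b.x1 - b.x0) * Math.max(0, b.y1 - b.y0);
}

// intersection-over-union of two axis-aligned boxes, in [0, 1]
function iou(a: Box2D, b: Box2D): number {
  const inter = area({
    x0: Math.max(a.x0, b.x0), y0: Math.max(a.y0, b.y0),
    x1: Math.min(a.x1, b.x1), y1: Math.min(a.y1, b.y1),
  });
  const union = area(a) + area(b) - inter;
  return union > 0 ? inter / union : 0;
}

// greedy fusion: a detection that overlaps an already-kept detection with the same
// label (IoU >= threshold) grows that kept box to cover both; inputs are not mutated
function mergeDetections(dets: Detection2D[], threshold = 0.5): Detection2D[] {
  const kept: Detection2D[] = [];
  for (const d of dets) {
    const match = kept.find(k => k.label === d.label && iou(k.box, d.box) >= threshold);
    if (match) {
      match.box = {
        x0: Math.min(match.box.x0, d.box.x0), y0: Math.min(match.box.y0, d.box.y0),
        x1: Math.max(match.box.x1, d.box.x1), y1: Math.max(match.box.y1, d.box.y1),
      };
    } else {
      kept.push({ label: d.label, box: { ...d.box } });
    }
  }
  return kept;
}
```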
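here's a sketch of how bucketing labels into flat / surface / tiny-flat / light can work. `bucketLabel` is hypothetical and the keyword lists are illustrative guesses, not the addon's real tables. matching whole words in a fixed priority order keeps compound labels like "table lamp" deterministic.

```typescript
// bucket names follow the PR text; the keyword lists are illustrative guesses
type Category = "flat" | "surface" | "tiny-flat" | "light" | "other";

// checked in order, so a compound label takes the first bucket that matches
const KEYWORDS: Array<[Category, string[]]> = [
  ["light", ["lamp", "light", "chandelier"]],
  ["tiny-flat", ["phone", "remote", "coaster", "card"]],
  ["flat", ["rug", "mat", "carpet", "poster"]],
  ["surface", ["table", "desk", "counter", "shelf"]],
];

// hypothetical helper: match whole lowercase words so "mattress" does not hit "mat"
function bucketLabel(label: string): Category {
  const words = label.toLowerCase().split(/[^a-z]+/).filter(Boolean);
  for (const [category, keys] of KEYWORDS) {
    if (words.some(w => keys.includes(w))) return category;
  }
  return "other";
}
```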
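for reference, marking a dependency external in rollup looks roughly like this. it's a hypothetical `rollup.config.ts`: the input path, output settings and TypeScript plugin setup are assumptions, and only the `external` entry corresponds to the change described here.

```typescript
// rollup.config.ts: hypothetical paths and plugin setup; only `external`
// reflects the change described in this PR
import typescript from "@rollup/plugin-typescript";
import type { RollupOptions } from "rollup";

const config: RollupOptions = {
  input: "src/index.ts",
  // match the package root and any deep import under it; rollup leaves
  // these as bare imports instead of bundling the transformers runtime
  external: [/^@huggingface\/transformers(\/.*)?$/],
  plugins: [typescript()],
  output: { dir: "dist", format: "es" },
};

export default config;
```

a plain string entry `"@huggingface/transformers"` also works, but it only matches that exact import specifier, so the regex form also covers deep imports. loading a `.ts` config needs something like `rollup --config rollup.config.ts --configPlugin typescript`.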