
Generative objects: summon AI-generated objects onto the surfaces around you #405

Open
salmanmkc wants to merge 29 commits into google:main from salmanmkc:feat/generative-object

@salmanmkc commented Jun 24, 2026

Contributor

adds a GenerativeObjects primitive (xb.core.generative.imagine(prompt)) that turns a text/voice prompt into a placed, draggable object in your space. gemini generates an image, we key out the plain background into a clean cutout, and drop it onto the real-world surface you're looking at, occluded by depth and facing you.

fills a gap: image gen existed only as a low-level call (Gemini.generate), and nothing turned a prompt into a placed, interactive object. it's the runtime verb the gem/canvas can compose but can't synthesize itself ("create a thing you pinch and drag"), exposed as a one-liner, which I believe Ruofei wanted in the past.

what's in it

primitive (src/generative/):

  • GenerativeObjects script + imagine(prompt, opts), which places a GenerativeObject on the surface you're looking at. resolves null if AI is unavailable.
  • cutout: alpha-key the plain background so the subject reads cleanly, not as a white card.
  • grounding: raycast the depth mesh, stand objects on floors/tables, float them a few cm off walls so they don't blend in. falls back to in-front-of-camera placement when the raycast hits nothing.
  • occlusion: opts the material into the occlusion shader so real geometry hides it (the layer alone only builds the mask).
  • upright (yaw-only) billboard so it faces you like a standee.
  • draggable (translating) via the global DragManager.
  • optional experimental 2.5d relief (off by default): displaces a subdivided plane by the image brightness. approximate (brightness != depth), kept opt-in.
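
The grounding rule in the list above can be sketched as a pure function. This is a minimal sketch under stated assumptions, not the PR's implementation: `placeOnSurface`, the 3 cm wall offset, the 0.7 up-facing threshold, and the 1 m fallback distance are all hypothetical (the description only says "a few cm").

```typescript
// Hypothetical sketch of the grounding rule; all names and constants are assumptions.
type Vec3 = {x: number; y: number; z: number};

interface SurfaceHit {
  point: Vec3; // world-space hit position on the depth mesh
  normal: Vec3; // unit surface normal, with +Y pointing up
}

interface Placement {
  position: Vec3;
  kind: 'stand' | 'float' | 'fallback';
}

const WALL_OFFSET_M = 0.03; // assumed value for "a few cm"
const UP_FACING_MIN_Y = 0.7; // assumed: normals at least this vertical count as floor/table
const FALLBACK_DISTANCE_M = 1.0; // assumed in-front-of-camera distance

// Returns a + b * s without mutating either input.
const addScaled = (a: Vec3, b: Vec3, s: number): Vec3 => ({
  x: a.x + b.x * s,
  y: a.y + b.y * s,
  z: a.z + b.z * s,
});

function placeOnSurface(
  hit: SurfaceHit | null,
  cameraPosition: Vec3,
  cameraForward: Vec3,
): Placement {
  if (hit === null) {
    // No surface ahead: place the object in front of the camera instead.
    return {
      position: addScaled(cameraPosition, cameraForward, FALLBACK_DISTANCE_M),
      kind: 'fallback',
    };
  }
  if (hit.normal.y >= UP_FACING_MIN_Y) {
    // Floor or table: stand the object directly on the hit point.
    return {position: {...hit.point}, kind: 'stand'};
  }
  // Wall (or any non-up-facing surface): push out along the normal.
  return {position: addScaled(hit.point, hit.normal, WALL_OFFSET_M), kind: 'float'};
}
```

Keeping this step free of scene objects is what makes it unit-testable; the real raycast only has to supply the hit point and normal.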

wired into Core/Options via enableGenerativeObjects() and exported from xrblocks.ts. pure helpers (scale, placement, facing, background keying) are unit-tested; 45 colocated tests, full suite green, build/lint/prettier clean.

demo (demos/generative_object/):

  • summon with on-screen buttons, a draggable spatial uiblocks panel that head-leashes to follow you, or your voice (push-to-talk).
  • gemini key entry overlay (?key= > localStorage > keys.json > prompt), same pattern as world_companion.
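
The key precedence above (?key= > localStorage > keys.json > prompt) can be sketched as follows. This is an illustration only: `resolveGeminiKey`, `KeyCandidates`, and the idea that the caller has already read each source are assumptions, not the demo's actual code.

```typescript
// Hypothetical sketch: the caller has already read each source into a string or null.
interface KeyCandidates {
  fromQuery: string | null; // value of the ?key= URL parameter
  fromLocalStorage: string | null; // previously saved key
  fromKeysJson: string | null; // key parsed from keys.json, if it loaded
}

type KeySource = 'query' | 'localStorage' | 'keys.json' | 'prompt';

interface ResolvedKey {
  key: string;
  source: KeySource;
}

const nonEmpty = (v: string | null): v is string => v !== null && v.trim() !== '';

function resolveGeminiKey(
  candidates: KeyCandidates,
  promptUser: () => string | null,
): ResolvedKey | null {
  if (nonEmpty(candidates.fromQuery)) {
    return {key: candidates.fromQuery.trim(), source: 'query'};
  }
  if (nonEmpty(candidates.fromLocalStorage)) {
    return {key: candidates.fromLocalStorage.trim(), source: 'localStorage'};
  }
  if (nonEmpty(candidates.fromKeysJson)) {
    return {key: candidates.fromKeysJson.trim(), source: 'keys.json'};
  }
  // Only ask the user when every stored source came up empty.
  const typed = promptUser();
  return nonEmpty(typed) ? {key: typed.trim(), source: 'prompt'} : null;
}
```

Returning the source alongside the key makes it easy to log which step supplied it, which helps when a keys.json file appears to be ignored.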

try it

serve the repo, open demos/generative_object/index.html, paste a gemini key, hit summon (or speak). objects land on the surface you're looking at; grab to move.

notes

  • client-side gen + ?key= is prototyping-only, same caveat as the other AI demos.
  • 2.5d relief is experimental/opt-in (toggle in the demo); luminance != real depth so it's approximate. real relief would want a monocular depth model, future work.
  • grounding/occlusion need depth enabled; without it, placement falls back to in-front-of-camera and occlusion is skipped.

salmanmkc added 23 commits June 23, 2026 12:45
A prompt becomes a placed, draggable object: GenerativeObjects.imagine()
asks the AI image model to generate an image, decodes it into a texture,
and drops a billboard into the scene in front of the user, occludable by
real-world depth. Pure helpers (aspect-preserving scale, place-in-front
pose) and the orchestration are unit-tested with mocked AI + texture
source. Wired into Core/Options via enableGenerativeObjects() and exported
from xrblocks.ts.
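
An aspect-preserving scale helper like the one mentioned above might look like this. It is a sketch, not the PR's helper: the name `fitToMaxDimension` and the 0.5 m default are assumptions.

```typescript
// Hypothetical sketch: size a billboard so its longer side equals maxSizeMeters.
interface BillboardSize {
  width: number; // meters
  height: number; // meters
}

function fitToMaxDimension(
  imageWidth: number,
  imageHeight: number,
  maxSizeMeters = 0.5, // assumed default, not taken from the PR
): BillboardSize {
  if (!(imageWidth > 0) || !(imageHeight > 0)) {
    throw new RangeError('image dimensions must be positive');
  }
  const aspect = imageWidth / imageHeight;
  // Landscape: clamp the width. Portrait: clamp the height.
  return aspect >= 1
    ? {width: maxSizeMeters, height: maxSizeMeters / aspect}
    : {width: maxSizeMeters * aspect, height: maxSizeMeters};
}
```

For example, a 200x100 image with the default limit yields a 0.5 m by 0.25 m plane.
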
Add keyOutBackground (pure, tested) and a browser CanvasBackgroundTextureSource
that decodes the generated image, keys out the plain background, and returns a
CanvasTexture so the subject reads as a cutout rather than a flat card. Gated by
GenerativeOptions.removeBackground (on by default); Core swaps in the canvas
source when enabled.
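
A pure background keyer in the spirit of keyOutBackground could be sketched like this. The PR's actual signature and algorithm are not shown here, so `keyOutPlainBackground`, the corner-averaging estimate, and the tolerance of 60 are all assumptions.

```typescript
// Hypothetical sketch: estimate the background color from the four corners and
// make every pixel within `tolerance` (RGB distance) of it fully transparent.
function keyOutPlainBackground(
  rgba: Uint8ClampedArray,
  width: number,
  height: number,
  tolerance = 60, // assumed threshold in 0..255 RGB units
): Uint8ClampedArray {
  if (width < 1 || height < 1 || rgba.length !== width * height * 4) {
    throw new RangeError('rgba length must equal width * height * 4');
  }
  const out = new Uint8ClampedArray(rgba); // copy, so the input is left untouched
  const cornerPixels = [0, width - 1, (height - 1) * width, height * width - 1];
  let bgR = 0;
  let bgG = 0;
  let bgB = 0;
  for (let c = 0; c < cornerPixels.length; c++) {
    const o = cornerPixels[c] * 4;
    bgR += rgba[o];
    bgG += rgba[o + 1];
    bgB += rgba[o + 2];
  }
  bgR /= cornerPixels.length;
  bgG /= cornerPixels.length;
  bgB /= cornerPixels.length;
  const maxDistSq = tolerance * tolerance;
  for (let o = 0; o < out.length; o += 4) {
    const dr = out[o] - bgR;
    const dg = out[o + 1] - bgG;
    const db = out[o + 2] - bgB;
    if (dr * dr + dg * dg + db * db <= maxDistSq) {
      out[o + 3] = 0; // key out: alpha to zero
    }
  }
  return out;
}
```

A hard threshold like this leaves aliased edges; a production keyer would likely feather alpha near the boundary. Because the output is plain RGBA, it can be written back to a canvas with putImageData before building a texture.
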
Speak or pinch to summon an AI-generated object into your space via
xb.core.generative.imagine(); generated subjects are keyed to cutouts,
placed in front of you, draggable, and occluded by real depth. Voice
trigger via SpeechRecognizer; pinch cycles preset prompts so it works
without a mic.

Add enableGenerativeObjects() to the options list, an xb.core.generative.imagine
usage snippet, and a generative/ directory-map entry.

Add @google/genai to the importmap (the demo failed to init Gemini without
it) and a key-entry overlay that resolves the key from ?key= > localStorage >
keys.json > a prompt, matching the world_companion/objects_3d demos.

Every pinch/click summoned a new object, which fought with grabbing an
existing one. Track whether a select started on an existing generative
object and, if so, let DragManager move it instead of summoning. Add a
keyboard 'G' summon for desktop where dragging uses the mouse.

Add a quaternionFacingCamera helper and a GenerativeObjects.update() that
turns tracked objects to face the user each frame, gated by the new
GenerativeOptions.billboard flag (on by default). Keeps the flat cutout from
looking paper-thin from the side. Pure helper + billboard behavior unit-tested.
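
The upright (yaw-only) facing math can be sketched without any scene graph. This is not the PR's quaternionFacingCamera, whose signature is not shown; `yawFacingQuaternion` is a hypothetical name, and the sketch assumes a Y-up, right-handed world where the quad's front faces local +Z (as a three.js PlaneGeometry does).

```typescript
// Hypothetical sketch of a yaw-only billboard rotation.
type Vec3 = {x: number; y: number; z: number};

interface Quat {
  x: number;
  y: number;
  z: number;
  w: number;
}

function yawFacingQuaternion(objectPosition: Vec3, cameraPosition: Vec3): Quat {
  // Project the object-to-camera direction onto the horizontal (XZ) plane,
  // so height differences never tilt the object.
  const dx = cameraPosition.x - objectPosition.x;
  const dz = cameraPosition.z - objectPosition.z;
  if (dx === 0 && dz === 0) {
    // Camera directly above or below: no horizontal direction, keep identity.
    return {x: 0, y: 0, z: 0, w: 1};
  }
  // Rotating local +Z about +Y by `yaw` gives (sin yaw, 0, cos yaw).
  const yaw = Math.atan2(dx, dz);
  return {x: 0, y: Math.sin(yaw / 2), z: 0, w: Math.cos(yaw / 2)};
}
```

Because only the Y component and w are ever non-zero, the object stays upright like a standee no matter how high or low the viewer's head is.
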
Add a netblocks-styled 🎙️ push-to-talk button (speech was undiscoverable
before) and move the status HUD to the top-left so it no longer collides
with the simulator's settings gear.

Add an opt-in GenerativeOptions.relief that builds the object as a densely
subdivided plane displaced by the generated image's brightness (three.js
displacementMap + bumpMap on a lit standard material) instead of a flat
cutout, giving real shaded surface relief. Approximate (brightness is not
true depth) and needs a light in the scene; default off. Structure unit-tested.
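
To make the "brightness is not true depth" caveat concrete, here is a CPU-side sketch of turning image brightness into per-pixel heights. The commit itself relies on three.js displacementMap + bumpMap rather than code like this; `luminanceHeights`, the Rec. 709 luma weights, the alpha weighting, and the 0.05 m scale are assumptions made for illustration.

```typescript
// Hypothetical sketch: brightness-to-height conversion for a 2.5D relief.
function luminanceHeights(
  rgba: Uint8ClampedArray,
  width: number,
  height: number,
  displacementScale = 0.05, // assumed maximum relief height in meters
): Float32Array {
  if (rgba.length !== width * height * 4) {
    throw new RangeError('rgba length must equal width * height * 4');
  }
  const heights = new Float32Array(width * height);
  for (let i = 0; i < heights.length; i++) {
    const o = i * 4;
    // Rec. 709 luma weights, normalized to 0..1.
    const luma = (0.2126 * rgba[o] + 0.7152 * rgba[o + 1] + 0.0722 * rgba[o + 2]) / 255;
    // Assumption: keyed-out (transparent) pixels should not be displaced.
    const alpha = rgba[o + 3] / 255;
    heights[i] = luma * alpha * displacementScale;
  }
  return heights;
}
```

A white highlight on a dark, protruding object would be pushed forward by this mapping even if it is physically recessed, which is why the relief is only approximate.
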
Press R to switch subsequently summoned objects between flat cutout and
2.5D relief (pausing billboarding so you can orbit the relief), and add
ambient + directional lights so the lit relief material shows shading.

Raycast the camera forward against the depth mesh and place the object
there: stand it on horizontal surfaces, float it off vertical ones so it
doesn't blend into walls, falling back to in-front-of-camera with no hit.
Also opt the material into the occlusion shader (the layer alone only
builds the mask) so it's hidden behind real geometry.

Add a draggable uiblocks control panel (summon/speak/relief/clear) that
head-leashes to follow the user, plus a top-right on-screen button bar and a
push-to-talk voice button. Summoning is now via the controls/voice only
(click-to-spawn is removed). Also enable spatial UI and the depth texture
for occlusion, and use the 'flare' icon for summon.

draggable=true alone wasn't enough: DragManager.beginDragging bails when
there's no draggingMode, so grabbing never started. Set draggingMode to
TRANSLATING.
@dli7319 self-requested a review June 26, 2026 22:19
@ruofeidu self-assigned this Jun 26, 2026
@ruofeidu added the demo label (New demo for XR Blocks demonstrating novel interactivity or perception features.) Jun 26, 2026
@ruofeidu

Collaborator

Hi Salman,

Thank you for your contribution!!! I would like to request that this be switched to a demo.
Also, the demo ignores the keys.json file I used to debug locally.

I wouldn't say this is ready to be put inside the SDK (for now).
Even inside the SDK, it should be under ai.generateBillboard(image), etc.

We need to carefully think of the high-level picture of generativeAssets:

abstract --------> photorealistic
photo vs. mesh vs. 3D Gaussians etc.
LLM / local model vs. cloud models

Internally, we have a demo like this, but with better quality & confidential tech :)

@ruofeidu marked this pull request as draft June 26, 2026 22:31
@ruofeidu self-requested a review June 26, 2026 22:31
@ruofeidu added the algorithm label (spatial algorithm) Jun 26, 2026
@ruofeidu removed the algorithm label (spatial algorithm) Jun 26, 2026
@ruofeidu removed this from Agent Blocks Jun 26, 2026
@salmanmkc

Contributor Author

> Hi Salman,
>
> Thank you for your contribution!!! I would like to request that this be switched to a demo. Also, the demo ignores the keys.json file I used to debug locally.
>
> I wouldn't say this is ready to be put inside the SDK (for now). Even inside the SDK, it should be under ai.generateBillboard(image), etc.
>
> We need to carefully think of the high-level picture of generativeAssets:
>
> abstract --------> photorealistic
> photo vs. mesh vs. 3D Gaussians etc.
> LLM / local model vs. cloud models
>
> Internally, we have a demo like this, but with better quality & confidential tech :)

you sure the keys.json wasn't in the repo root or the wrong folder? should be relative

and sure, will move this to demo only. wow, that's cool to know there's a better internal demo. is it sorta similar to likeness-level quality?

Per review on google#405, the generative objects feature moves out of the SDK and
becomes a demo. Remove the xb.core.generative subsystem, the Options
enableGenerativeObjects()/GenerativeOptions, the barrel exports for the
orchestrator/object/options/texture-source, and the SKILL.md references.

BackgroundKeyer (pure RGBA chroma-key) and GenerativeObjectUtils (generic
billboard/face-camera math) stay in src/ as small, unit-tested helpers the demo
imports; the orchestration moves to the demo in the next commit.

The generative objects orchestration now lives in demos/generative_object/src/
(GenerativeObjects, GenerativeObject, GenerativeOptions, TextureSource) built by
rollup like the drone/animalattack demos, instead of the SDK. The demo owns a
GenerativeObjects script and adds it via xb.add() so dependency injection still
resolves AI/camera/scene/depth.

Fixes carried over from the review while moving:
- only wire depth occlusion when depth is actually present, so objects don't
  render transparent against an empty occlusion map when depth is off
- prefer the depth mesh's geometric face normal for surface orientation (the
  per-vertex normals are not kept fresh) and update the full-resolution mesh so
  placement raycasts hit current geometry
- a generation token so an in-flight generate that resolves after clearObjects()
  is discarded instead of adding a stale object
- build the relief displacement map lazily and dispose every distinct texture

Also splits generation into generateBillboard(image), the image-to-object half,
to sketch where an SDK ai.generateBillboard(image) could sit, and loads the
built src/build/main.js with the keys.json root fallback.
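
The generation-token fix listed above follows a common pattern: snapshot a counter before the async call and discard the result if the counter has moved on. A minimal sketch, with the hypothetical names `ObjectStore`, `beginGeneration`, and `commit` (the demo's real class layout is not shown here):

```typescript
// Hypothetical sketch of a generation token guarding against stale results.
class ObjectStore<T> {
  private generation = 0;
  readonly objects: T[] = [];

  // Call before starting an async generate; keep the returned token.
  beginGeneration(): number {
    return this.generation;
  }

  // Call when the generate resolves. Returns false (and adds nothing)
  // if clearObjects() ran while the request was in flight.
  commit(token: number, object: T): boolean {
    if (token !== this.generation) {
      return false;
    }
    this.objects.push(object);
    return true;
  }

  // Removes all objects and invalidates every outstanding token.
  clearObjects(): void {
    this.generation++;
    this.objects.length = 0;
  }
}
```

In use, an async imagine-style function would call beginGeneration() before awaiting the image model and commit() afterwards, disposing of the generated texture itself whenever commit() returns false.
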
Two reasons a summoned object could be hard to see:

- groundOnSurface raycast placed it on whatever surface was ahead, so looking
  across the room dropped it on a far wall, tiny and easy to miss. Cap the
  grounding distance (maxGroundDistance, 2 m) and fall back to in-front
  placement when the surface is farther.

- the image prompt asked for a white background, which the background keyer then
  cut out along with pale subjects (a white paper airplane vanished). Ask for a
  saturated chroma-green background instead so the corner-sampled keyer keeps any
  non-green subject.
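
The distance cap from the first bullet can be sketched as a small decision function. Only the maxGroundDistance name and its 2 m value come from the commit message; `chooseGrounding` and the 1 m fallback distance are assumptions.

```typescript
// Hypothetical sketch: decide whether to ground on the hit surface or fall back.
interface GroundingDecision {
  grounded: boolean; // true when the object is placed on the hit surface
  distance: number; // meters from the camera along its forward ray
}

function chooseGrounding(
  hitDistance: number | null, // null when the placement raycast hit nothing
  maxGroundDistance = 2, // meters, as stated in the commit message
  fallbackDistance = 1, // assumed in-front-of-camera distance
): GroundingDecision {
  if (hitDistance !== null && hitDistance <= maxGroundDistance) {
    return {grounded: true, distance: hitDistance};
  }
  // Surface too far away (or missing): keep the object close and visible.
  return {grounded: false, distance: fallbackDistance};
}
```

With these defaults, a table 1.2 m away is used for grounding, while a wall 4.5 m away is ignored and the object appears 1 m in front of the user.
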
@salmanmkc marked this pull request as ready for review June 27, 2026 05:42
The control buttons' idle/hover fill colors (#2a2a2a -> #3a3a3a) differed
by only a few percent of brightness, so hovering produced no perceptible
change. Use a dark chip for idle and a clear purple for hover (with a
brighter click flash), matching the agent_hands demo.
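
The "few percent" claim is easy to check numerically. The helper below is a throwaway sketch (the names `hexBrightness` and `brightnessDelta` are made up, and brightness is taken as the plain channel average rather than a perceptual measure):

```typescript
// Hypothetical sketch: average-channel brightness of a #rrggbb color, in 0..1.
function hexBrightness(hex: string): number {
  if (!/^#[0-9a-f]{6}$/i.test(hex)) {
    throw new Error('expected a color like #2a2a2a');
  }
  const r = parseInt(hex.slice(1, 3), 16);
  const g = parseInt(hex.slice(3, 5), 16);
  const b = parseInt(hex.slice(5, 7), 16);
  return (r + g + b) / (3 * 255);
}

// Absolute brightness difference between two colors, as a fraction of full range.
function brightnessDelta(a: string, b: string): number {
  return Math.abs(hexBrightness(a) - hexBrightness(b));
}
```

For #2a2a2a versus #3a3a3a the channels differ by 16 out of 255, so brightnessDelta returns about 0.063, roughly 6% of the full range.
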
…panel

The depth mesh is in the scene for occlusion and surface placement, so
the reticle's whole-scene raycast also hits it; standing within ~1m of a
wall makes it the closest hit and grabs hover from the control panel.

No-op the depth mesh's raycast so the reticle skips it, and restore the
real raycast briefly inside raycastSurface_ so object placement still
grounds on the geometry. Same approach as the agent_hands demo.
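
The mute-then-briefly-restore approach described above can be sketched generically. `muteRaycast` and `withRealRaycast` are hypothetical names, and the `Raycastable` interface only mirrors the shape of an object with a `raycast(raycaster, intersects)` method; the demo's raycastSurface_ is not reproduced here.

```typescript
// Hypothetical sketch: hide an object from scene-wide raycasts, but let one
// placement query see it by restoring the real method for the duration of a call.
type RaycastFn<R, I> = (raycaster: R, intersects: I[]) => void;

interface Raycastable<R, I> {
  raycast: RaycastFn<R, I>;
}

// Replaces raycast with a no-op and returns the original so it can be restored.
function muteRaycast<R, I>(target: Raycastable<R, I>): RaycastFn<R, I> {
  const real = target.raycast;
  target.raycast = () => {};
  return real;
}

// Temporarily restores the real raycast while `fn` runs, then mutes it again,
// even if `fn` throws.
function withRealRaycast<R, I, T>(
  target: Raycastable<R, I>,
  real: RaycastFn<R, I>,
  fn: () => T,
): T {
  const muted = target.raycast;
  target.raycast = real;
  try {
    return fn();
  } finally {
    target.raycast = muted;
  }
}
```

The try/finally matters: if the placement query throws, the depth mesh must not be left raycastable, or the reticle would start hitting it again.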
Labels

demo New demo for XR Blocks demonstrating novel interactivity or perception features.


2 participants