RF-DETR joins RT-DETR and YOLO as a state-of-the-art deep vision model implementation.
Why, you ask? These are all examples showcasing how threepp's Vulkan backend isn't just for drawing triangles — the same GPU device that renders your scene can run a modern neural network end-to-end, with nothing but hand-written GLSL compute shaders. No PyTorch. No CUDA. No ONNX Runtime. No TensorRT. No 2 GB of Python dependencies. Just Vulkan — the same one threepp already uses to render.
That's the whole point of these examples:
One device, one process. Perception and rendering live on the same Vulkan device and share the same memory. You can render a scene and detect objects in it without ever leaving the GPU or shelling out to a separate inference framework.
Runs anywhere Vulkan runs. NVIDIA, AMD, Intel, mobile Mali/Adreno — no vendor lock-in. (TensorRT is NVIDIA-only; this isn't.)
It's the real model, not a toy. Each one is the actual pretrained network, reimplemented op-by-op as compute shaders.
What's in here
Three object detectors spanning both major families: YOLO (convolutional), and RT-DETR and RF-DETR (transformer-based).
Each ships as a threepp example: load weights, run on an image, and visualize the detections through the renderer's ortho overlay.
Is it correct?
Yes, and verifiably so. Each port is validated layer by layer against the reference PyTorch model: per-layer activations are captured from PyTorch and diffed element-wise (the backbones match to ~1e-5). On real images the detections match the reference model (e.g. bus.jpg → bus + 4 people, same boxes). Each example has a --validate mode so you can check it yourself.
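For readers who want a feel for what an element-wise activation check involves, here is a minimal sketch in plain Python. It is not the code behind --validate: the function name `compare_layer`, the flat-list input format, and the `atol` default are all assumptions made for illustration, and the real examples may use a different metric or tolerance.

```python
def compare_layer(reference, candidate, atol=1e-5):
    """Compare two flattened activation tensors element by element.

    `reference` would come from the PyTorch model and `candidate` from the
    compute-shader port. Both are plain sequences of floats here (an
    assumption for this sketch). `atol` is a hypothetical absolute tolerance.
    """
    if len(reference) != len(candidate):
        raise ValueError(
            f"shape mismatch: {len(reference)} vs {len(candidate)} elements"
        )

    worst_index = -1
    max_abs_diff = 0.0
    for i, (r, c) in enumerate(zip(reference, candidate)):
        diff = abs(r - c)
        if diff > max_abs_diff:
            max_abs_diff = diff
            worst_index = i

    return {
        "max_abs_diff": max_abs_diff,
        "worst_index": worst_index,  # -1 when the tensors are identical
        "passed": max_abs_diff <= atol,
    }


if __name__ == "__main__":
    ref = [0.0, 1.0, -2.0]
    out = [0.0, 1.0 + 5e-6, -2.0]
    print(compare_layer(ref, out))
```

Reporting the index of the worst element, rather than only a pass/fail flag, makes it easier to trace a mismatch back to a specific channel or spatial position in the layer.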
Is it fast?
Competitive. On an RTX 4070, end-to-end (preprocess → forward → postprocess), benchmarked honestly against optimized PyTorch (TorchScript, lean pre/post):
YOLOv8n: ~127 FPS — on par with optimized PyTorch.
RF-DETR-Nano: ~42 FPS — within ~10% of optimized PyTorch.
To be clear about what this is and isn't: hand-written shaders don't beat vendor-tuned cuBLAS/cuDNN on raw compute, and that's fine — the win here is deployment: framework-free, portable, and co-resident with the renderer. PyTorch baselines are included (scripts/bench_*) so the comparison is reproducible.
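As a rough illustration of how an end-to-end FPS figure like the ones above can be measured, the sketch below times a single callable that covers preprocess, forward, and postprocess. It is not the contents of scripts/bench_*: `measure_fps`, `run_once`, and the warmup and iteration counts are hypothetical names and values chosen for this example.

```python
import time


def measure_fps(run_once, warmup=10, iters=100):
    """Return frames per second for one end-to-end pipeline call.

    `run_once` is a hypothetical callable that performs preprocess,
    forward, and postprocess for a single image. For GPU work it must
    block until the results are actually available, otherwise the timer
    only measures how fast work is submitted.
    """
    if iters <= 0:
        raise ValueError("iters must be positive")

    # Warmup runs are excluded so one-time costs (shader compilation,
    # allocations, caches) do not distort the steady-state number.
    for _ in range(warmup):
        run_once()

    start = time.perf_counter()
    for _ in range(iters):
        run_once()
    elapsed = time.perf_counter() - start

    if elapsed <= 0:
        return float("inf")
    return iters / elapsed


if __name__ == "__main__":
    # Stand-in workload; a real benchmark would call the detector here.
    fps = measure_fps(lambda: sum(range(10_000)), warmup=5, iters=50)
    print(f"{fps:.1f} FPS")
```

Timing the whole pipeline in one callable keeps the comparison between the Vulkan port and a PyTorch baseline on equal footing, since both sides pay for pre- and postprocessing.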