I think I found a new issue introduced by the timeline semaphores.
SteamVR reliably crashes when I try to start theater mode.
If I enable the Vulkan validation layers, it crashes before compositing the first frame (the display is gray, then it crashes with error code -203), so this is probably a timing-sensitive race condition.
Summary
When launching theater mode, the vrcompositor crashes reliably with a GPU page fault in the RenderThread. The crash chain involves multiple Vulkan API specification violations by vrcompositor, caught by VK_LAYER_KHRONOS_validation.
System Information
- SteamVR version: 2.17.2 (build 1781214772)
- Distribution: Ubuntu 26.04 LTS
- Kernel: 7.1.0-rc4-spacy2026052101 (custom, kernel.org upstream + config changes)
- GPU: AMD Radeon RX 7800 XT (Navi 32, RDNA3)
- Vulkan driver: Mesa 26.2.0 (RADV for NAVI32), Mesa commit 7b286abe336
- Desktop: KDE Plasma 6, Wayland
- Headset: Valve Index (direct mode via wp_drm_lease_device_v1)
Crash Description
Sequence
- vrcompositor starts, initializes Vulkan, acquires DRM display lease via Wayland
- Theater mode is launched
- First crash (vrcompositor PID 27461): AcquireNextImageKHR returns VK_ERROR_SURFACE_LOST_KHR (-1000000000), and the compositor segfaults with a NULL pointer dereference (mov rax, [rsi] where RSI=0); the error handling path is missing
- vrcompositor restarts (PID 28694)
- GPU page fault: SQC (data) read at VRAM address 0x8001399000 with PERMISSION_FAULTS=0x3 (PTE exists but read access denied)
- GFX ring timeout: ring gfx_0.0.0 timeout, signaled seq=370360, emitted seq=370362
- Watchdog abort: Failed Watchdog timeout in thread Render in Present after 6.78s. Aborting. -> SIGABRT
Validation Layer Output (from vrcompositor-linux.txt)
1. vkAcquireNextImageKHR with busy semaphore
VUID-vkAcquireNextImageKHR-semaphore-01779
vkAcquireNextImageKHR(): Semaphore must not have any pending operations.
Semaphore 0xf600000000f6
This causes VK_ERROR_SURFACE_LOST_KHR, triggering the segfault in the error path.
2. Draw calls with uninitialized descriptors (repeated 10+ times)
VUID-vkCmdDrawIndexed-None-08114
vkCmdDrawIndexed(): the descriptor [VkDescriptorSet 0x4840000000484,
Set 0, Binding 9, Index 1] is being used in draw but has never been
updated via vkUpdateDescriptorSets() or a similar call.
VUID-vkCmdDraw-None-08114
vkCmdDraw(): the descriptor [VkDescriptorSet 0x4890000000489,
Set 0, Binding 9, Index 1] is being used in draw...
This is the direct cause of the GPU page fault: the shader reads from an uninitialized descriptor, which contains a garbage GPU VA. When the shader's SQC cache tries to read from that address, it hits a VRAM page without read access -> PERMISSION_FAULTS=0x3 -> GFX ring timeout.
3. Shader writes gl_Layer past framebuffer layer count
Undefined-Value-Layer-Written
Shader stage VK_SHADER_STAGE_VERTEX_BIT writes to Layer (gl_Layer)
but the framebuffer was created with layer count of 1
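To see how often each violation fires, the validation output can be tallied by VUID. The following is only a minimal sketch: the function name `count_vuids` and the sample text are illustrative, and it assumes the log contains VUID identifiers as plain text in the form quoted above. The exact layout of vrcompositor-linux.txt may differ.

```python
import re
from collections import Counter

# Matches identifiers such as VUID-vkCmdDraw-None-08114.
VUID_RE = re.compile(r"VUID-[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*")


def count_vuids(log_text: str) -> Counter:
    """Count how many times each VUID identifier appears in the log text."""
    return Counter(VUID_RE.findall(log_text))


# Sample text copied from the messages quoted in this report (shortened).
SAMPLE_LOG = """\
VUID-vkAcquireNextImageKHR-semaphore-01779
vkAcquireNextImageKHR(): Semaphore must not have any pending operations.
VUID-vkCmdDrawIndexed-None-08114
vkCmdDrawIndexed(): the descriptor is being used in draw but has never been updated
VUID-vkCmdDraw-None-08114
vkCmdDraw(): the descriptor is being used in draw...
"""

if __name__ == "__main__":
    # Print the most frequent violations first.
    for vuid, hits in count_vuids(SAMPLE_LOG).most_common():
        print(f"{hits:4d}  {vuid}")
```

Running it on the full log instead of `SAMPLE_LOG` gives a quick ranking of which violations dominate.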
GPU Coredump (from /sys/class/drm/card1/device/devcoredump/data)
The fault is deterministic - same VRAM address 0x0000008001399000 across multiple runs:
[gfxhub] page fault (src_id:0 ring:24 vmid:5 pasid:255)
Process vrcompositor pid 28694 thread RenderThread pid 28771
in page starting at address 0x0000008001399000
Faulty UTCL2 client ID: SQC (data) (0xa)
PERMISSION_FAULTS: 0x3
MAPPING_ERROR: 0x0
RW: 0x0
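To compare the fault across runs, the fields can be pulled out of each dump and checked for equality. This is a rough sketch that assumes the dump text has the same layout as the excerpt above; `parse_page_fault` and the dictionary keys are names chosen here for illustration, not part of any kernel or Mesa tooling.

```python
import re


def parse_page_fault(text: str) -> dict:
    """Extract process, address and fault-status fields from a page fault excerpt."""
    fault = {}
    proc = re.search(r"Process (\S+) pid (\d+) thread (\S+) pid (\d+)", text)
    if proc:
        fault["process"] = proc.group(1)
        fault["pid"] = int(proc.group(2))
        fault["thread"] = proc.group(3)
        fault["tid"] = int(proc.group(4))
    addr = re.search(r"at address (0x[0-9a-fA-F]+)", text)
    if addr:
        fault["address"] = int(addr.group(1), 16)
    # Status fields are printed as "NAME: 0x<hex>".
    for key in ("PERMISSION_FAULTS", "MAPPING_ERROR", "RW"):
        m = re.search(rf"\b{key}:\s*(0x[0-9a-fA-F]+)", text)
        if m:
            fault[key] = int(m.group(1), 16)
    return fault


# Excerpt copied from the coredump quoted in this report.
SAMPLE_DUMP = """\
[gfxhub] page fault (src_id:0 ring:24 vmid:5 pasid:255)
Process vrcompositor pid 28694 thread RenderThread pid 28771
in page starting at address 0x0000008001399000
Faulty UTCL2 client ID: SQC (data) (0xa)
PERMISSION_FAULTS: 0x3
MAPPING_ERROR: 0x0
RW: 0x0
"""

if __name__ == "__main__":
    fault = parse_page_fault(SAMPLE_DUMP)
    print(fault["process"], hex(fault["address"]), fault["PERMISSION_FAULTS"])
```

Parsing the address as an integer also makes it easy to confirm that the zero-padded value in the coredump is the same address as the shorter one listed in the crash sequence.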
Root Cause
The vrcompositor has a race condition or missing synchronization in its render loop:
- Acquire semaphore reuse: vkAcquireNextImageKHR is called with a semaphore that still has pending signal operations from the previous vkAcquireNextImageKHR. The spec requires the semaphore to be unsignaled (no pending operations). This causes the swapchain to enter an error state.
- Descriptors used before update: Descriptor sets are bound to the pipeline and draw commands are issued before vkUpdateDescriptorSets() is called for those descriptor sets. The GPU reads garbage descriptor data, which contains invalid GPU addresses, causing the SQC page fault.
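The acquire semaphore reuse can be illustrated with a toy model. This is not Vulkan code and says nothing about how vrcompositor is actually structured; `BinarySemaphore`, `acquire`, `wait_completed`, `run_frames` and the `lag` parameter are all hypothetical names. It only assumes that a semaphore stays pending from the acquire that signals it until the wait on it has completed, and shows why a single semaphore breaks once more than one frame is in flight, while one semaphore per frame in flight does not.

```python
class BinarySemaphore:
    """Toy stand-in for a binary VkSemaphore."""

    def __init__(self, name: str):
        self.name = name
        self.pending = False  # True from acquire until the matching wait completes


def acquire(sem: BinarySemaphore) -> None:
    """Stand-in for vkAcquireNextImageKHR: rejects a semaphore with pending operations."""
    if sem.pending:
        raise RuntimeError(f"{sem.name}: semaphore still has pending operations")
    sem.pending = True


def wait_completed(sem: BinarySemaphore) -> None:
    """Stand-in for the GPU finishing the submit that waited on the semaphore."""
    sem.pending = False


def run_frames(semaphores: list, frames: int, lag: int) -> None:
    """Acquire once per frame, rotating through the semaphores.

    The wait for a frame only completes `lag` frames after its acquire,
    which models GPU work trailing behind the CPU.
    """
    in_flight = []
    for frame in range(frames):
        sem = semaphores[frame % len(semaphores)]
        acquire(sem)
        in_flight.append(sem)
        if len(in_flight) > lag:
            wait_completed(in_flight.pop(0))


if __name__ == "__main__":
    # One semaphore per frame in flight (lag + 1) never trips the check.
    run_frames([BinarySemaphore("acq0"), BinarySemaphore("acq1")], frames=10, lag=1)
    try:
        # A single semaphore is reused while its previous signal is still pending.
        run_frames([BinarySemaphore("acq0")], frames=10, lag=1)
    except RuntimeError as err:
        print("violation:", err)
```

In real Vulkan code the usual equivalent is to keep one acquire semaphore per frame in flight and to reuse it only after the fence of the submit that waited on it has signaled.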
Workaround
The crash is timing-sensitive. Switching the Mesa RADV driver version can mask the race by changing GPU scheduling behavior, but the root cause is in vrcompositor.
Attachments
vrcompositor-linux.txt
dump_steamvr_crash.txt