Break USB capture churn with backoff; instrument iso retire reason #433
Open
patrickrb wants to merge 1 commit into
Two field failures on a Pixel 8 + C-Media (0D8C:0012) libusb-direct capture:
1. Freeze: each capture session delivered ~100ms of audio then retired all
its iso transfers (code=0). MicRecorder only counted a session as failed
when it saw NO data, so this saw-a-little-then-died mode never tripped the
tally, and MicRecorder re-ran a fresh libusb_init/exit every 2s forever,
starving the waterfall to a ~5% duty cycle.
2. Native SIGSEGV in libusb_handle_events: that 2s init/exit churn races
libusb global state (and leaks a context per cycle).
Fixes:
- UsbCaptureRetryPolicy (extracted, unit-tested): a session counts as a
failure if it saw no data OR stayed alive too briefly to be useful
(< 5s, under a third of an FT8 cycle). Reinit uses exponential, capped
backoff (2s..60s) instead of a flat 2s, and we never silently fall back
to the phone's built-in mic — an operator wants radio audio or none.
This cuts the init/exit churn ~15-30x, greatly reducing the crash rate.
- MicRecorder tracks per-session alive time and resets the failure tally
only when a session was actually useful (not on first sample).
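
The failure rule and backoff described above can be sketched as follows. This is an illustration only: the PR's UsbCaptureRetryPolicy is exercised by testDebugUnitTest and is not shown here, so every name below is hypothetical and C++ is used just to match the native capture code. The 5s, 2s and 60s figures come from the text; doubling per failure is an assumption, since the text only says "exponential, capped".

```cpp
#include <algorithm>
#include <cstdint>

// Figures quoted in the description; the constant names are hypothetical.
constexpr std::int64_t kMinUsefulAliveMs = 5'000;   // "< 5s" is too brief to be useful
constexpr std::int64_t kBaseBackoffMs    = 2'000;   // first retry delay
constexpr std::int64_t kMaxBackoffMs     = 60'000;  // backoff cap

// A session fails if it produced no data OR died before it was useful.
constexpr bool isFailure(bool sawData, std::int64_t aliveMs) {
    return !sawData || aliveMs < kMinUsefulAliveMs;
}

// Delay before the next reinit after `consecutiveFailures` failed sessions.
// ASSUMPTION: the delay doubles per failure.
constexpr std::int64_t backoffMs(int consecutiveFailures) {
    std::int64_t delay = kBaseBackoffMs;
    // Overflow guard: stop doubling once the cap is reached, so a huge
    // failure count can never overflow the multiplication.
    for (int i = 1; i < consecutiveFailures && delay < kMaxBackoffMs; ++i) {
        delay *= 2;
    }
    return std::min(delay, kMaxBackoffMs);
}

// Hypothetical tally: reset only after a useful session, not on first sample.
struct FailureTally {
    int consecutiveFailures = 0;

    // Returns how long to wait before reinitializing capture.
    std::int64_t onSessionEnded(bool sawData, std::int64_t aliveMs) {
        if (isFailure(sawData, aliveMs)) {
            ++consecutiveFailures;
        } else {
            consecutiveFailures = 0;
        }
        return backoffMs(std::max(consecutiveFailures, 1));
    }
};
```

Under these assumptions, the ~100ms sessions from the field log count as failures even though sawData is true, so the delay grows 2s, 4s, 8s, 16s, 32s and then stays at 60s until a session survives at least 5s.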
Instrumentation to pin the root cause (the native ft8af_usb_capture logcat
tag does not reliably surface in the field):
- usb_audio_capture.cpp records WHY the event loop ended (first cause wins)
and passes it via onCaptureStopped(code): 1000+transfer_status,
2000+(-submit_err), 3000+(-handle_events_err), or 1 (retired, no cause).
- UsbAudioDevice.describeCaptureStopCode() decodes it into debug.log, so the
next on-device run states exactly why iso capture retires — the data
needed to land the targeted native stability fix.
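
A minimal sketch of the "first cause wins" encoding, assuming the causes may be noted from more than one thread. The class and method names are hypothetical and this is not the code in usb_audio_capture.cpp. Only the bases 1000/2000/3000 and the fallback value 1 are taken from the description above.

```cpp
#include <atomic>

// Hypothetical recorder for why the capture event loop ended.
class StopReason {
public:
    // libusb error codes are negative, so they are negated before the base
    // is added; transfer status values are already non-negative.
    void noteTransferStatus(int status)  { recordOnce(1000 + status); }
    void noteSubmitError(int err)        { recordOnce(2000 + (-err)); }
    void noteHandleEventsError(int err)  { recordOnce(3000 + (-err)); }

    // Value to pass to onCaptureStopped(): the recorded cause, or
    // 1 ("retired, no cause") if nothing was ever recorded.
    int code() const {
        const int c = code_.load();
        return c != 0 ? c : 1;
    }

private:
    void recordOnce(int candidate) {
        int expected = 0;
        // Succeeds only while nothing is stored yet, so the first cause
        // wins and later ones are ignored.
        code_.compare_exchange_strong(expected, candidate);
    }

    std::atomic<int> code_{0};
};
```

For example, noting a submit error of -4 (LIBUSB_ERROR_NO_DEVICE) and then a transfer status of 5 leaves code() at 2004, because the later cause is dropped.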
Tests: UsbCaptureRetryPolicyTest, UsbAudioWriteErrorTest (capture-stop-code +
transferStatusName cases). assembleDebug (native compiles) + testDebugUnitTest
green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@             Coverage Diff              @@
##                dev     #433      +/-   ##
============================================
+ Coverage     21.47%   21.57%   +0.09%
- Complexity      133      140       +7
============================================
  Files           150      154       +4
  Lines         19755    19945     +190
  Branches       2909     2947      +38
============================================
+ Hits           4243     4303      +60
- Misses        15344    15474     +130
  Partials        168      168
Problem (from the crashing device)
Pulled debug.log + logcat from a Pixel 8 running with a C-Media USB adapter (0D8C:0012). Two field failures, both rooted in the libusb-direct capture path:
1. Waterfall freeze: USB capture thrashes. Each libusb capture session delivered ~100 ms of audio, then retired all its isochronous IN transfers (code=0), and MicRecorder restarted it. Over one session: 127 opens, 88 stop/restart cycles, every ~2 s. MicRecorder only counted a session as failed when it saw no data at all, but here a few samples always arrived first (sawData=true), so the safety net never tripped and it churned forever. Result: capture is alive only ~100 ms out of every 2000 ms (~5% duty cycle), so the waterfall barely updates and the decoder never gets a continuous window.
2. Native SIGSEGV in libusb_handle_events. That 2 s libusb_init/libusb_exit churn (plus a leaked context per cycle, and concurrent TX contexts) races libusb global state.
Fix
- UsbCaptureRetryPolicy (new, pure-logic, unit-tested): a session counts as a failure if it saw no data or stayed alive too briefly to be useful (< 5 s, under a third of a 15 s FT8 cycle). Reinit now uses exponential, capped backoff (2 s → 60 s) instead of a flat 2 s, and never silently falls back to the phone's built-in mic (per operator preference: radio audio or none). This cuts the init/exit churn ~15–30× and greatly reduces the crash rate.
- MicRecorder tracks per-session alive time; the failure tally resets only when a session was actually useful (not on the first sample), and it won't re-arm after a real stop.
Instrumentation (to land the targeted native fix next)
The native ft8af_usb_capture logcat tag does not reliably surface in the field, so we couldn't read why the iso transfers retire. Now:
- usb_audio_capture.cpp records the first cause and passes it via onCaptureStopped(code): 1000+transfer_status, 2000+(-submit_err), 3000+(-handle_events_err), or 1 (retired, no terminal cause).
- UsbAudioDevice.describeCaptureStopCode() decodes it into debug.log, e.g. code=1005 (transfer terminal status NO_DEVICE).
The next on-device run will state exactly why capture retires: the data needed for the definitive native stability fix (and, separately, to stop leaking a libusb context per cycle).
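
To illustrate the decode step, here is a rough C++ sketch of what a describeCaptureStopCode-style function could look like. The real decoder belongs to UsbAudioDevice and is not shown in this PR text, so treat this as an assumption-laden illustration. The status names follow libusb's libusb_transfer_status enum (values 0 to 6). The "transfer terminal status" and "retired, no terminal cause" wording comes from the description; the wording of the other branches is invented here.

```cpp
#include <string>

// Names for libusb_transfer_status values 0..6.
inline const char* transferStatusName(int status) {
    switch (status) {
        case 0:  return "COMPLETED";
        case 1:  return "ERROR";
        case 2:  return "TIMED_OUT";
        case 3:  return "CANCELLED";
        case 4:  return "STALL";
        case 5:  return "NO_DEVICE";
        case 6:  return "OVERFLOW";
        default: return "UNKNOWN";
    }
}

// Turns a stop code into a debug.log-style string.
inline std::string describeCaptureStopCode(int code) {
    const std::string prefix = "code=" + std::to_string(code) + " (";
    if (code >= 3000) {
        // 3000 + (-err): undo the negation to show the libusb error value.
        return prefix + "handle_events error " + std::to_string(-(code - 3000)) + ")";
    }
    if (code >= 2000) {
        return prefix + "submit error " + std::to_string(-(code - 2000)) + ")";
    }
    if (code >= 1000) {
        return prefix + "transfer terminal status " + transferStatusName(code - 1000) + ")";
    }
    if (code == 1) {
        return prefix + "retired, no terminal cause)";
    }
    return prefix + "unrecognized)";
}
```

With this sketch, describeCaptureStopCode(1005) yields "code=1005 (transfer terminal status NO_DEVICE)", matching the example above, and 2004 decodes to a submit error of -4.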
Scope / honesty
This fixes the freeze and substantially reduces the native crash rate by cutting churn frequency, but the root fix for the SIGSEGV depends on knowing the retire reason, which this PR only instruments. A follow-up will use that data to make capture stable (or degrade cleanly) and eliminate the per-cycle context leak.
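
As a rough check on the churn reduction, the small model below counts reinit attempts in a time window when every session dies after a fixed alive time. It is a back-of-the-envelope sketch, not a measurement: the function name is hypothetical, the doubling factor is an assumption, and real sessions will not all fail identically.

```cpp
#include <algorithm>
#include <cstdint>

// Counts reinit attempts within `windowMs` if every session fails after
// `aliveMs`. flat == true models the old constant 2 s delay; otherwise
// the delay doubles after each failure (ASSUMPTION) up to a 60 s cap.
inline int countReinits(std::int64_t windowMs, std::int64_t aliveMs, bool flat) {
    std::int64_t elapsed = 0;
    std::int64_t delay = 2000;
    int reinits = 0;
    while (true) {
        elapsed += aliveMs + delay;      // session runs, dies, then we wait
        if (elapsed > windowMs) break;   // next reinit falls outside the window
        ++reinits;
        if (!flat) delay = std::min<std::int64_t>(delay * 2, 60000);
    }
    return reinits;
}
```

For a one-hour window with ~100 ms sessions, countReinits(3600000, 100, true) gives 1714 and countReinits(3600000, 100, false) gives 63, roughly a 27× reduction under this model.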
Tests
UsbCaptureRetryPolicyTest (isFailure / exponential+capped backoff / overflow guard), UsbAudioWriteErrorTest (capture-stop-code decode + transferStatusName). :app:assembleDebug (native compiles, all ABIs) + :app:testDebugUnitTest green.
🤖 Generated with Claude Code