
Break USB capture churn with backoff; instrument iso retire reason #433

Open
patrickrb wants to merge 1 commit into dev from fix/usb-capture-churn

@patrickrb

Owner

Problem (from the crashing device)

Pulled debug.log + logcat from a Pixel 8 running with a C-Media USB adapter (0D8C:0012). Two field failures, both rooted in the libusb-direct capture path:

1. Waterfall freeze — USB capture thrashes. Each libusb capture session delivered ~100 ms of audio, then retired all its isochronous IN transfers (code=0), and MicRecorder restarted it. Over one session: 127 opens, 88 stop/restart cycles, every ~2 s. MicRecorder only counted a session as failed when it saw no data at all — but here a few samples always arrived first (sawData=true), so the safety net never tripped and it churned forever. Result: capture is alive only ~100 ms out of every 2000 ms (~5% duty cycle) → the waterfall barely updates and the decoder never gets a continuous window.

2. Native SIGSEGV in libusb_handle_events. That 2 s libusb_init/libusb_exit churn (plus a leaked context per cycle, and concurrent TX contexts) races libusb global state:

signal 11 (SIGSEGV) ... in libft8af.so
  #06 libusb_handle_events_timeout_completed+256

Fix

  • UsbCaptureRetryPolicy (new, pure-logic, unit-tested): a session counts as a failure if it saw no data or stayed alive too briefly to be useful (< 5 s, under a third of a 15 s FT8 cycle). Reinit now uses exponential, capped backoff (2 s → 60 s) instead of a flat 2 s, and never silently falls back to the phone's built-in mic (per operator preference — radio audio or none). This cuts the init/exit churn ~15–30× and greatly reduces the crash rate.
  • MicRecorder tracks per-session alive time; the failure tally resets only when a session was actually useful (not on the first sample), and it won't re-arm after a real stop.
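
The retry rule in the bullets above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's actual code: the class name `RetryPolicySketch`, the method names, and the doubling factor are hypothetical. Only the 5 s usefulness threshold and the 2 s to 60 s backoff range come from the description.

```java
// Hypothetical sketch of the failure test and capped exponential backoff.
// Names and the doubling factor are assumptions, not the PR's real API.
final class RetryPolicySketch {
    static final long MIN_USEFUL_ALIVE_MS = 5_000;  // "< 5 s" counts as a failure
    static final long BASE_BACKOFF_MS = 2_000;      // first retry delay
    static final long MAX_BACKOFF_MS = 60_000;      // cap

    // A session fails if it delivered nothing or died too soon to be useful.
    static boolean isFailure(boolean sawData, long aliveMs) {
        return !sawData || aliveMs < MIN_USEFUL_ALIVE_MS;
    }

    // Delay before the next reinit after N consecutive failed sessions.
    static long backoffMs(int consecutiveFailures) {
        if (consecutiveFailures <= 1) {
            return BASE_BACKOFF_MS;
        }
        int shift = consecutiveFailures - 1;
        // Overflow guard: 2000 << 5 = 64000 already exceeds the cap, so any
        // larger shift returns the cap without performing the shift at all.
        if (shift >= 5) {
            return MAX_BACKOFF_MS;
        }
        return Math.min(BASE_BACKOFF_MS << shift, MAX_BACKOFF_MS);
    }

    public static void main(String[] args) {
        int failures = 0;
        // Replays the field pattern: data arrives, but the session lives ~100 ms.
        for (int i = 0; i < 7; i++) {
            if (isFailure(true, 100)) {
                failures++;      // tally grows even though sawData is true
            } else {
                failures = 0;    // reset only after a genuinely useful session
            }
            System.out.println("failure " + failures + " -> wait " + backoffMs(failures) + " ms");
        }
    }
}
```

With these assumed constants, seven short-lived sessions in a row wait 2000, 4000, 8000, 16000, 32000, 60000, and 60000 ms, where the old flat policy would have waited 2000 ms each time.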

Instrumentation (to land the targeted native fix next)

The native ft8af_usb_capture logcat tag does not reliably surface in the field, so we couldn't read why the iso transfers retire. Now:

  • usb_audio_capture.cpp records the first cause and passes it via onCaptureStopped(code): 1000+transfer_status, 2000+(-submit_err), 3000+(-handle_events_err), or 1 (retired, no terminal cause).
  • UsbAudioDevice.describeCaptureStopCode() decodes it into debug.log, e.g. code=1005 (transfer terminal status NO_DEVICE).
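
As a rough illustration of the code ranges listed above, a decoder might look like the sketch below. The class name `CaptureStopCodeSketch` and the exact message strings (other than the `code=1005` example) are assumptions; the real `describeCaptureStopCode()` may differ. The status names follow the order of the `libusb_transfer_status` enum in `libusb.h`.

```java
// Hypothetical decoder for the stop codes described above; not the PR's code.
final class CaptureStopCodeSketch {
    // libusb_transfer_status values 0..6, in enum order.
    private static final String[] TRANSFER_STATUS = {
        "COMPLETED", "ERROR", "TIMED_OUT", "CANCELLED", "STALL", "NO_DEVICE", "OVERFLOW"
    };

    static String transferStatusName(int status) {
        if (status >= 0 && status < TRANSFER_STATUS.length) {
            return TRANSFER_STATUS[status];
        }
        return "UNKNOWN(" + status + ")";
    }

    static String describe(int code) {
        String prefix = "code=" + code + " (";
        if (code == 1) {
            return prefix + "retired, no terminal cause)";
        }
        if (code >= 3000) {
            // 3000 + (-handle_events_err): recover the negative libusb error.
            return prefix + "handle_events error " + (-(code - 3000)) + ")";
        }
        if (code >= 2000) {
            // 2000 + (-submit_err): recover the negative libusb error.
            return prefix + "submit error " + (-(code - 2000)) + ")";
        }
        if (code >= 1000) {
            // 1000 + transfer_status
            return prefix + "transfer terminal status " + transferStatusName(code - 1000) + ")";
        }
        return prefix + "unrecognized)";
    }

    public static void main(String[] args) {
        System.out.println(describe(1005));
        System.out.println(describe(2004));
        System.out.println(describe(1));
    }
}
```

Under this scheme `describe(1005)` yields the string shown in the bullet above, and `describe(2004)` reports submit error -4, which is `LIBUSB_ERROR_NO_DEVICE` in libusb's error enum.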

The next on-device run will state exactly why capture retires — the data needed for the definitive native stability fix (and, separately, to stop leaking a libusb context per cycle).

Scope / honesty

This fixes the freeze and, by cutting churn frequency, substantially reduces how often the native crash occurs. The SIGSEGV's root fix still depends on the retire reason this PR instruments for. A follow-up will use that data to make capture stable (or degrade cleanly) and eliminate the per-cycle context leak.

Tests

UsbCaptureRetryPolicyTest (isFailure / exponential+capped backoff / overflow guard), UsbAudioWriteErrorTest (capture-stop-code decode + transferStatusName). :app:assembleDebug (native compiles, all ABIs) + :app:testDebugUnitTest green.

🤖 Generated with Claude Code

Two field failures on a Pixel 8 + C-Media (0D8C:0012) libusb-direct capture:

  1. Freeze: each capture session delivered ~100ms of audio then retired all
     its iso transfers (code=0). MicRecorder only counted a session as failed
     when it saw NO data, so this saw-a-little-then-died mode never tripped the
     tally and reinitialized a fresh libusb_init/exit every 2s forever —
     starving the waterfall to a ~5% duty cycle.
  2. Native SIGSEGV in libusb_handle_events: that 2s init/exit churn races
     libusb global state (and leaks a context per cycle).

Fixes:
  - UsbCaptureRetryPolicy (extracted, unit-tested): a session counts as a
    failure if it saw no data OR stayed alive too briefly to be useful
    (< 5s, under a third of an FT8 cycle). Reinit uses exponential, capped
    backoff (2s..60s) instead of a flat 2s, and we never silently fall back
    to the phone's built-in mic — an operator wants radio audio or none.
    This cuts the init/exit churn ~15-30x, greatly reducing the crash rate.
  - MicRecorder tracks per-session alive time and resets the failure tally
    only when a session was actually useful (not on first sample).

Instrumentation to pin the root cause (the native ft8af_usb_capture logcat
tag does not reliably surface in the field):
  - usb_audio_capture.cpp records WHY the event loop ended (first cause wins)
    and passes it via onCaptureStopped(code): 1000+transfer_status,
    2000+(-submit_err), 3000+(-handle_events_err), or 1 (retired, no cause).
  - UsbAudioDevice.describeCaptureStopCode() decodes it into debug.log, so the
    next on-device run states exactly why iso capture retires — the data
    needed to land the targeted native stability fix.

Tests: UsbCaptureRetryPolicyTest, UsbAudioWriteErrorTest (capture-stop-code +
transferStatusName cases). assembleDebug (native compiles) + testDebugUnitTest
green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@codecov

codecov Bot commented Jul 5, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 21.57%. Comparing base (f23faaa) to head (12c8804).

Additional details and impacted files

@@             Coverage Diff              @@
##                dev     #433      +/-   ##
============================================
+ Coverage     21.47%   21.57%   +0.09%     
- Complexity      133      140       +7     
============================================
  Files           150      154       +4     
  Lines         19755    19945     +190     
  Branches       2909     2947      +38     
============================================
+ Hits           4243     4303      +60     
- Misses        15344    15474     +130     
  Partials        168      168              
Flag      Coverage Δ
android   12.42% <ø> (+0.26%) ⬆️
desktop   39.86% <ø> (ø)
ios       96.38% <ø> (ø)
native    9.93% <ø> (ø)

see 5 files with indirect coverage changes
