Skip to content

fix: make V1Channel re-subscribable after a failed subscribe#845

Open
pike00 wants to merge 1 commit into
Python-roborock:mainfrom
pike00:fix/v1-channel-resubscribe
Open

fix: make V1Channel re-subscribable after a failed subscribe#845
pike00 wants to merge 1 commit into
Python-roborock:mainfrom
pike00:fix/v1-channel-resubscribe

Conversation

@pike00

@pike00 pike00 commented Jun 9, 2026

Copy link
Copy Markdown

Summary

A device whose first connect attempt hits a transient error is bricked for the rest of the session: it never gets entities and the log shows the misleading ValueError: Only one subscription allowed at a time. This was observed with two V1 vacuums on one account — the second vacuum (sharing the MQTT session) reliably failed while the first connected fine.

Root cause

The connect path is RoborockDevice.start_connect()connect_loop()connect()V1Channel.subscribe().

  1. V1Channel.subscribe() set self._callback = callback last, and the returned unsub() closure tore down the reconnect task and MQTT/local subscriptions but never cleared self._callback.
  2. In connect(), if v1_properties.start() raised a transient RoborockException (e.g. the initial status fetch timing out for a second device contending for the MQTT session), the except RoborockException block called unsub() and re-raised — leaving self._callback still set.
  3. connect_loop() caught the RoborockException and retried connect(). The retry's subscribe() saw self._callback is not None and raised ValueError("Only one subscription allowed at a time"). That ValueError is not a RoborockException, so it escaped to connect_loop()'s outer except Exception, which logs Uncaught error during connect and returns — permanently giving up. No ready callbacks fire, so no entities are created.

A secondary leak: if subscribe() raised partway through (after starting _reconnect_task / setting _mqtt_unsub), it returned no unsub, so the reconnect task and subscriptions leaked on every retry.

Fix

  • V1Channel.subscribe() now claims self._callback up front and wraps the local-connect / reconnect-task / MQTT-subscribe setup in try/except BaseException: self._teardown(); raise.
  • The teardown logic is factored into an idempotent _teardown() (also returned as the unsub function) that cancels the reconnect task, drops the MQTT/local subscriptions, closes the local channel, and clears self._callback — leaving the channel cleanly re-subscribable.
  • RoborockDevice.connect()'s cleanup is broadened from except RoborockException to except BaseException so any start() failure (not just Roborock errors) releases the channel before propagating.

Net effect: a retry after a transient first-attempt failure now re-subscribes cleanly instead of tripping the stale-callback guard. A genuinely persistent failure becomes a normal backoff-retry loop surfacing the real error, rather than a hard brick behind a misleading ValueError.

Tests

Added regression tests (all fail against the pre-fix code):

  • tests/devices/rpc/test_v1_channel.py
    • subscribe → unsub → subscribe no longer raises.
    • a failure during MQTT subscribe leaves _reconnect_task is None and the channel re-callable (no leaked task/subscription).
  • tests/devices/test_v1_device.py
    • connect() unsubscribes when start() fails (parametrized: RoborockException and a non-Roborock error).
    • end-to-end: start() raises a transient RoborockException once then succeeds → connect_loop() reconnects and the device becomes ready, with no ValueError.

uv run pytest — full suite passes; ruff check and mypy clean.

Copilot AI review requested due to automatic review settings June 9, 2026 12:47

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Fixes a regression where transient failures during device/channel setup could leave subscriptions partially active, preventing subsequent reconnects (e.g., “Only one subscription allowed at a time”).

Changes:

  • Make V1Channel.subscribe() teardown atomic and reusable via a shared _teardown() path.
  • Ensure RoborockDevice.connect() always unsubscribes if start() fails (including non-RoborockException errors).
  • Add regression tests covering resubscribe/retry and atomic failure cleanup.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File Description
roborock/devices/rpc/v1_channel.py Makes subscription setup atomic and introduces _teardown() to reliably reset internal state on error/unsubscribe.
roborock/devices/device.py Broadens cleanup on start() failure to always unsubscribe before re-raising.
tests/devices/test_v1_device.py Adds regression tests ensuring connect() unsubscribes on any start() failure and can retry cleanly.
tests/devices/rpc/test_v1_channel.py Adds tests for subscribe→unsub→subscribe and for atomic cleanup after subscribe failures.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +205 to 210
except BaseException:
# Any failure in start() must unsubscribe before propagating, so a
# retry by connect_loop() gets a clean channel. Broader than
# RoborockException so non-Roborock errors also release the channel.
unsub()
raise
Comment on lines +328 to +332
except BaseException:
# Any failure during setup must leave the channel re-subscribable:
# cancel the reconnect task, drop subscriptions, and clear _callback.
self._teardown()
raise
Comment on lines +344 to +357
if self._reconnect_task:
self._reconnect_task.cancel()
self._reconnect_task = None
if self._mqtt_unsub:
self._mqtt_unsub()
self._mqtt_unsub = None
if self._local_unsub:
self._local_unsub()
self._local_unsub = None
if self._local_channel:
self._local_channel.close()
self._local_channel = None
self._callback = None
self._logger.debug("Unsubscribed from device")

v1_channel = V1Channel(
device_uid=duid,
security_data=SecurityData(endpoint="test_endpoint", nonce=b"test_nonce_16byte"),
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants