Summary
During every periodic zigpy network backup, load_network_info walks the
coordinator's route table via repeated xncp_get_route_table_entry calls. This XNCP
burst exceeds the ASH ACK-timeout budget (ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT),
drops the NCP connection, and ZHA reinitialises into a crash loop (~8 s cycles) until
zigpy's backup retry backs off. Setting backup_enabled: false stops it completely.
Related prior report (different trigger, same error code): #617
Versions
- bellows 0.49.1
- zigpy 1.4.1
- zha 1.3.1 / zhaquirks 1.2.0
- Home Assistant Core 2026.5.4, Home Assistant OS 17.x
- Python 3.14
Hardware / firmware
- Home Assistant Connect ZBT-2 (EFR32MG24), EmberZNet 7.5.1.0, USB-attached
- ~25 Zigbee devices, channel 15
- Also reproduced identically on the HA Yellow's internal EFR32MG21 radio (different
silicon, different transport — internal UART) before migrating to the ZBT-2. The
fault is host-side: same failure on two different chips and two different transports.
Feature flag context
The route-table walk only runs when FirmwareFeatures.RESTORE_ROUTE_TABLE is present
in self._ezsp._xncp_features. On ZBT-2 / EmberZNet 7.5.1.0 this feature IS
advertised, so the code path runs. EmberZNet's source-route table holds up to 200
entries, so the walk can issue ~200 XNCP customFrame calls in rapid succession —
which is what overwhelms the ASH link.
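To make the request pattern concrete, here is a minimal sketch of the walk as a back-to-back loop, with an optional pause between requests. This is not bellows' actual code: `FakeEzsp`, `walk_route_table`, and the `pause` parameter are hypothetical stand-ins, and the 200-entry table size is simply the figure quoted above. Whether pacing the requests would actually avoid the ACK timeouts is untested.

```python
import asyncio


class FakeEzsp:
    """Hypothetical stand-in for the EZSP object; it only counts requests."""

    def __init__(self, table_size: int) -> None:
        self.table_size = table_size
        self.requests = 0

    async def xncp_get_route_table_entry(self, index: int) -> dict:
        # The real call sends one XNCP customFrame and waits for an ASH ACK.
        self.requests += 1
        await asyncio.sleep(0)
        return {"index": index}


async def walk_route_table(ezsp: FakeEzsp, pause: float = 0.0) -> list:
    """Read every entry, optionally sleeping `pause` seconds between requests."""
    entries = []
    for index in range(ezsp.table_size):
        entries.append(await ezsp.xncp_get_route_table_entry(index=index))
        if pause:
            # Assumption: spacing out frames might give the ASH link room to ACK.
            await asyncio.sleep(pause)
    return entries


# pause=0.0 reproduces the burst pattern: one request per index, no gap.
burst_ezsp = FakeEzsp(table_size=200)
burst_entries = asyncio.run(walk_route_table(burst_ezsp, pause=0.0))
```

With `pause=0.0` the loop issues one request per table index with no gap, which matches the burst described above; a non-zero `pause` is only an illustration of one possible mitigation.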
Problem description
ZHA starts and runs normally. When the periodic backup fires, create_backup →
load_network_info(load_devices=True) walks the route table via
xncp_get_route_table_entry(index=index) and the ASH link exceeds its max
ACK-timeout count:
Backup-path traceback:
WARNING [zigpy.backups] Failed to create a network backup
zigpy/backups.py _backup_loop → create_backup
zigpy/backups.py create_backup → load_network_info(load_devices=...)
bellows/zigbee/application.py:438 → ezsp.xncp_get_route_table_entry(index=index)
bellows/ezsp/__init__.py:832 → send_xncp_frame(GetRouteTableEntryReq(index=index))
bellows/ezsp/__init__.py:749 → customFrame(...)
bellows/ezsp/protocol.py:124 → send_data
bellows/uart.py:26 → send_data
bellows/ash.py:683 _send_data_frame → await ack_future
bellows.ash.NcpFailure
Companion failure during reinit (pre_permit / setPolicy), same ASH layer:
ERROR [homeassistant] Error doing job: Task exception was never retrieved
bellows/ezsp/v8/__init__.py:50 pre_permit → setPolicy
bellows/ezsp/protocol.py:124 → send_data
bellows/uart.py:26 → send_data
bellows/ash.py:735 send_data → asyncio.shield(...)
bellows/ash.py:660 _send_data_frame → raise NcpFailure(ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT)
bellows.ash.NcpFailure: NcpResetCode.ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT
Steps to reproduce
- ZBT-2 on EmberZNet 7.5.1.0 with FirmwareFeatures.RESTORE_ROUTE_TABLE advertised.
- ZHA running normally; devices communicate without issue.
- Wait for the periodic network backup to fire (first occurrence ~76 s after startup).
- Backup fails as above; NCP connection drops; ZHA reinitialises and re-fires the
backup → repeating ~8 s crash loop, CPU 60–90%.
Expected vs actual
Expected: route-table walk completes or fails gracefully without losing the NCP
connection.
Actual: ASH ACK-timeout NcpFailure, connection drop, reinitialisation loop.
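As an illustration of what "fails gracefully" could mean for the walk, the sketch below stops at the first link error and returns the entries read so far instead of propagating the exception. Everything here is hypothetical: `LinkFailure` stands in for bellows.ash.NcpFailure, and `FlakyEzsp` and `walk_or_give_up` are invented for the example. In the real stack the failure is raised once the link has already exhausted its ACK-timeout budget, so catching it may not be enough to keep the connection up; this only shows the control flow.

```python
import asyncio


class LinkFailure(Exception):
    """Hypothetical stand-in for bellows.ash.NcpFailure."""


class FlakyEzsp:
    """Hypothetical EZSP stub whose link fails at a chosen index."""

    def __init__(self, table_size: int, fail_at: int) -> None:
        self.table_size = table_size
        self.fail_at = fail_at

    async def xncp_get_route_table_entry(self, index: int) -> dict:
        if index == self.fail_at:
            raise LinkFailure("ACK timeout")
        return {"index": index}


async def walk_or_give_up(ezsp: FlakyEzsp) -> tuple:
    """Return (entries read so far, completed flag) instead of raising."""
    entries = []
    for index in range(ezsp.table_size):
        try:
            entries.append(await ezsp.xncp_get_route_table_entry(index=index))
        except LinkFailure:
            # Stop the walk; the caller could still back up without routes.
            return entries, False
    return entries, True


entries, completed = asyncio.run(
    walk_or_give_up(FlakyEzsp(table_size=200, fail_at=5))
)
print(len(entries), completed)  # prints "5 False"
```

The caller can then decide whether a backup without the route table is acceptable, rather than having the whole backup attempt fail.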
What I've already ruled out
- Not a duplicate coordinator (a prior ROUTE_ERROR_ADDRESS_CONFLICT storm from a
second radio on the same PAN was a separate fault, now resolved).
- zigbee.db PRAGMA integrity_check → ok.
- Not host storage/recorder: DB on SSD, CPU is low except during this loop.
- Single accessor: one ZHA instance, USB-attached ZBT-2, no Z2M or CLI tool on the
port.
Workaround
zha:
  zigpy_config:
    backup_enabled: false
This prevents load_network_info from being called by the backup loop and stops the
crash entirely.
Additional context
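Reproducing the failure on both the ZBT-2 and the HA Yellow's internal radio indicates the fault is
host-side, not the radio silicon or transport layer.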