Remove bridge sync indexing cap #261

Merged
ewlarson merged 2 commits into develop from feature/bridge-sync-index-all-changes
May 27, 2026

@ewlarson ewlarson commented May 27, 2026

Summary

  • Remove the hard total cap on bridge delta Elasticsearch refreshes so every changed resource ID is handled.
  • Process bridge Elasticsearch refresh work in configurable batches of 5,000 by default.
  • Process bridge cache invalidation/generated asset refresh work in configurable batches of 5,000 by default while keeping URL rewarm bounded.
  • Keep the old BRIDGE_SEARCH_INDEX_MAX_RESOURCE_IDS and BRIDGE_CACHE_REFRESH_MAX_RESOURCE_IDS env vars as compatibility aliases for batch size, not total limits.
  • Add bridge sync report warnings for partial Elasticsearch refreshes and cache refresh failures.
  • Add tests for uncapped batched indexing/cache invalidation and the new report warnings.
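One way the compatibility-alias behavior could look is sketched below. This is not the code from this PR: the function name `resolve_batch_size`, the primary variable name `BRIDGE_SEARCH_INDEX_BATCH_SIZE`, the precedence order, and the fallback on invalid values are all assumptions made for illustration. Only the legacy variable names and the 5,000 default come from the summary above.

```python
import os
from typing import Mapping, Optional

DEFAULT_BATCH_SIZE = 5000  # default batch size stated in the PR summary


def resolve_batch_size(
    primary_var: str,
    legacy_var: str,
    env: Optional[Mapping[str, str]] = None,
    default: int = DEFAULT_BATCH_SIZE,
) -> int:
    """Return a batch size, treating the legacy variable as an alias.

    Hypothetical helper: the primary variable wins when both are set,
    and unset, empty, non-integer, or non-positive values are skipped.
    """
    env = os.environ if env is None else env
    for name in (primary_var, legacy_var):
        raw = env.get(name)
        if raw is None or not raw.strip():
            continue
        try:
            value = int(raw)
        except ValueError:
            continue  # ignore unparsable values and keep looking
        if value > 0:
            return value
    return default


# The legacy name now sets the batch size, not a total limit.
size = resolve_batch_size(
    "BRIDGE_SEARCH_INDEX_BATCH_SIZE",  # hypothetical primary name
    "BRIDGE_SEARCH_INDEX_MAX_RESOURCE_IDS",  # legacy name from the PR
    env={"BRIDGE_SEARCH_INDEX_MAX_RESOURCE_IDS": "2000"},
)
```

Passing `env` explicitly keeps the helper easy to test without mutating the process environment.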

Root Cause

The May 23 production bridge run imported 8,945 changed resources, but the targeted Elasticsearch refresh processed only the first 5,000 IDs. The remaining 3,945 resources were left with stale ES documents/facets even though their database rows had the corrected provider/publisher values.

Implementation Notes

The fix keeps the operational guardrail that large deltas should be chunked. The 5,000 value now controls batch size rather than the total, so at the default size a 300K-item delta is processed as 60 batches instead of one giant request or a truncated first slice.
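The chunking idea can be sketched roughly as follows. This is an illustrative sketch, not the implementation in `search_index.py`: the names `iter_batches` and `refresh_in_batches`, the `refresh_batch` callback, and the shape of the returned report are assumptions. It also assumes a failed batch is recorded as a warning and the run continues, which may differ from the actual behavior.

```python
from typing import Callable, Dict, Iterator, List, Sequence


def iter_batches(ids: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most batch_size IDs, covering every ID."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(ids), batch_size):
        yield ids[start:start + batch_size]


def refresh_in_batches(
    ids: Sequence[str],
    refresh_batch: Callable[[Sequence[str]], None],
    batch_size: int = 5000,
) -> Dict[str, object]:
    """Run refresh_batch over every batch and collect warnings on failure.

    Hypothetical report shape: counts plus one warning string per failed batch.
    """
    processed = 0
    batches = 0
    warnings: List[str] = []
    for number, batch in enumerate(iter_batches(ids, batch_size), start=1):
        batches = number
        try:
            refresh_batch(batch)
            processed += len(batch)
        except Exception as exc:  # record the failure and keep going
            warnings.append(f"batch {number} ({len(batch)} ids) failed: {exc}")
    return {
        "requested": len(ids),
        "processed": processed,
        "batches": batches,
        "warnings": warnings,
    }


# Using the May 23 figure: 8,945 IDs become two batches (5,000 and 3,945).
ids = [str(i) for i in range(8945)]
seen: List[str] = []
report = refresh_in_batches(ids, seen.extend)
```

A report where `processed` is lower than `requested` is the kind of partial refresh a sync report warning could surface.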

Validation

  • BTAA_SKIP_TEST_DB=true python -m pytest backend/tests/services/test_bridge_search_index.py backend/tests/services/test_bridge_cache_refresh.py backend/tests/services/test_bridge_sync_report.py
  • ruff format --check backend/app/services/bridge_sync/search_index.py backend/app/services/bridge_sync/cache_refresh.py backend/app/services/bridge_sync/report.py backend/tests/services/test_bridge_search_index.py backend/tests/services/test_bridge_cache_refresh.py backend/tests/services/test_bridge_sync_report.py
  • ruff check backend/app/services/bridge_sync/search_index.py backend/app/services/bridge_sync/cache_refresh.py backend/app/services/bridge_sync/report.py backend/tests/services/test_bridge_search_index.py backend/tests/services/test_bridge_cache_refresh.py backend/tests/services/test_bridge_sync_report.py
  • git diff --check

Refs #145

@ewlarson ewlarson marked this pull request as ready for review May 27, 2026 16:37
@ewlarson ewlarson marked this pull request as draft May 27, 2026 16:51
@ewlarson ewlarson changed the title from "[codex] Remove bridge sync indexing cap" to "Remove bridge sync indexing cap" May 27, 2026
@ewlarson ewlarson marked this pull request as ready for review May 27, 2026 17:18
@ewlarson ewlarson merged commit f13f00f into develop May 27, 2026
12 checks passed