Claude/finish query worker limits bkwxo3 #110
Merged
Workers are killed mid-task on SIGTERM today (heartbeat writes its marker and exits immediately), so Kubernetes rollouts/scale-downs/node-drains drop in-flight work that only Redis reclaim recovers. Make the shared task path a good shutdown citizen and let ops tune concurrency per Deployment.

shepherd_utils/shared.py:
- install_shutdown_handlers(): asyncio-aware SIGTERM/SIGINT handlers that set a shutdown flag (loop.add_signal_handler, with a signal.signal fallback).
- get_tasks() stops pulling new work on shutdown and drains in-flight tasks by acquiring all concurrency-semaphore permits (every worker already releases its permit when a task finishes, so this needs no per-worker changes), bounded by worker_drain_timeout_sec, then exits 0. Stragglers fall to reclaim.
- _resolve_task_limit(): TASK_LIMIT env var overrides a worker's in-code default; each worker is its own container so one env per Deployment is unambiguous. No behavior change unless set.

shepherd_utils/heartbeat.py:
- manage_signals flag so get_tasks owns shutdown instead of the immediate-exit signal handler; mark_clean_shutdown() writes the marker from the loop so the monitor still classifies the exit as a clean scale-down, not a crash.

shepherd_utils/reclaim.py:
- finish_query idle floor 240s. Its async callback retries can run for minutes; at the 30s default a second consumer could XCLAIM mid-callback and deliver it twice.

config.py: worker_drain_timeout_sec (default 30s).

README + tests for the env override, drain/exit, clean marker, and shutdown.

https://claude.ai/code/session_019ZsKWm2SqKkGqvNqNBjfaU
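The drain-by-acquiring-all-permits idea and the env override described above can be sketched roughly as follows. This is a minimal illustration, not the code in shepherd_utils/shared.py: the function bodies, the `drain` helper name, the use of an asyncio.Event as the shutdown flag, and the fallback to the default on an invalid TASK_LIMIT are all assumptions made for the example.

```python
import asyncio
import os
import signal


def resolve_task_limit(default: int, env=os.environ) -> int:
    """Return TASK_LIMIT from the environment if set to a positive integer.

    Falling back to the in-code default on a missing or invalid value is an
    assumption of this sketch, not confirmed behavior of the PR.
    """
    raw = env.get("TASK_LIMIT")
    if raw is None:
        return default
    try:
        value = int(raw)
    except ValueError:
        return default
    return value if value > 0 else default


def install_shutdown_handlers(loop, shutdown: asyncio.Event) -> None:
    """Set the shutdown flag on SIGTERM/SIGINT instead of exiting at once."""
    for sig in (signal.SIGTERM, signal.SIGINT):
        try:
            loop.add_signal_handler(sig, shutdown.set)
        except (NotImplementedError, RuntimeError):
            # Fallback where the loop cannot install handlers (e.g. Windows):
            # hand the flag update back to the loop thread.
            signal.signal(
                sig, lambda *_: loop.call_soon_threadsafe(shutdown.set)
            )


async def drain(sem: asyncio.Semaphore, limit: int, timeout: float) -> bool:
    """Wait for in-flight tasks by taking every permit, bounded by timeout.

    Each task releases its permit when it finishes, so holding all `limit`
    permits means nothing is still running. Returns False if the timeout
    expires first; stragglers are then left to reclaim.
    """

    async def acquire_all() -> None:
        for _ in range(limit):
            await sem.acquire()

    try:
        await asyncio.wait_for(acquire_all(), timeout)
        return True
    except asyncio.TimeoutError:
        return False
```

In a Deployment this would be driven by something like `limit = resolve_task_limit(default=4)` at startup and `await drain(sem, limit, timeout=30)` once the shutdown flag is set, where 30 stands in for worker_drain_timeout_sec.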
… POST The async callback built `payload` from the decompressed message but left `message_bytes` referenced in scope, so both full copies stayed resident for the entire (up to 120s x retries) POST -- doubling peak memory per in-flight task under load. Rebind `message_bytes` to the spliced result instead so the original buffer is freed as soon as the new one is built; only one copy is held during the POST. Wire format is unchanged (still a single Content-Length body). https://claude.ai/code/session_019ZsKWm2SqKkGqvNqNBjfaU
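The rebinding pattern can be illustrated with a small sketch. The names `splice_field`, `post_callback`, and `send`, and the field being spliced in, are hypothetical; the actual splice performed by the callback is not shown in this PR text. The point is only that reusing the name `message_bytes` for the new body drops this frame's reference to the original buffer before the long POST begins. A caller that keeps its own reference would still pin the original.

```python
import asyncio
import json


def splice_field(message_bytes: bytes, key: str, value) -> bytes:
    """Append one top-level field to a JSON object without parsing it."""
    body = message_bytes.rstrip()
    if not body.startswith(b"{") or not body.endswith(b"}"):
        raise ValueError("expected a JSON object")
    # Encode just the new field, then strip the braces json.dumps adds.
    field = json.dumps({key: value}).encode()[1:-1]
    head = body[:-1].rstrip()
    sep = b"" if head == b"{" else b", "
    return head + sep + field + b"}"


async def post_callback(message_bytes: bytes, send) -> None:
    """Send the spliced body; `send` is a hypothetical async POST helper."""
    # Rebind the same name: once the spliced body exists, this frame no
    # longer references the original buffer, so it can be freed while the
    # (potentially slow, retried) POST is in flight.
    message_bytes = splice_field(message_bytes, "status", "complete")
    await send(message_bytes)


if __name__ == "__main__":
    async def _print_send(body: bytes) -> None:
        print(len(body), "bytes sent")

    asyncio.run(post_callback(b'{"result": [1, 2, 3]}', _print_send))
```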
…k holding on to too many full callback messages when trying to send them back