Commit 9b31529

Fix wait_for_workers to allow retry workers to finish before reporting
When Buildkite retries a job, the main queue is already exhausted from the original run. A retry worker may find unresolved failures via the error-reports fallback and start re-running them via the Retry queue. But those tests are not in the Redis running set, so active_workers? returns false and the summary's wait_for_workers loop exits immediately on exhausted?, canceling the retry worker before it can clear the error-report.

Fix: after the main loop exits due to exhausted?, if this is a retry run (config.retry?, i.e. BUILDKITE_RETRY_COUNT > 0 or SEMAPHORE_PIPELINE_RERUN == "true") and error-reports are still non-empty, wait up to inactive_workers_timeout for retry workers to clear them.
1 parent 637d7b7 commit 9b31529

2 files changed

Lines changed: 22 additions & 0 deletions

ruby/lib/ci/queue/configuration.rb

Lines changed: 5 additions & 0 deletions
@@ -99,6 +99,11 @@ def lazy_load_test_helper_paths
         @lazy_load_test_helpers.split(',').map(&:strip)
       end
 
+      def retry?
+        ENV.fetch("BUILDKITE_RETRY_COUNT", "0").to_i > 0 ||
+          ENV["SEMAPHORE_PIPELINE_RERUN"] == "true"
+      end
+
       def queue_init_timeout
         @queue_init_timeout || timeout
       end
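
To see how the retry? predicate above reacts to different environments, here is a minimal sketch. It is not part of the commit: retry_run? is a hypothetical standalone copy that takes a plain hash instead of reading the process-wide ENV, so it can be called directly with sample values.

```ruby
# Hypothetical standalone copy of the predicate from the diff. It takes an
# env hash (an assumption made for this illustration) rather than reading ENV.
def retry_run?(env)
  env.fetch("BUILDKITE_RETRY_COUNT", "0").to_i > 0 ||
    env["SEMAPHORE_PIPELINE_RERUN"] == "true"
end

first_run       = retry_run?({})                                        # neither variable set
buildkite_zero  = retry_run?({ "BUILDKITE_RETRY_COUNT" => "0" })        # original Buildkite run
buildkite_retry = retry_run?({ "BUILDKITE_RETRY_COUNT" => "2" })        # second retry of a job
semaphore_rerun = retry_run?({ "SEMAPHORE_PIPELINE_RERUN" => "true" })  # Semaphore rerun
empty_count     = retry_run?({ "BUILDKITE_RETRY_COUNT" => "" })         # "".to_i is 0

puts [first_run, buildkite_zero, buildkite_retry, semaphore_rerun, empty_count].inspect
```

Because String#to_i returns 0 for empty or non-numeric input, a malformed BUILDKITE_RETRY_COUNT is treated as a first run rather than raising.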

ruby/lib/ci/queue/redis/supervisor.rb

Lines changed: 17 additions & 0 deletions
@@ -39,6 +39,23 @@ def wait_for_workers
             yield if block_given?
           end
 
+          # On retry runs (BUILDKITE_RETRY_COUNT > 0), the main queue is already
+          # exhausted from the original run. A retry worker may have found unresolved
+          # failures via the error-reports fallback and be running them via the Retry
+          # queue — but those tests are NOT in the Redis running set so active_workers?
+          # returns false and the loop above exits immediately.
+          #
+          # Wait up to inactive_workers_timeout for retry workers to clear error-reports.
+          # This prevents the summary from canceling retry workers before they finish.
+          if exhausted? && config.retry? && !rescue_connection_errors { build.failed_tests }.empty?
+            @time_left_with_no_workers = config.inactive_workers_timeout
+            until rescue_connection_errors { build.failed_tests }.empty? ||
+                  @time_left_with_no_workers <= 0
+              sleep 1
+              @time_left_with_no_workers -= 1
+            end
+          end
+
           exhausted?
         rescue CI::Queue::Redis::LostMaster
           false
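
The bounded wait added in this hunk can be illustrated in isolation. The sketch below is not the supervisor code: wait_until_cleared and its sleeper parameter are hypothetical names introduced here, and the in-memory arrays stand in for build.failed_tests. The sleeper is injectable only so the example finishes instantly; the real loop sleeps one second per tick.

```ruby
# Hypothetical helper mirroring the loop in the diff: poll the block until it
# returns an empty collection or the tick budget runs out. Unlike the commit,
# which goes on to return exhausted?, this sketch returns whether the
# collection was cleared, to make the two outcomes visible.
def wait_until_cleared(timeout:, sleeper: ->(seconds) { sleep(seconds) })
  time_left = timeout
  until yield.empty? || time_left <= 0
    sleeper.call(1)
    time_left -= 1
  end
  yield.empty?
end

# Case 1: a simulated retry worker clears one error report per tick.
failed = %w[test_a test_b test_c]
fake_worker_tick = ->(_seconds) { failed.shift }
cleared = wait_until_cleared(timeout: 10, sleeper: fake_worker_tick) { failed }

# Case 2: nothing ever clears the report, so the budget expires.
stuck = %w[test_x]
timed_out = wait_until_cleared(timeout: 3, sleeper: ->(_seconds) {}) { stuck }

puts "cleared=#{cleared} timed_out=#{timed_out}"
```

Capping the wait at a fixed budget means a stuck or crashed retry worker can delay the summary by at most the configured timeout, rather than blocking it indefinitely.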
