Commit 1a35018
yarikoptic and claude committed
BF: Seek file data to 0 before every retry in request(), not just for retryable status codes
Previously, data.seek(0) was only called inside the retry_statuses/retry_if branch, which handles specific HTTP response codes (429, 500-504, and zarr-specific conditions). When a ConnectionError occurred mid-upload (e.g., connection dropped during a multi-minute large zarr chunk transfer), tenacity caught the exception and retried, but the file pointer was left at whatever position the read was interrupted.

On retry, requests computed Content-Length as (file_size - current_position) and sent only the tail of the file. S3 received partial data whose MD5 didn't match the Content-MD5 header (computed from the full file), resulting in a BadDigest 400 error. This error was also not retried (not in RETRY_STATUSES, not matched by retry_if), so the upload failed permanently.

The fix uses tenacity's before_sleep callback to seek the file data back to position 0 before every retry, regardless of whether the retry was triggered by ConnectionError, HTTPError, or a retryable status code.

Evidence from two independent logs (issue #1821):

- Linux: 188 level-0 zarr chunks failed (large, ~6 min upload each), 67 level-2 chunks succeeded (small, fast upload) -- all with "succeeded after 1 retry" + BadDigest
- Windows local disk: 187 level-0 failures after 2-8 retries, 68 level-1 successes -- ruling out NFS/filesystem as the cause
- Both logs show "Resetting dropped connection: dandiarchive.s3.amazonaws.com" confirming ConnectionErrors preceded the BadDigest errors

Closes #1821

Co-Authored-By: Claude Code 2.1.81 / Claude Opus 4.6 <noreply@anthropic.com>
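The failure mode described above can be reproduced without any network access. The following is a minimal sketch, assuming an in-memory `io.BytesIO` as a stand-in for the zarr chunk file; `remaining_length` is a hypothetical helper that mimics the (file_size - current_position) computation the message attributes to requests, and the hex digests stand in for the Content-MD5 comparison that S3 performs.

```python
import hashlib
import io


def remaining_length(fileobj):
    """Hypothetical helper: bytes between the current position and EOF."""
    pos = fileobj.tell()
    fileobj.seek(0, io.SEEK_END)
    end = fileobj.tell()
    fileobj.seek(pos)  # restore the position we started from
    return end - pos


payload = b"x" * 600 + b"y" * 400  # stand-in for a large zarr chunk
body = io.BytesIO(payload)

# Digest of the whole file, computed once before the first attempt.
full_md5 = hashlib.md5(payload).hexdigest()

# Simulate a connection that drops after 600 bytes have been read.
body.read(600)

# A retry without rewinding sees only the tail of the file.
stale_length = remaining_length(body)
stale_md5 = hashlib.md5(body.read()).hexdigest()

# Rewinding first restores the full body for the next attempt.
body.seek(0)
fresh_length = remaining_length(body)
fresh_md5 = hashlib.md5(body.read()).hexdigest()
```

Here `stale_length` is smaller than the file size and `stale_md5` differs from `full_md5`, which is the mismatch that would surface as BadDigest; after `seek(0)` both values agree with the full file again.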
1 parent 5f03d9b commit 1a35018

1 file changed

Lines changed: 9 additions & 2 deletions

dandi/dandiapi.py
@@ -212,6 +212,14 @@ def request(

         lgr.debug("%s %s", method.upper(), url)

+        def _rewind_data(retry_state: tenacity.RetryCallState) -> None:
+            # After a failed attempt (ConnectionError mid-upload, HTTPError,
+            # etc.), the file pointer may be at an arbitrary position. Seek
+            # back to 0 so the next attempt sends the complete body.
+            # See https://github.com/dandi/dandi-cli/issues/1821
+            if data is not None and hasattr(data, "seek"):
+                data.seek(0)
+
         try:
             for i, attempt in enumerate(
                 tenacity.Retrying(
@@ -225,6 +233,7 @@ def request(
                     ),
                     stop=tenacity.stop_after_attempt(REQUEST_RETRIES),
                     reraise=True,
+                    before_sleep=_rewind_data,
                 )
             ):
                 with attempt:
@@ -249,8 +258,6 @@ def request(
                                 url,
                                 result.text,
                             )
-                        if data is not None and hasattr(data, "seek"):
-                            data.seek(0)
                         if retry_after := get_retry_after(result):
                             lgr.debug(
                                 "Sleeping for %d seconds as instructed in response "
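The effect of moving the rewind into a per-retry hook can be illustrated with a small, standard-library-only sketch. It does not use tenacity: `retry_call` is a hypothetical hand-rolled loop whose `before_sleep` parameter only imitates the role of tenacity's `before_sleep` callback, and `FlakyUploader` is an invented stand-in for an HTTP session whose first attempt fails after partially reading the body.

```python
import io


class FlakyUploader:
    """Hypothetical stand-in for a session: the first send fails mid-read."""

    def __init__(self):
        self.calls = 0

    def send(self, body):
        self.calls += 1
        if self.calls == 1:
            body.read(3)  # part of the body is consumed before the drop
            raise ConnectionError("dropped mid-upload")
        return body.read()  # what the server would actually receive


def retry_call(func, attempts, before_sleep=None):
    """Hypothetical retry loop; before_sleep runs only when a retry follows."""
    for attempt_number in range(1, attempts + 1):
        try:
            return func()
        except ConnectionError:
            if attempt_number == attempts:
                raise
            if before_sleep is not None:
                before_sleep(attempt_number)


def upload(data, uploader, rewind):
    def _rewind_data(attempt_number):
        # Same guard as in the diff: only rewind seekable bodies.
        if data is not None and hasattr(data, "seek"):
            data.seek(0)

    hook = _rewind_data if rewind else None
    return retry_call(lambda: uploader.send(data), attempts=3, before_sleep=hook)


without_fix = upload(io.BytesIO(b"abcdef"), FlakyUploader(), rewind=False)
with_fix = upload(io.BytesIO(b"abcdef"), FlakyUploader(), rewind=True)
```

Without the hook, the second attempt delivers only the unread tail of the body; with it, the full body is sent. Placing the rewind in the hook rather than in the status-code branch means it runs for exception-triggered retries as well.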