Recover from stale temp file left by crashed writer #11
Open
mcfnord wants to merge 1 commit into softins:master
If the PHP process writing new cache data crashes (e.g. connection reset by peer), register_shutdown_function may not run, leaving a zero-byte .tmp lock file behind. All subsequent requests then loop waiting for a cache update that never arrives, timing out after 20s (or sooner if the caller disconnects).
Detect this by checking whether the .tmp file is older than 30 seconds (well past the ~1.5s a normal fetch takes) and removing it so the next waiter can take the lock and complete the fetch.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Problem
If the PHP process writing new cache data is killed mid-flight (e.g. a connection reset at the PHP-FPM socket level), register_shutdown_function('cleanup') does not run. This leaves a zero-byte .tmp lock file behind.
All subsequent requests for that endpoint then enter the acquisition loop, fail to open the .tmp file exclusively, and loop in 200ms sleeps until the 20-second timeout fires, returning a non-JSON die() response. Any caller with a shorter timeout (e.g. 10s) sees a 499/503 instead. The endpoint appears permanently broken until the stale file is manually removed.
Fix
After a failed fopen(..., 'x'), check whether the .tmp file is older than 30 seconds. A normal successful fetch completes in well under 2 seconds, so a 30-second threshold only fires on genuinely abandoned locks. When detected, log the event, delete the file, and continue; the next loop iteration acquires the lock cleanly and performs a fresh fetch.
Test plan
[…] .tmp file on next request (verified in production: log emitted "Stale lock file detected", fresh data returned in ~1250ms)
[…] fopen fails, and only when file is >30s old)
🤖 Generated with Claude Code
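For readers without the diff at hand, the recovery step described under Fix can be sketched roughly as below. This is a minimal illustration, not the code in this PR: the function names is_stale_lock and acquire_lock and the three constants are hypothetical, the 30 s, 200 ms and 20 s values are taken from the description above, and the real loop also waits for the cache file to be refreshed by another writer, which is omitted here.

```php
<?php
// Hypothetical names; the numeric values mirror the PR description.
const STALE_AFTER_SECONDS  = 30;      // a lock older than this is treated as abandoned
const POLL_INTERVAL_US     = 200000;  // 200 ms between attempts
const WAIT_TIMEOUT_SECONDS = 20;      // give up after this long

/**
 * True if $tmpFile exists and was last modified more than $maxAge seconds ago.
 * $now can be injected for testing; it defaults to the current time.
 */
function is_stale_lock(string $tmpFile, int $maxAge = STALE_AFTER_SECONDS, ?int $now = null): bool
{
    clearstatcache(true, $tmpFile);   // stat results are cached within a request
    $mtime = @filemtime($tmpFile);
    if ($mtime === false) {
        return false;                 // file is gone, e.g. another waiter removed it
    }
    return (($now ?? time()) - $mtime) > $maxAge;
}

/**
 * Tries to create $tmpFile exclusively, removing it first if it looks abandoned.
 * Returns the open file handle, or null if the timeout expires.
 */
function acquire_lock(string $tmpFile, float $timeout = WAIT_TIMEOUT_SECONDS)
{
    $deadline = microtime(true) + $timeout;
    while (microtime(true) < $deadline) {
        $handle = @fopen($tmpFile, 'x');   // 'x' fails if the file already exists
        if ($handle !== false) {
            return $handle;                // we hold the lock
        }
        // Stale check runs only after a failed fopen, as in the PR description.
        if (is_stale_lock($tmpFile)) {
            error_log("Stale lock file detected: $tmpFile");
            @unlink($tmpFile);
            continue;                      // retry immediately
        }
        usleep(POLL_INTERVAL_US);
    }
    return null;
}
```

One way to reproduce the stale state locally, under the same assumptions, is to create the lock file and backdate it with touch($tmpFile, time() - 60) before calling acquire_lock($tmpFile); the call should then log the stale lock and return a handle instead of waiting for the timeout.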