Replace time based logic with atomic rename for write#2897
Conversation
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR changes disk-buffering writer/reader coordination to use a “stage then rename-on-close” workflow, reducing reliance on time-based read delays and improving crash recovery for in-flight writes.
Changes:
- Writer now appends to
<timestamp>.tmpand promotes to the final numeric filename on close. - Folder startup recovers orphan
*.tmpfiles by promoting them to final names when safe. - Configuration/docs updated to make
minFileAgeForReadMillisoptional (default0) and update tests accordingly.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| disk-buffering/src/main/java/io/opentelemetry/contrib/disk/buffering/internal/storage/files/WritableFile.java | Implements staging + rename-on-close promotion behavior. |
| disk-buffering/src/main/java/io/opentelemetry/contrib/disk/buffering/internal/storage/FolderManager.java | Recovers orphan temp files and allows reader to close expired writer to surface data. |
| disk-buffering/src/main/java/io/opentelemetry/contrib/disk/buffering/storage/impl/FileStorageConfiguration.java | Updates defaults/docs and removes min-read-age vs max-write-age validation. |
| disk-buffering/src/test/java/io/opentelemetry/contrib/disk/buffering/internal/storage/files/WritableFileTest.java | Adds/updates tests for staged writes and promotion-on-close behavior. |
| disk-buffering/src/test/java/io/opentelemetry/contrib/disk/buffering/internal/storage/StorageTest.java | Updates expectations to account for visible .tmp files while writing. |
| disk-buffering/src/test/java/io/opentelemetry/contrib/disk/buffering/internal/storage/FolderManagerTest.java | Adds coverage for orphan *.tmp recovery and adjusts filename expectations. |
| disk-buffering/README.md | Updates user-facing explanation for rename-on-close synchronization. |
| disk-buffering/DESIGN.md | Documents the new synchronization approach and crash recovery. |
| CHANGELOG.md | Notes behavioral/config changes for the release notes. |
| public synchronized void close() throws IOException { | ||
| if (isClosed.compareAndSet(false, true)) { | ||
| out.close(); | ||
| if (!staging.renameTo(destination)) { | ||
| throw new IOException("Could not rename " + staging + " to " + destination); | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
@mamazzol I think we should ignore this recommendation, as it caused the tests to fail due to the method File#toPath() not being supported by Android. We have an Android support check to make sure we don't break this tool for Android apps, which usually means we cannot use some of the Java 8 APIs, such as this one.
| public synchronized void close() throws IOException { | ||
| if (isClosed.compareAndSet(false, true)) { | ||
| out.close(); | ||
| if (!staging.renameTo(destination)) { | ||
| throw new IOException("Could not rename " + staging + " to " + destination); | ||
| } | ||
| } | ||
| } |
| private void recoverOrphanTempFiles() { | ||
| File[] existingFiles = folder.listFiles(); | ||
| if (existingFiles == null) { | ||
| return; | ||
| } | ||
| for (File file : existingFiles) { | ||
| String name = file.getName(); | ||
| if (!name.endsWith(STAGING_SUFFIX)) { | ||
| continue; | ||
| } | ||
| File target = new File(folder, name.substring(0, name.length() - STAGING_SUFFIX.length())); | ||
| if (!NUMBER_PATTERN.matcher(target.getName()).matches()) { | ||
| continue; | ||
| } | ||
| if (target.exists()) { | ||
| if (!file.delete()) { | ||
| logger.warning("Could not delete duplicate orphan temp file: '" + file.getName() + "'"); | ||
| } | ||
| continue; | ||
| } | ||
| if (file.renameTo(target)) { | ||
| logger.fine( | ||
| "Recovered orphan temp file: '" + file.getName() + "' -> '" + target.getName() + "'"); | ||
| } else { | ||
| logger.warning("Could not promote orphan temp file: '" + file.getName() + "'"); | ||
| } | ||
| } | ||
| } |
| /* | ||
| * If the current writable file has expired, close it and return true. | ||
| * This allows to have a readeable file without waiting for the next write to trigger the check. | ||
| */ |
| public final FileStorageConfiguration build() { | ||
| FileStorageConfiguration configuration = autoBuild(); | ||
| if (configuration.getMinFileAgeForReadMillis() | ||
| <= configuration.getMaxFileAgeForWriteMillis()) { | ||
| throw new IllegalArgumentException( | ||
| "The configured max file age for writing must be lower than the configured min file age for reading"); | ||
| } | ||
| return configuration; | ||
| return autoBuild(); | ||
| } |
| The writing and reading processes can run in parallel because they target disjoint files: the | ||
| writer appends to `<timestamp>.tmp` and only files whose name is *entirely* numeric are visible to | ||
| the reader (the reader applies `Matcher.matches()` on `\d+`, so the trailing `.tmp` excludes | ||
| in-flight files). Each rollover atomically renames the temp file to its final name via | ||
| `rename(2)`, so the reader can never observe a partially-written file. If the reader is invoked |
LikeTheSalad
left a comment
There was a problem hiding this comment.
Thank you, @mamazzol 🙏
Please take a look at Copilot's suggestions because I think those are valid points. But apart from those, I think it's good to go.
breedx-splk
left a comment
There was a problem hiding this comment.
I haven't fully reviewed this yet, but I wanted to ask the same question I asked in the issue: What happens to the staging files when the process exits and restarts? Will the staged files be moved over or ?
Ok, I see it now, it does look like there is explicit code to cover this case and recover the "orphaned" files. Killer! Thanks. |
Apologies, I missed the original question! Happy to see you liked the approach :) |
|
Hello @breedx-splk is anything blocking this PR? Apologies for the ping, just checking! |
|
I think @breedx-splk is OOO for a while. Can anyone else take a look? Or should we wait until @breedx-splk is back? @open-telemetry/java-contrib-approvers |
Description:
Feature addition: This PR replaces the time-based separation with an atomic rename
As a consequence,
minFileAgeForReadMillisis no longer required for correctness (default is now 0) and the build-time validation minFileAgeForRead > maxFileAgeForWrite has been dropped.Existing Issue(s):
#2889
Testing:
./gradlew :disk-buffering:checkis green.Documentation: