fix: Eliminate binary image cache lock contention during SDK startup by denrase · Pull Request #7766 · getsentry/sentry-cocoa

denrase · 2026-03-31T12:10:57Z

📜 Description

The PR aims to do smaller and low risk changes to the binary image cache.

sentrycrashbic_registerAddedCallback held binaryImagesMutex for the entire iteration of all cached images (~700+). Any concurrent binaryImageAdded call — triggered by dlopen on another thread — blocked on the mutex for the full duration, causing fatal app hangs of 1–2+ seconds on cold start.

Also, since we do bulk insertion at startup, insert images with constant time and lazily sort.

Changes

SentryCrashBinaryImageCache.c:
- Snapshot tailNode under the mutex, iterate outside it. Nodes are never freed, so the snapshot pointer should remain valid.
- Make callbacks _Atomic with acquire/release semantics.
Swift layer (SentryBinaryImageCache.swift):
- Move sentrycrashbic_register* calls outside lock.synchronized in start() / stop().
- Downgrade NSRecursiveLock to NSLock (re-entrant lock no longer needed).

Result

In the added repro, thread B blocked time during registerAddedCallback went from 148ms → 0.27ms.

💡 Motivation and Context

Relates to #4618

💚 How did you test it?

📝 Checklist

You have to check all boxes before merging:

I added tests to verify the changes.
No new PII added or SDK only sends newly added PII if sendDefaultPII is enabled.
I updated the docs if needed.
I updated the wizard if needed.
Review from the native team if needed.
No breaking change or entry added to the changelog.
No breaking change for hybrid SDKs or communicated to hybrid SDKs.

sentrycrashbic_registerAddedCallback held binaryImagesMutex for the entire iteration of all cached images (~700+). Any concurrent binaryImageAdded call — triggered by dlopen on another thread — blocked on the mutex for the full duration, causing fatal app hangs of 1–2+ seconds on cold start. ### Changes C layer (SentryCrashBinaryImageCache.c): - Snapshot tailNode under the mutex (nanoseconds), iterate outside it. Nodes are never freed, so the snapshot pointer remains valid. - Make tailNode _Atomic — eliminates the first mutex round-trip in binaryImageAdded and fixes the formal data race in iterateOverImages (crash handler path). - Make imageAddedCallback / imageRemovedCallback _Atomic with acquire/release semantics — fixes TSan-confirmed data race on callback function pointers. Swift layer (SentryBinaryImageCache.swift): - Move sentrycrashbic_register* calls outside lock.synchronized in start() / stop(). - Downgrade NSRecursiveLock to NSLock (re-entrant lock no longer needed). - Read isDebug under the lock — fixes data race. ### Result Thread B blocked time during registerAddedCallback: 148ms → 0.27ms (551× improvement).

github-actions · 2026-03-31T12:11:22Z

Semver Impact of This PR

🟢 Patch (bug fixes)

📋 Changelog Preview

This is how your changes will appear in the changelog.
Entries from this PR are highlighted with a left border (blockquote style).

New Features ✨

Add per-call attachAllThreads override by itaybre in #7767
Add attachAllThreads option by itaybre in #7764

Bug Fixes 🐛

Eliminate binary image cache lock contention during SDK startup by denrase in #7766

Move SessionTracker file I/O off the main thread by denrase in #7704
Copy incoming tags dict to prevent crash by itaybre in #7763
Per-instance unmaskView propagates to child views by denrase in #7733

Documentation 📚

Add SentryCrash analysis and improvement plan document by itaybre in #7528

Internal Changes 🔧

Deps

Bump ruby/setup-ruby from 1.295.0 to 1.298.0 by dependabot in #7751
Bump codecov/codecov-action from 5.5.3 to 6.0.0 by dependabot in #7749
Bump getsentry/craft from 2.25.0 to 2.25.2 by dependabot in #7748
Bump fastlane-plugin-sentry from 2.4.0 to 2.5.0 by dependabot in #7746
Bump actions/deploy-pages from 4.0.5 to 5.0.0 by dependabot in #7747
Update clang-format version by sentry-mobile-updater in #7755
Bump getsentry/craft/.github/workflows/changelog-preview.yml from 2.25.0 to 2.25.2 by dependabot in #7750
Bump activesupport from 7.2.3 to 7.2.3.1 by dependabot in #7732
Bump fastlane-plugin-sentry from 2.3.0 to 2.4.0 by dependabot in #7724
Bump ruby/setup-ruby from 1.292.0 to 1.295.0 by dependabot in #7728
Bump getsentry/craft from 2.24.1 to 2.25.0 by dependabot in #7729
Bump codecov/codecov-action from 5.5.2 to 5.5.3 by dependabot in #7725
Bump actions/create-github-app-token from 2.2.1 to 3.0.0 by dependabot in #7726
Bump getsentry/craft/.github/workflows/changelog-preview.yml from 2.24.1 to 2.25.0 by dependabot in #7727
Bump json from 2.18.1 to 2.19.2 by dependabot in #7709

Other

Update validate-pr workflow by stephanie-anderson in #7769
Pin GitHub Actions to full-length commit SHAs by joshuarli in #7730

Other

impr: Align app lifecycle breadcrumb states with app context by denrase in #7703

_{🤖 This preview updates automatically when you update the PR.}

codecov · 2026-03-31T12:24:06Z

Codecov Report

❌ Patch coverage is 93.33333% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.133%. Comparing base (2b71685) to head (4f2cffe).
✅ All tests successful. No failed tests found.

Files with missing lines	Patch %	Lines
...ces/Swift/Core/Helper/SentryBinaryImageCache.swift	88.571%	4 Missing ⚠️
...entryCrash/Recording/SentryCrashBinaryImageCache.c	97.500%	1 Missing ⚠️

Additional details and impacted files

@@              Coverage Diff              @@
##              main     #7766       +/-   ##
=============================================
- Coverage   85.412%   85.133%   -0.280%     
=============================================
  Files          487       487               
  Lines        29086     29105       +19     
  Branches     12592     12593        +1     
=============================================
- Hits         24843     24778       -65     
- Misses        4193      4275       +82     
- Partials        50        52        +2

Files with missing lines	Coverage Δ
...entryCrash/Recording/SentryCrashBinaryImageCache.c	`96.261% <97.500%> (+0.516%)`	⬆️
...ces/Swift/Core/Helper/SentryBinaryImageCache.swift	`96.363% <88.571%> (+0.209%)`	⬆️

... and 16 files with indirect coverage changes

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2b71685...4f2cffe. Read the comment docs.

During SDK startup, hundreds of binary images are loaded and inserted into the cache before any lookup occurs. The previous implementation sorted on every insert — O(n) per insert due to Array.insert(at:), making startup O(n²). This change appends in O(1) and defers sorting to the first lookup, reducing total startup cost from ~180ms to ~5ms for 5000 images.

Sources/SentryCrash/Recording/SentryCrashBinaryImageCache.c

itaybre

Looks promising, but got some questions: mostly worried about sentrycrashbic_stopCache clearing the cache

Sources/SentryCrash/Recording/SentryCrashBinaryImageCache.c

itaybre · 2026-03-31T18:00:08Z

Sources/SentryCrash/Recording/SentryCrashBinaryImageCache.c

 static void
 binaryImageRemoved(const struct mach_header *header, intptr_t slide)
 {
    SentryCrashBinaryImageNode *nextNode = &rootNode;


m: Not new, but should we worry about sentrycrashbic_stopCache being called while we iterate this?

Moved the free to the next start cycle in order to not iterate while being freed. Technically still a possibility with rapid stop/start SDK calls, but this should not be a realistic case. Please comment if you have a different solution in mind or if this is good enough for now.

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.

^{Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

cursor · 2026-04-03T09:11:22Z

Sources/SentryCrash/Recording/SentryCrashBinaryImageCache.c

+        SentryCrashBinaryImageNode *next = pending->next;
+        free(pending);
+        pending = next;
+    }


Use-after-free when startCache frees nodes during iteration

Medium Severity

The registerAddedCallback iteration now runs outside the mutex, while startCache frees pendingFreeList also outside the mutex. If stopCache defers nodes, then startCache is called on another thread (freeing those nodes), while the iteration is still traversing them, this is a use-after-free. Critically, startCache sets tailNode back to non-NULL, so the iteration's while loop condition (tailNode != NULL) passes again after the restart, causing it to continue following freed node->next pointers. The comment "Defer freeing — readers outside the mutex may still be traversing" indicates awareness of readers, but startCache doesn't respect that invariant.

Additional Locations (1)

Sources/SentryCrash/Recording/SentryCrashBinaryImageCache.c#L262-L273

cursor · 2026-04-03T09:11:22Z

Sources/Swift/Core/Helper/SentryBinaryImageCache.swift

+        sentrycrashbic_registerAddedCallback(nil)
+        sentrycrashbic_registerRemovedCallback(nil)
        lock.synchronized {
-            sentrycrashbic_registerAddedCallback(nil)


Non-atomic start/stop creates callback-cache state mismatch

Medium Severity

Splitting callback registration out of lock.synchronized in start() and stop() creates race windows. If stop() runs between start()'s lock release and callback registration, the callbacks get registered after stop() already cleared cache to nil. The callbacks remain active but guard self.cache != nil causes all images to be silently dropped, leaving a permanently empty cache. The previous code was atomic—both operations were inside one lock.synchronized block. Since SentrySDK.close() can be called from any thread, this is reachable in production.

Additional Locations (1)

Sources/Swift/Core/Helper/SentryBinaryImageCache.swift#L62-L70

denrase added 3 commits March 31, 2026 15:10

bail out of iteration if stop sets tail node to null

ecf864b

add cl entry

67de31c

denrase marked this pull request as ready for review March 31, 2026 13:48

denrase requested review from itaybre, noahsmartin, philipphofmann and philprime as code owners March 31, 2026 13:48

sentry bot reviewed Mar 31, 2026

View reviewed changes

Sources/SentryCrash/Recording/SentryCrashBinaryImageCache.c Show resolved Hide resolved

cursor bot reviewed Mar 31, 2026

View reviewed changes

Sources/SentryCrash/Recording/SentryCrashBinaryImageCache.c Show resolved Hide resolved

Sources/SentryCrash/Recording/SentryCrashBinaryImageCache.c Show resolved Hide resolved

itaybre reviewed Mar 31, 2026

View reviewed changes

denrase added 3 commits April 3, 2026 10:05

Merge branch 'main' into fix/binary-image-cache-contention

6a7b9b9

review feecback

b7b092a

upodate comment

4f2cffe

denrase requested a review from itaybre April 3, 2026 09:10

cursor bot reviewed Apr 3, 2026

View reviewed changes

Uh oh!

Conversation

denrase commented Mar 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

📜 Description

Changes

Result

💡 Motivation and Context

💚 How did you test it?

📝 Checklist

Uh oh!

github-actions bot commented Mar 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Semver Impact of This PR

New Features ✨

Bug Fixes 🐛

Documentation 📚

Internal Changes 🔧

Deps

Other

Other

Uh oh!

codecov bot commented Mar 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

Uh oh!

Uh oh!

itaybre left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

itaybre Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

denrase Apr 3, 2026

Choose a reason for hiding this comment

Uh oh!

cursor bot left a comment

Choose a reason for hiding this comment

Uh oh!

cursor bot Apr 3, 2026

Choose a reason for hiding this comment

Use-after-free when startCache frees nodes during iteration

Uh oh!

cursor bot Apr 3, 2026

Choose a reason for hiding this comment

Non-atomic start/stop creates callback-cache state mismatch

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

denrase commented Mar 31, 2026 •

edited

Loading

github-actions bot commented Mar 31, 2026 •

edited

Loading

codecov bot commented Mar 31, 2026 •

edited

Loading