Fix JNI string lifetime/null-safety, Windows cancel-drain race, and UWP version parse#1494
Fix JNI string lifetime/null-safety, Windows cancel-drain race, and UWP version parse#1494bmehta001 wants to merge 9 commits into
Conversation
…WP parse) Bundled small fixes from a repo-wide review. None are recent regressions (introduced 2018-2023). 1) Signals_jni.cpp (UAF): sendSignal called ReleaseStringUTFChars on the UTF buffer *before* passing it to Signals::CreateEventProperties (which copies from it by value), reading freed/unpinned memory. Move the release to after the buffer is consumed. 2) HttpClient_WinInet.cpp / HttpClient_WinRt.cpp (data race): the CancelAllRequests() drain loop read `m_requests.empty()` with no lock held, while erase() mutates the map on the HTTP callback / PPL continuation thread under m_requestsMutex -- a data race / UB on std::map. Read empty() under the lock each iteration, mirroring the already-correct WinRt destructor drain. (The recent microsoft#1460 fixed the destructor's drain but not these methods.) 3) JniConvertors.cpp / OfflineStorage_Room.cpp (null deref): the result of GetStringUTFChars (which returns null on allocation failure) was fed straight into a std::string ctor. Null-check before constructing. 4) WindowsRuntimeSystemInformationImpl.cpp (UWP): std::stoull on the DeviceFamilyVersion string was unguarded; guard it with try/catch defaulting to 0 (the code already has a versionDec==0 -> "10.0" fallback). Validation: JniConvertors.cpp and OfflineStorage_Room.cpp pass NDK aarch64 -fsyntax-only. Signals_jni (needs the private signals module), the Windows HTTP clients, and the UWP path can't be built on this host and rely on the Android/Windows CI; the WinInet/WinRt fix mirrors the existing correct destructor pattern in the same files. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
lib/jni/Signals_jni.cpp (sendSignal): guard the jstring argument and the GetStringUTFChars() result for null before use. A null return (e.g. OOM, with a pending exception) previously flowed into CreateEventProperties()/Release as a null pointer. Now returns false instead. lib/offline/OfflineStorage_Room.cpp (ReleaseRecords): a failed tenant-token string read was turned into an empty std::string and reported as dropped[""] = count, silently misattributing dropped records to an empty tenant token. Now follows the file's established pattern (GetStringUTFChars + ThrowRuntime) and skips the entry when the read fails instead of fabricating an empty token. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR bundles several small safety fixes across the Android JNI layer and Windows-specific implementations to eliminate latent undefined behavior (use-after-free / data race), improve null-safety around JNI string conversion, and harden UWP OS version parsing against malformed input.
Changes:
- Fix JNI string lifetime in
Signals_sendSignaland add null checks forGetStringUTFCharsfailures before constructingstd::string. - Remove a Windows HTTP cancel/drain data race by checking
m_requests.empty()under the same mutex used byerase(). - Guard UWP
DeviceFamilyVersionparsing (std::stoull) with exception handling and preserve the existingversionDec == 0fallback behavior.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| lib/jni/Signals_jni.cpp | Prevents using a UTF buffer after it has been released; adds null-safety for null jstring / failed UTF conversion. |
| lib/jni/JniConvertors.cpp | Avoids constructing std::string from a null UTF pointer when GetStringUTFChars fails. |
| lib/offline/OfflineStorage_Room.cpp | Adds a null check before converting a UTF pointer to std::string, preventing a null deref / misattribution. |
| lib/http/HttpClient_WinRt.cpp | Fixes a drain-loop data race by reading m_requests.empty() under m_requestsMutex. |
| lib/http/HttpClient_WinInet.cpp | Fixes a drain-loop data race by reading m_requests.empty() under m_requestsMutex. |
| lib/pal/universal/WindowsRuntimeSystemInformationImpl.cpp | Hardens UWP version parsing by catching std::stoull exceptions and falling back to 0. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Addresses @lalitb's review: Signals_jni nativeInitialize called strlen(convertedValue) (and ReleaseStringUTFChars) on the result of GetStringUTFChars(base_url, ...) with no null check, so a failed conversion (e.g. pending OOM) would dereference null. Guard it: only read/release when non-null; an empty/failed base_url keeps the existing default (BaseUrl left unset). While here, make the JNI string null-safety consistent across the file this PR already hardens. ThrowRuntime/ThrowLogic only throw when s_throwExceptions is true; otherwise they ExceptionClear() and execution continues, so the existing checks are not sufficient on their own. The other GetStringUTFChars sites in OfflineStorage_Room.cpp still used the pointer unguarded: - GetReservedRecords: token_utf passed straight into StorageRecord and ReleaseStringUTFChars (null -> std::string(nullptr) UB / release of null). - GetSetting: result = utf with no null check. - GetRecords: tenant_utf passed into emplace_back / released unguarded. Each now mirrors the guard already added for the dropped-records path: substitute an empty token (matching JniConvertors' canonical return "" on null) and only release when non-null. This avoids null deref and release-of-null without silently changing behavior on success. Validated with NDK r29 clang -fsyntax-only (aarch64-linux-android23, -std=c++14, JNI build defines) on both files: clean. Files changed: - lib/jni/Signals_jni.cpp: null-guard base_url GetStringUTFChars - lib/offline/OfflineStorage_Room.cpp: null-guard token_utf, result, tenant_utf Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
| jboolean isCopy = true; | ||
| const char *convertedValue = (env)->GetStringUTFChars(base_url, &isCopy); | ||
| if (strlen(convertedValue) > 0) { | ||
| config.ServiceRequestConfig.BaseUrl = convertedValue; | ||
| if (convertedValue != nullptr) { |
| records.emplace_back( | ||
| std::to_string(id_j), | ||
| tenant_utf, | ||
| tenant_utf != nullptr ? tenant_utf : "", | ||
| latency, | ||
| persistence, |
…ot review) Unlike the other JNI string reads in this file, the GetRecords() path called env->GetStringUTFChars(tenant_j, ...) with neither a pre-call null check on the jstring nor a following ThrowRuntime() exception check. Passing a null jstring to GetStringUTFChars is undefined per the JNI spec (and can leave a pending NullPointerException in flight). Only call GetStringUTFChars when tenant_j is non-null; the downstream already tolerates a null tenant_utf (defaults to "" and skips the guarded ReleaseStringUTFChars). Validated: NDK 27 clang++ --target=aarch64-linux-android24 -fsyntax-only compiles OfflineStorage_Room.cpp (USE_ROOM) cleanly. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
| jboolean isCopy = true; | ||
| const char *convertedValue = (env)->GetStringUTFChars(base_url, &isCopy); | ||
| if (strlen(convertedValue) > 0) { | ||
| config.ServiceRequestConfig.BaseUrl = convertedValue; | ||
| if (convertedValue != nullptr) { | ||
| if (strlen(convertedValue) > 0) { | ||
| config.ServiceRequestConfig.BaseUrl = convertedValue; | ||
| } | ||
| env->ReleaseStringUTFChars(base_url, convertedValue); |
| auto utf = env->GetStringUTFChars(token, nullptr); | ||
| std::string key(utf); | ||
| env->ReleaseStringUTFChars(token, utf); | ||
| dropped[key] = static_cast<size_t>(count); | ||
| ThrowRuntime(env, "Exception fetching token string"); | ||
| // Skip rather than misattribute dropped records to an empty | ||
| // tenant token when the string read fails. | ||
| if (utf != nullptr) |
There was a problem hiding this comment.
The pending-exception part of this is addressed in 25df775: GetRecords now calls ThrowRuntime right after the tenant-token read, which describes+clears the exception, notifies the observer, and unwinds the batch through the surrounding try/catch (consistent with GetAndReserveRecords and the rest of the file).
I'm intentionally not changing the empty-tenant-token behavior here. Defaulting an unreadable/absent tenant token to """" is the SDK's long-standing behavior in this path (it predates this PR, whose scope is JNI null-safety/crash hardening), and the records do carry a real record_id, so releaseUnconsumed() accounting is driven by index/consumption, not by the tenant string. Reworking the reserve/release batch semantics to abort-and-release on a corrupt tenant token is a separate behavioral change that deserves its own PR and validation rather than being folded into a null-safety fix.
Copilot's re-review flagged two more JNI string reads where the result was null-checked but the jstring input was passed to GetStringUTFChars unguarded, which crashes (or leaves a pending exception) before the result check when the jstring is null: - Signals_jni.cpp nativeInitialize: only read base_url when non-null; if the read itself fails (e.g. OOM) return immediately rather than continuing JNI calls with an exception pending, matching the nativeLog(signal_item_json) site. - OfflineStorage_Room.cpp ByTenant releaseRecords (token) and GetRecords-by-tenant (tenantToken_java): only call GetStringUTFChars when the jstring is non-null; the downstream already tolerates a null result (skips/defaults). The getSetting site is already guarded by an enclosing if (java_value). Validated: NDK 27 clang++ --target=aarch64-linux-android24 -fsyntax-only compiles both files cleanly. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
| auto token_utf = (tenantToken_java != nullptr) | ||
| ? env->GetStringUTFChars(tenantToken_java, nullptr) | ||
| : nullptr; | ||
| ThrowRuntime(env, "string tenant"); | ||
| auto latency = static_cast<EventLatency>(std::max(latency_lb, |
There was a problem hiding this comment.
The pending-exception half of this is fixed in 25df775 (GetRecords now calls ThrowRuntime after the tenant-token read; the same is already done in GetAndReserveRecords at the existing ThrowRuntime(env, "string tenant") right after its read, which describes+clears the exception, notifies the observer, and unwinds via the surrounding try/catch).
I'm intentionally not changing the empty-tenant-token fallback. Defaulting an unreadable/absent tenant token to """" is the SDK's long-standing behavior in this path and predates this PR (whose scope is JNI null-safety/crash hardening). Records still carry a real record_id, and releaseUnconsumed() accounting is driven by index/consumption rather than the tenant string. Reworking the reserve/release batch semantics to abort-and-release on a corrupt tenant token is a separate behavioral change that should get its own PR and validation rather than riding along with a null-safety fix.
| const char* tenant_utf = (tenant_j != nullptr) | ||
| ? env->GetStringUTFChars(tenant_j, nullptr) | ||
| : nullptr; |
…lot review) If GetStringUTFChars(tenant_j) fails (e.g. OOM) it returns null and leaves a pending JNI exception. GetRecords() previously continued issuing Get*Field calls with that exception in flight, which can make them return defaults. Call ThrowRuntime after the read (as GetAndReserveRecords and the rest of this file already do) to describe+clear the exception, notify the observer, and unwind the batch via the surrounding try/catch. Validated: NDK 27 clang++ -fsyntax-only compiles cleanly. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
| size_t jstr_length = env->GetStringUTFLength(jstr); | ||
| auto jstr_utf = env->GetStringUTFChars(jstr, nullptr); | ||
| if (jstr_utf == nullptr) | ||
| return ""; |
There was a problem hiding this comment.
Addressed in 1274680 by clearing the pending exception before returning, so callers no longer continue with an exception in flight (the actual hazard you describe). I kept the std::string return contract rather than switching JStringToStdString to std::optional/out-param: it's called from many sites, an OOM mid-conversion is the only failure mode here, and the empty-string-on-failure behavior is the established convention. Reworking the signature and every call site is a broader refactor that's out of scope for this null-safety hardening PR.
…ot review) When GetStringUTFChars returns null (e.g. OOM) it leaves a pending Java exception. JStringToStdString returned "" without clearing it, so callers kept making JNI calls with an exception in flight. Clear the pending exception before returning, consistent with the ExceptionClear handling elsewhere in the JNI layer. Keeps the existing string-returning contract rather than redesigning the helper's signature, so no call sites change. Validated: NDK 27 clang++ -fsyntax-only compiles cleanly. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
| if (signal_item_json == nullptr) { | ||
| return false; | ||
| } | ||
| const char *signalItemJson = (env)->GetStringUTFChars(signal_item_json, &isCopy); | ||
| env->ReleaseStringUTFChars(signal_item_json, signalItemJson); | ||
| if (signalItemJson == nullptr) { | ||
| // GetStringUTFChars returned null (e.g. OOM, with a pending exception); | ||
| // there is nothing valid to log. | ||
| return false; | ||
| } |
There was a problem hiding this comment.
This early-return is intentional and is the fix, not a regression. sendSignal returns jboolean success, so a null signal_item_json now yields a clean false (failure) instead of calling GetStringUTFChars(nullptr, …), which is undefined per the JNI spec and would raise a Java NullPointerException back through the host app's logging call. A telemetry SDK should not crash/propagate an NPE into the caller over a null event payload — returning false is the correct, observable failure contract, and the Java wrapper already treats the boolean result as success/failure. The behavior change (NPE -> graceful false) is the intended hardening of this PR.
| catch (const std::exception&) | ||
| { | ||
| versionDec = 0ull; | ||
| } |
…eview) The DeviceFamilyVersion parse added a catch(const std::exception&) but the file only pulled in <exception> transitively. Include it explicitly so the build does not depend on transitive include order. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Bundles four small, latent safety fixes from a repo-wide material-bug review. None are recent regressions — they were introduced between 2018 and 2023.
1.
Signals_jni.cpp— use-after-free (🟠)sendSignalcalledReleaseStringUTFCharson the UTF buffer before passing it toSignals::CreateEventProperties(which copies from it by value), reading freed/unpinned memory. Moved the release to after the buffer is consumed.2.
HttpClient_WinInet.cpp/HttpClient_WinRt.cpp— data race (🟡)The
CancelAllRequests()drain loop readm_requests.empty()with no lock held, whileerase()mutates thestd::mapon the WinInet callback / PPL continuation thread underm_requestsMutex— a data race / UB. Now readsempty()under the lock each iteration, mirroring the already-correct WinRt destructor drain. (The recent #1460 fixed the destructor's drain but not these methods.)3.
JniConvertors.cpp/OfflineStorage_Room.cpp— null deref (🟢)GetStringUTFCharsreturnsnullon allocation failure; the result was fed straight into astd::stringconstructor. Added a null check before constructing.4.
WindowsRuntimeSystemInformationImpl.cpp— UWP parse (🟢)std::stoullon theDeviceFamilyVersionstring was unguarded (would throw on malformed/oversized input). Guarded with try/catch defaulting to0(the code already has aversionDec == 0→"10.0"fallback).Validation
JniConvertors.cppandOfflineStorage_Room.cpppass NDKaarch64 -fsyntax-only.Signals_jni(needs the privatesignalsmodule), the Windows HTTP clients, and the UWP path can't be built on the dev host and rely on Android/Windows CI; the WinInet/WinRt change mirrors the existing correct destructor pattern in the same files.