Skip to content

Fix JNI string lifetime/null-safety, Windows cancel-drain race, and UWP version parse#1494

Open
bmehta001 wants to merge 9 commits into
microsoft:mainfrom
bmehta001:bhamehta/fix-jni-windows-uwp-safety
Open

Fix JNI string lifetime/null-safety, Windows cancel-drain race, and UWP version parse#1494
bmehta001 wants to merge 9 commits into
microsoft:mainfrom
bmehta001:bhamehta/fix-jni-windows-uwp-safety

Conversation

@bmehta001

Copy link
Copy Markdown
Contributor

Bundles four small, latent safety fixes from a repo-wide material-bug review. None are recent regressions — they were introduced between 2018 and 2023.

1. Signals_jni.cpp — use-after-free (🟠)

sendSignal called ReleaseStringUTFChars on the UTF buffer before passing it to Signals::CreateEventProperties (which copies from it by value), reading freed/unpinned memory. Moved the release to after the buffer is consumed.

2. HttpClient_WinInet.cpp / HttpClient_WinRt.cpp — data race (🟡)

The CancelAllRequests() drain loop read m_requests.empty() with no lock held, while erase() mutates the std::map on the WinInet callback / PPL continuation thread under m_requestsMutex — a data race / UB. Now reads empty() under the lock each iteration, mirroring the already-correct WinRt destructor drain. (The recent #1460 fixed the destructor's drain but not these methods.)

3. JniConvertors.cpp / OfflineStorage_Room.cpp — null deref (🟢)

GetStringUTFChars returns null on allocation failure; the result was fed straight into a std::string constructor. Added a null check before constructing.

4. WindowsRuntimeSystemInformationImpl.cpp — UWP parse (🟢)

std::stoull on the DeviceFamilyVersion string was unguarded (would throw on malformed/oversized input). Guarded with try/catch defaulting to 0 (the code already has a versionDec == 0"10.0" fallback).

Validation

JniConvertors.cpp and OfflineStorage_Room.cpp pass NDK aarch64 -fsyntax-only. Signals_jni (needs the private signals module), the Windows HTTP clients, and the UWP path can't be built on the dev host and rely on Android/Windows CI; the WinInet/WinRt change mirrors the existing correct destructor pattern in the same files.

…WP parse)

Bundled small fixes from a repo-wide review. None are recent regressions
(introduced 2018-2023).

1) Signals_jni.cpp (UAF): sendSignal called ReleaseStringUTFChars on the
   UTF buffer *before* passing it to Signals::CreateEventProperties (which
   copies from it by value), reading freed/unpinned memory. Move the
   release to after the buffer is consumed.

2) HttpClient_WinInet.cpp / HttpClient_WinRt.cpp (data race): the
   CancelAllRequests() drain loop read `m_requests.empty()` with no lock
   held, while erase() mutates the map on the HTTP callback / PPL
   continuation thread under m_requestsMutex -- a data race / UB on
   std::map. Read empty() under the lock each iteration, mirroring the
   already-correct WinRt destructor drain. (The recent microsoft#1460 fixed the
   destructor's drain but not these methods.)

3) JniConvertors.cpp / OfflineStorage_Room.cpp (null deref): the result
   of GetStringUTFChars (which returns null on allocation failure) was
   fed straight into a std::string ctor. Null-check before constructing.

4) WindowsRuntimeSystemInformationImpl.cpp (UWP): std::stoull on the
   DeviceFamilyVersion string was unguarded; guard it with try/catch
   defaulting to 0 (the code already has a versionDec==0 -> "10.0"
   fallback).

Validation: JniConvertors.cpp and OfflineStorage_Room.cpp pass NDK
aarch64 -fsyntax-only. Signals_jni (needs the private signals module),
the Windows HTTP clients, and the UWP path can't be built on this host
and rely on the Android/Windows CI; the WinInet/WinRt fix mirrors the
existing correct destructor pattern in the same files.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@bmehta001 bmehta001 requested a review from a team as a code owner June 22, 2026 16:53
Comment thread lib/jni/Signals_jni.cpp
Comment thread lib/offline/OfflineStorage_Room.cpp Outdated
lib/jni/Signals_jni.cpp (sendSignal): guard the jstring argument and the
GetStringUTFChars() result for null before use. A null return (e.g. OOM, with a
pending exception) previously flowed into CreateEventProperties()/Release as a
null pointer. Now returns false instead.

lib/offline/OfflineStorage_Room.cpp (ReleaseRecords): a failed tenant-token
string read was turned into an empty std::string and reported as
dropped[""] = count, silently misattributing dropped records to an empty tenant
token. Now follows the file's established pattern (GetStringUTFChars + ThrowRuntime)
and skips the entry when the read fails instead of fabricating an empty token.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR bundles several small safety fixes across the Android JNI layer and Windows-specific implementations to eliminate latent undefined behavior (use-after-free / data race), improve null-safety around JNI string conversion, and harden UWP OS version parsing against malformed input.

Changes:

  • Fix JNI string lifetime in Signals_sendSignal and add null checks for GetStringUTFChars failures before constructing std::string.
  • Remove a Windows HTTP cancel/drain data race by checking m_requests.empty() under the same mutex used by erase().
  • Guard UWP DeviceFamilyVersion parsing (std::stoull) with exception handling and preserve the existing versionDec == 0 fallback behavior.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.

Show a summary per file
File Description
lib/jni/Signals_jni.cpp Prevents using a UTF buffer after it has been released; adds null-safety for null jstring / failed UTF conversion.
lib/jni/JniConvertors.cpp Avoids constructing std::string from a null UTF pointer when GetStringUTFChars fails.
lib/offline/OfflineStorage_Room.cpp Adds a null check before converting a UTF pointer to std::string, preventing a null deref / misattribution.
lib/http/HttpClient_WinRt.cpp Fixes a drain-loop data race by reading m_requests.empty() under m_requestsMutex.
lib/http/HttpClient_WinInet.cpp Fixes a drain-loop data race by reading m_requests.empty() under m_requestsMutex.
lib/pal/universal/WindowsRuntimeSystemInformationImpl.cpp Hardens UWP version parsing by catching std::stoull exceptions and falling back to 0.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Addresses @lalitb's review: Signals_jni nativeInitialize called
strlen(convertedValue) (and ReleaseStringUTFChars) on the result of
GetStringUTFChars(base_url, ...) with no null check, so a failed
conversion (e.g. pending OOM) would dereference null. Guard it: only
read/release when non-null; an empty/failed base_url keeps the existing
default (BaseUrl left unset).

While here, make the JNI string null-safety consistent across the file
this PR already hardens. ThrowRuntime/ThrowLogic only throw when
s_throwExceptions is true; otherwise they ExceptionClear() and execution
continues, so the existing checks are not sufficient on their own. The
other GetStringUTFChars sites in OfflineStorage_Room.cpp still used the
pointer unguarded:
 - GetReservedRecords: token_utf passed straight into StorageRecord and
   ReleaseStringUTFChars (null -> std::string(nullptr) UB / release of
   null).
 - GetSetting: result = utf with no null check.
 - GetRecords: tenant_utf passed into emplace_back / released unguarded.
Each now mirrors the guard already added for the dropped-records path:
substitute an empty token (matching JniConvertors' canonical return ""
on null) and only release when non-null. This avoids null deref and
release-of-null without silently changing behavior on success.

Validated with NDK r29 clang -fsyntax-only (aarch64-linux-android23,
-std=c++14, JNI build defines) on both files: clean.

Files changed:
- lib/jni/Signals_jni.cpp: null-guard base_url GetStringUTFChars
- lib/offline/OfflineStorage_Room.cpp: null-guard token_utf, result, tenant_utf

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Comment thread lib/jni/Signals_jni.cpp Outdated
Comment on lines +64 to +66
jboolean isCopy = true;
const char *convertedValue = (env)->GetStringUTFChars(base_url, &isCopy);
if (strlen(convertedValue) > 0) {
config.ServiceRequestConfig.BaseUrl = convertedValue;
if (convertedValue != nullptr) {
Comment on lines 1284 to 1288
records.emplace_back(
std::to_string(id_j),
tenant_utf,
tenant_utf != nullptr ? tenant_utf : "",
latency,
persistence,
bmehta001 and others added 2 commits June 24, 2026 12:36
…ot review)

Unlike the other JNI string reads in this file, the GetRecords() path called
env->GetStringUTFChars(tenant_j, ...) with neither a pre-call null check on the
jstring nor a following ThrowRuntime() exception check. Passing a null jstring to
GetStringUTFChars is undefined per the JNI spec (and can leave a pending
NullPointerException in flight). Only call GetStringUTFChars when tenant_j is
non-null; the downstream already tolerates a null tenant_utf (defaults to "" and
skips the guarded ReleaseStringUTFChars).

Validated: NDK 27 clang++ --target=aarch64-linux-android24 -fsyntax-only compiles
OfflineStorage_Room.cpp (USE_ROOM) cleanly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Comment thread lib/jni/Signals_jni.cpp
Comment on lines 64 to +70
jboolean isCopy = true;
const char *convertedValue = (env)->GetStringUTFChars(base_url, &isCopy);
if (strlen(convertedValue) > 0) {
config.ServiceRequestConfig.BaseUrl = convertedValue;
if (convertedValue != nullptr) {
if (strlen(convertedValue) > 0) {
config.ServiceRequestConfig.BaseUrl = convertedValue;
}
env->ReleaseStringUTFChars(base_url, convertedValue);
Comment thread lib/offline/OfflineStorage_Room.cpp Outdated
Comment on lines +775 to +779
auto utf = env->GetStringUTFChars(token, nullptr);
std::string key(utf);
env->ReleaseStringUTFChars(token, utf);
dropped[key] = static_cast<size_t>(count);
ThrowRuntime(env, "Exception fetching token string");
// Skip rather than misattribute dropped records to an empty
// tenant token when the string read fails.
if (utf != nullptr)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pending-exception part of this is addressed in 25df775: GetRecords now calls ThrowRuntime right after the tenant-token read, which describes+clears the exception, notifies the observer, and unwinds the batch through the surrounding try/catch (consistent with GetAndReserveRecords and the rest of the file).

I'm intentionally not changing the empty-tenant-token behavior here. Defaulting an unreadable/absent tenant token to """" is the SDK's long-standing behavior in this path (it predates this PR, whose scope is JNI null-safety/crash hardening), and the records do carry a real record_id, so releaseUnconsumed() accounting is driven by index/consumption, not by the tenant string. Reworking the reserve/release batch semantics to abort-and-release on a corrupt tenant token is a separate behavioral change that deserves its own PR and validation rather than being folded into a null-safety fix.

Copilot's re-review flagged two more JNI string reads where the result was
null-checked but the jstring input was passed to GetStringUTFChars unguarded,
which crashes (or leaves a pending exception) before the result check when the
jstring is null:

- Signals_jni.cpp nativeInitialize: only read base_url when non-null; if the read
  itself fails (e.g. OOM) return immediately rather than continuing JNI calls with
  an exception pending, matching the nativeLog(signal_item_json) site.
- OfflineStorage_Room.cpp ByTenant releaseRecords (token) and GetRecords-by-tenant
  (tenantToken_java): only call GetStringUTFChars when the jstring is non-null; the
  downstream already tolerates a null result (skips/defaults).

The getSetting site is already guarded by an enclosing if (java_value).

Validated: NDK 27 clang++ --target=aarch64-linux-android24 -fsyntax-only compiles
both files cleanly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Comment on lines +498 to 502
auto token_utf = (tenantToken_java != nullptr)
? env->GetStringUTFChars(tenantToken_java, nullptr)
: nullptr;
ThrowRuntime(env, "string tenant");
auto latency = static_cast<EventLatency>(std::max(latency_lb,

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pending-exception half of this is fixed in 25df775 (GetRecords now calls ThrowRuntime after the tenant-token read; the same is already done in GetAndReserveRecords at the existing ThrowRuntime(env, "string tenant") right after its read, which describes+clears the exception, notifies the observer, and unwinds via the surrounding try/catch).

I'm intentionally not changing the empty-tenant-token fallback. Defaulting an unreadable/absent tenant token to """" is the SDK's long-standing behavior in this path and predates this PR (whose scope is JNI null-safety/crash hardening). Records still carry a real record_id, and releaseUnconsumed() accounting is driven by index/consumption rather than the tenant string. Reworking the reserve/release batch semantics to abort-and-release on a corrupt tenant token is a separate behavioral change that should get its own PR and validation rather than riding along with a null-safety fix.

Comment on lines +1361 to +1363
const char* tenant_utf = (tenant_j != nullptr)
? env->GetStringUTFChars(tenant_j, nullptr)
: nullptr;
…lot review)

If GetStringUTFChars(tenant_j) fails (e.g. OOM) it returns null and leaves a
pending JNI exception. GetRecords() previously continued issuing Get*Field calls
with that exception in flight, which can make them return defaults. Call
ThrowRuntime after the read (as GetAndReserveRecords and the rest of this file
already do) to describe+clear the exception, notify the observer, and unwind the
batch via the surrounding try/catch.

Validated: NDK 27 clang++ -fsyntax-only compiles cleanly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Comment thread lib/jni/JniConvertors.cpp
Comment on lines 14 to +17
size_t jstr_length = env->GetStringUTFLength(jstr);
auto jstr_utf = env->GetStringUTFChars(jstr, nullptr);
if (jstr_utf == nullptr)
return "";

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 1274680 by clearing the pending exception before returning, so callers no longer continue with an exception in flight (the actual hazard you describe). I kept the std::string return contract rather than switching JStringToStdString to std::optional/out-param: it's called from many sites, an OOM mid-conversion is the only failure mode here, and the empty-string-on-failure behavior is the established convention. Reworking the signature and every call site is a broader refactor that's out of scope for this null-safety hardening PR.

…ot review)

When GetStringUTFChars returns null (e.g. OOM) it leaves a pending Java exception.
JStringToStdString returned "" without clearing it, so callers kept making JNI
calls with an exception in flight. Clear the pending exception before returning,
consistent with the ExceptionClear handling elsewhere in the JNI layer.

Keeps the existing string-returning contract rather than redesigning the helper's
signature, so no call sites change.

Validated: NDK 27 clang++ -fsyntax-only compiles cleanly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Comment thread lib/jni/Signals_jni.cpp
Comment on lines +28 to +36
if (signal_item_json == nullptr) {
return false;
}
const char *signalItemJson = (env)->GetStringUTFChars(signal_item_json, &isCopy);
env->ReleaseStringUTFChars(signal_item_json, signalItemJson);
if (signalItemJson == nullptr) {
// GetStringUTFChars returned null (e.g. OOM, with a pending exception);
// there is nothing valid to log.
return false;
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This early-return is intentional and is the fix, not a regression. sendSignal returns jboolean success, so a null signal_item_json now yields a clean false (failure) instead of calling GetStringUTFChars(nullptr, …), which is undefined per the JNI spec and would raise a Java NullPointerException back through the host app's logging call. A telemetry SDK should not crash/propagate an NPE into the caller over a null event payload — returning false is the correct, observable failure contract, and the Java wrapper already treats the boolean result as success/failure. The behavior change (NPE -> graceful false) is the intended hardening of this PR.

Comment on lines +88 to +91
catch (const std::exception&)
{
versionDec = 0ull;
}
…eview)

The DeviceFamilyVersion parse added a catch(const std::exception&) but the file
only pulled in <exception> transitively. Include it explicitly so the build does
not depend on transitive include order.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated no new comments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants