aarch64: blocking-syscall resume uses entry-captured PC/SP/LR — engine passes ICU init #10

Merged
squirelboy360 merged 1 commit into develop from feat/arch-aarch64-yield-fix on Jun 9, 2026

squirelboy360 commented Jun 9, 2026

Builds on the merged ARM shell bring-up (#9).

save_return_context/_reexec set a yielding thread's resume rip/rsp from caller-passed values read from the live per-CPU scratch. That scratch is wrong on the wake path (it holds the waker's registers, not the target's) and on a self-yield (a timer-preempted sibling's SVC can clobber it before the yield decision). In either case the thread resumed on the wrong SP, and its `ldp x29,x30,[sp,#..]; ret` read a stale/zero frame record → `ret` to 0, crashing the engine inside fml::icu::ICUContext::ICUContext.
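
To make the wake-path failure concrete, here is a minimal user-space sketch of the pattern described above. It is not the kernel's code: `UserRegs`, `CpuScratch`, `svc_entry`, `buggy_resume_context`, and `wake_path_demo` are hypothetical names invented for illustration, and the register values are arbitrary. It assumes only what the description states: one scratch slot per CPU that every SVC entry overwrites.

```rust
// Hypothetical model of the bug; none of these names come from the kernel.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct UserRegs {
    pub pc: u64,
    pub sp: u64,
    pub lr: u64,
}

/// One slot per CPU, overwritten by every SVC entry on that CPU.
pub struct CpuScratch {
    pub live: UserRegs,
}

impl CpuScratch {
    /// Models an SVC entry: the entering thread's registers replace the slot.
    pub fn svc_entry(&mut self, regs: UserRegs) {
        self.live = regs;
    }
}

/// Buggy pattern: the resume context is whatever the scratch holds right now,
/// regardless of which thread is being saved.
pub fn buggy_resume_context(scratch: &CpuScratch) -> UserRegs {
    scratch.live
}

/// Returns (registers the target should resume with, registers it gets).
pub fn wake_path_demo() -> (UserRegs, UserRegs) {
    let target = UserRegs { pc: 0x1000, sp: 0x7fff_0000, lr: 0x1004 };
    let waker = UserRegs { pc: 0x2000, sp: 0x7ffe_0000, lr: 0x2004 };

    let mut scratch = CpuScratch { live: target };
    scratch.svc_entry(target); // target enters a blocking syscall
    scratch.svc_entry(waker); // waker enters the syscall that wakes target

    // Saving the target's context now reads the waker's registers.
    (target, buggy_resume_context(&scratch))
}
```

In this model `wake_path_demo()` returns two different register sets: the target is handed the waker's SP, which is the condition under which the frame-record load described above reads the wrong stack.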

Fix: capture the user PC/SP eagerly at the SVC boundary (new per-proc entry_user_rip/entry_user_rsp, set in the IRQ-masked entry capture alongside user_lr); save_return_context_inner then restores from the target thread's own entry snapshot. The GPRS_CAPTURED gate guarantees that each pid's entry_* fields are written only by its own syscall entry. x30 is no longer re-read lazily. The x86 path is cfg-gated and unchanged.
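
The shape of the fix can be sketched the same way. This is a simplified user-space model under stated assumptions, not the actual kernel change: `Kernel`, `Proc`, and `UserRegs` are hypothetical types, the `gprs_captured` flag is only a stand-in for the GPRS_CAPTURED gate (whose real semantics are not shown in this PR), and `save_return_context` here is a simplified analogue of save_return_context_inner. The field names entry_user_rip, entry_user_rsp, and user_lr follow the description.

```rust
use std::collections::HashMap;

// Hypothetical model of the fix; the real kernel structures are not shown here.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct UserRegs {
    pub pc: u64,
    pub sp: u64,
    pub lr: u64,
}

/// Per-process entry snapshot. Only `svc_entry` for this pid writes it.
#[derive(Default)]
pub struct Proc {
    pub entry_user_rip: u64,
    pub entry_user_rsp: u64,
    pub user_lr: u64,
    /// Assumed stand-in for the GPRS_CAPTURED gate: set once this pid's own
    /// syscall entry has recorded its registers.
    pub gprs_captured: bool,
}

pub struct Kernel {
    procs: HashMap<u32, Proc>,
}

impl Kernel {
    pub fn new() -> Self {
        Kernel { procs: HashMap::new() }
    }

    /// Eager capture at the SVC boundary, keyed by the entering pid.
    pub fn svc_entry(&mut self, pid: u32, regs: UserRegs) {
        let p = self.procs.entry(pid).or_default();
        p.entry_user_rip = regs.pc;
        p.entry_user_rsp = regs.sp;
        p.user_lr = regs.lr;
        p.gprs_captured = true;
    }

    /// Restores from the target's own snapshot, never from shared state.
    pub fn save_return_context(&self, target: u32) -> Option<UserRegs> {
        let p = self.procs.get(&target)?;
        if !p.gprs_captured {
            return None;
        }
        Some(UserRegs {
            pc: p.entry_user_rip,
            sp: p.entry_user_rsp,
            lr: p.user_lr,
        })
    }
}
```

Because the snapshot is keyed by pid, a later syscall entry by the waker cannot change what the target resumes with, which is the property the fix relies on.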

Verified: engine now passes ICU init and reaches FlutterEngineInitialize → EmbedderThreadHost worker-pool creation. The next blocker is kernel-side in sys_thread_create (a write to addr -1 during worker spawn); it is separate and tracked.

x86_64 + aarch64 kernels both compile.

…ICU ret-to-0)

save_return_context / _reexec set a yielding thread's resume rip/rsp from the
caller-passed values, which came from the LIVE per-CPU scratch (user_rip()/
user_rsp()/user_lr()). That scratch is wrong in two cases:
 - on the wake path (save_return_context(wpid,...)) it holds the WAKER's regs,
   not wpid's;
 - on a self-yield, a timer-preempted sibling's SVC can clobber it before the
   yield decision.
Either way the thread resumed on the wrong SP and its `ldp x29,x30,[sp,#..]; ret`
read a stale/zero frame record → `ret` to 0 (observed crashing the engine inside
fml::icu::ICUContext::ICUContext).

Fix: capture the user PC/SP eagerly at the SVC boundary (new per-proc
entry_user_rip/entry_user_rsp, set in the IRQ-masked entry capture alongside
user_lr), and have save_return_context_inner restore from the TARGET thread's
own entry snapshot (the GPRS_CAPTURED gate guarantees each pid's entry_* are
written only by its own syscall entry, never clobbered by a cross-pid save).
x30 is no longer re-read lazily — the eager capture's value stands. x86 path
unchanged (cfg-gated; still uses the passed rip/rsp).

Verified: engine now passes ICU init and reaches FlutterEngineInitialize →
EmbedderThreadHost worker-pool creation (next blocker is kernel-side in
sys_thread_create, separate).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
squirelboy360 merged commit 44f43d0 into develop on Jun 9, 2026
3 checks passed
squirelboy360 added a commit that referenced this pull request on Jun 9, 2026
…ICU ret-to-0) (#10)

squirelboy360 deleted the feat/arch-aarch64-yield-fix branch June 9, 2026 21:18