aarch64: boot + render on real hardware (UTM / bare metal) — UEFI ISO, crash-free #12
Merged
Re-port of the Limine UEFI scaffolding (`boot_limine.rs`, `aarch64-limine.ld`, `mmu::limine_setup_ttbr0`, `kernel_main_arch_limine`, the `limine-boot` feature, `build-iso-aarch64.sh`) onto `origin/main`, which has ALL the render fixes (epoll/va_list ABI, x3 arg restore, cooperative scheduling, virtio-input). The earlier attempt was mistakenly branched off a stale `develop` from before those fixes.
…the shell. Limine hands off at EL1t (SPSel=0), where `sp` aliases SP_EL0. Every kernel stack write therefore went through SP_EL0, and `enter_user`'s kernel-stack reclaim (`mov sp, <syscall stack top>`) clobbered the user SP it had just delivered: pid1 entered EL0 with SP pointing at the kernel syscall stack, and its first push faulted (EC=0x24, FAR=SP-0xa0). The QEMU `-kernel` path sets up EL1h itself, which is why the identical kernel worked there. The fix is one instruction: `msr spsel, #1` before the boot-stack switch. Validated under edk2: present_callback=281 on a full run.
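The SPSel rule behind this bug fits in a few lines. The sketch below is a toy simulation, not kernel code: `StackPtr`, `Regs`, `active_sp` and this simplified `enter_user` are hypothetical names, and only the kernel-stack reclaim step of the real `enter_user` is modelled.

```rust
/// Which stack-pointer register the `sp` name refers to.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum StackPtr {
    SpEl0,
    SpEl1,
}

/// AArch64 rule (EL0/EL1 only): at EL0 `sp` is always SP_EL0; at EL1,
/// PSTATE.SPSel selects SP_EL0 (SPSel=0, "EL1t") or SP_EL1 (SPSel=1, "EL1h").
pub fn active_sp(el: u8, spsel: u8) -> StackPtr {
    assert!(el <= 1, "this toy model only covers EL0 and EL1");
    if el == 0 || (spsel & 1) == 0 {
        StackPtr::SpEl0
    } else {
        StackPtr::SpEl1
    }
}

/// Hypothetical register file: the two stack pointers plus PSTATE.SPSel.
pub struct Regs {
    pub sp_el0: u64,
    pub sp_el1: u64,
    pub spsel: u8,
}

impl Regs {
    /// `mov sp, <value>` at EL1 writes whichever register `sp` currently aliases.
    pub fn mov_sp(&mut self, value: u64) {
        match active_sp(1, self.spsel) {
            StackPtr::SpEl0 => self.sp_el0 = value,
            StackPtr::SpEl1 => self.sp_el1 = value,
        }
    }
}

/// Toy tail of `enter_user`: deliver the user SP in SP_EL0, then reclaim the
/// kernel stack. Returns the SP that EL0 starts with after `eret`
/// (EL0 always runs on SP_EL0).
pub fn enter_user(regs: &mut Regs, user_sp: u64, kstack_top: u64) -> u64 {
    regs.sp_el0 = user_sp; // user stack pointer handed to EL0
    regs.mov_sp(kstack_top); // `mov sp, <syscall stack top>`
    regs.sp_el0
}
```

With `spsel: 0` (the Limine hand-off state) the model reproduces the bug: EL0 starts on the kernel stack top. With `spsel: 1`, the state `msr spsel, #1` establishes, the reclaim lands in SP_EL1 and SP_EL0 keeps the user value.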
…% resume crash. `eret_to_el0`/`eret_to_el0_fp` reset SP_EL1 and then read the `img`/`fp` resume arrays, which still live on the abandoned cooperative-yield frame. A wait loop that yields with IRQs unmasked leaves them unmasked here, so a timer IRQ can push a TrapFrame over `img`/`fp` and corrupt the resume image (intermittent data abort in `eret_to_el0_fp` at the SPSR load, with x18=0). Fix: `msr daifset, #0xf` before the SP reset; the `eret` restores SPSR_EL0T, so EL0 still runs with interrupts enabled. 12/12 `-kernel` boots clean (was ~2/8).
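The race can be reproduced in a toy model of the kernel stack. Everything here is an assumption for illustration: the stack size, the offset of the resume image, the 34-word TrapFrame and the names `read_resume_image` and `msr_daifset` are hypothetical. Only the DAIF bit layout (D, A, I, F in bits 9..6) is architectural.

```rust
/// PSTATE.I as seen through the DAIF register (D=bit 9, A=8, I=7, F=6).
const DAIF_I: u64 = 1 << 7;

/// `msr daifset, #imm4`: sets D, A, I, F (imm bits 3..0) in DAIF bits 9..6.
pub fn msr_daifset(daif: u64, imm4: u64) -> u64 {
    daif | ((imm4 & 0xF) << 6)
}

/// Toy kernel stack in 8-byte words; index STACK_WORDS is the stack top.
const STACK_WORDS: usize = 64;
/// Word offset of the resume image on the abandoned yield frame (assumed).
const IMG_AT: usize = 56;
/// Words an IRQ entry pushes for its TrapFrame (assumed: x0..x30, SP, ELR, SPSR).
const TF_WORDS: usize = 34;

/// Models the tail of eret_to_el0: reset SP, then read the 4-word resume image.
/// `mask_first` applies the fix (daifset #0xf before the SP reset);
/// `irq_pending` means a timer IRQ arrives inside the window.
pub fn read_resume_image(img: [u64; 4], mask_first: bool, irq_pending: bool) -> [u64; 4] {
    let mut stack = [0u64; STACK_WORDS];
    stack[IMG_AT..IMG_AT + 4].copy_from_slice(&img);

    // The yielding wait loop left DAIF clear (IRQs unmasked).
    let mut daif: u64 = 0;
    if mask_first {
        daif = msr_daifset(daif, 0xF);
    }

    // Reset SP to the stack top: the image now sits *below* SP, in what the
    // CPU treats as free stack.
    let sp = STACK_WORDS;

    // An IRQ taken now pushes its TrapFrame downward from SP, over the image.
    if irq_pending && (daif & DAIF_I) == 0 {
        for word in &mut stack[sp - TF_WORDS..sp] {
            *word = 0xDEAD_BEEF;
        }
    }

    let mut out = [0u64; 4];
    out.copy_from_slice(&stack[IMG_AT..IMG_AT + 4]);
    out
}
```

Masking all four DAIF bits rather than just I also keeps FIQs and SErrors out of the window. As the message notes, this does not leak into EL0, because `eret` reloads PSTATE from the saved SPSR.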
…UEFI ISO in CI. Kernel: `finish_cond_timedout_return` advanced the resume PC by a hardcoded 2 bytes. That is correct for x86's 2-byte `syscall`, but on aarch64 `svc #0` is 4 bytes, so the parked cond_timedwait waiter resumed at svc+2 (a misaligned PC) and took an EC=0x22 PC-alignment fault, deterministically ~280 frames into a run (every cond_timedwait timeout was a live grenade). Use the cfg'd `SYSCALL_INSN_LEN` (4 on aarch64) and set `aarch64_ret_in_x0` so ETIMEDOUT actually reaches x0. CI: `release.yml` now builds the aarch64 UEFI/Limine ISO via `build-iso-aarch64.sh` (`LIMINE_DIR=$HOME/limine`) and publishes `oscortex-aarch64-<tag>.iso` + `.sha256` alongside the raw `-kernel` ELF: a real UTM/VM/bare-metal-bootable ARM image. `build-iso-aarch64.sh`: `LIMINE_DIR` is now overridable for CI.
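A sketch of the resume-PC arithmetic, assuming (as the message implies) that the parked waiter's saved PC points at its syscall instruction. `Arch`, `syscall_insn_len`, `timedout_resume_pc` and `pc_alignment_ok` are hypothetical; only `SYSCALL_INSN_LEN` is named in the PR, and the cfg pair below assumes the only targets are x86-64 and aarch64.

```rust
/// Target architectures covered by this sketch.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Arch {
    X86_64,
    Aarch64,
}

/// Length of the syscall-entry instruction: x86-64 `syscall` is 0F 05
/// (2 bytes); every A64 instruction, `svc #0` included, is 4 bytes.
pub const fn syscall_insn_len(arch: Arch) -> u64 {
    match arch {
        Arch::X86_64 => 2,
        Arch::Aarch64 => 4,
    }
}

// Compile-time selection, in the spirit of the kernel's cfg'd constant
// (assumes any non-aarch64 build is x86-64).
#[cfg(target_arch = "aarch64")]
pub const SYSCALL_INSN_LEN: u64 = syscall_insn_len(Arch::Aarch64);
#[cfg(not(target_arch = "aarch64"))]
pub const SYSCALL_INSN_LEN: u64 = syscall_insn_len(Arch::X86_64);

/// PC at which a timed-out waiter resumes: just past its syscall instruction.
pub fn timedout_resume_pc(pc_at_syscall: u64, arch: Arch) -> u64 {
    pc_at_syscall + syscall_insn_len(arch)
}

/// An A64 instruction fetch from a PC that is not 4-byte aligned raises a
/// PC-alignment fault (EC=0x22); x86 has no such requirement.
pub fn pc_alignment_ok(pc: u64, arch: Arch) -> bool {
    match arch {
        Arch::X86_64 => true,
        Arch::Aarch64 => pc % 4 == 0,
    }
}
```

The old hardcoded `+ 2` corresponds to `svc_pc + 2` on aarch64, which `pc_alignment_ok` rejects; taking the length from the target makes the same call site correct on both architectures.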
…e (HVF/bare-metal) SP-alignment fault. The x86 SysV convention seeds the entry SP at stack_top-8 (RSP%16==8, return address on the stack). AArch64 is the opposite: SP must be 16-byte aligned at ALL times (SCTLR_EL1.SA, hardware-enforced), and the return address lives in x30/LR (already seeded via `p.user_lr=thread_return_trampoline_va`). TCG doesn't enforce SP alignment, so it silently worked; real silicon (HVF / UTM / Raspberry Pi) faults with EC=0x26 the instant a spawned engine thread touches its stack, which is why it rendered headless but never on real hardware. `spawn_thread` and `spawn_with_bootstrap` are now cfg-split to align down to 16 on aarch64; x86 keeps the -8. Verified under HVF (cpu=host, real Apple Silicon): the engine now spawns all worker threads with no SP fault and reaches the render event loop.
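The cfg split can be sketched as a plain function. `initial_thread_sp` and `sp_alignment_ok` are hypothetical helpers (the real logic lives in `spawn_thread`/`spawn_with_bootstrap`), and the x86 arm here rounds down before subtracting 8, which matches `stack_top-8` whenever `stack_top` is already 16-byte aligned.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Arch {
    X86_64,
    Aarch64,
}

/// Initial stack pointer for a freshly spawned user thread.
pub fn initial_thread_sp(stack_top: u64, arch: Arch) -> u64 {
    let aligned = stack_top & !0xF; // round down to a 16-byte boundary
    match arch {
        // SysV x86-64: at function entry RSP % 16 == 8, as if `call` had
        // just pushed a return address.
        Arch::X86_64 => aligned - 8,
        // AArch64: SP stays 16-byte aligned; the return address lives in
        // x30/LR, so nothing is "pushed" before entry.
        Arch::Aarch64 => aligned,
    }
}

/// With SP alignment checking enabled (SCTLR_EL1.SA at EL1, SA0 at EL0),
/// a load/store that uses a misaligned SP as its base raises EC=0x26.
pub fn sp_alignment_ok(sp: u64) -> bool {
    sp % 16 == 0
}
```

Seeding an aarch64 thread with the x86 value (`stack_top - 8`) yields an SP that `sp_alignment_ok` rejects, which is the fault real silicon raised on the first stack access.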
…PI27) — renders on real hardware. The kernel ticked the EL1 PHYSICAL timer (CNTP_CTL_EL0, PPI 30). On QEMU -M virt TCG that interrupt is delivered, so it worked headless. But OSCortex runs at EL1, and under a hypervisor (Apple's HVF, which UTM uses; KVM; Xen) EL2 owns the physical timer: EL1 CNTP accesses are trapped and PPI 30 is NOT delivered to the guest. So under HVF the tick never fired and no vsync baton was passed: the engine reached its event loop, scheduled frames, and stalled with present=0 (nothing ever rendered on real silicon, only in TCG emulation). Switch to the architected EL1-guest VIRTUAL timer (CNTV_CTL_EL0 / CNTV_TVAL_EL0 / CNTVCT_EL0, PPI 27). It is delivered under HVF, on bare metal (CNTVOFF=0, so virtual time == physical time), AND on TCG virt, which makes it strictly more portable. `vsync_due` already measured cadence against CNTVCT, so the tick and the cadence check now share one time base. Verified under HVF (cpu=host, real Apple Silicon): present_callback reaches 2063 frames, crash=0; the full Flutter shell renders on real hardware.
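A host-testable sketch of arming the virtual timer. The `VirtTimer` trait, `FakeTimer`, `arm_tick` and `virtual_count` are hypothetical; on hardware the trait methods would be `msr cntv_tval_el0` / `msr cntv_ctl_el0`. The 24 MHz counter frequency and 60 Hz tick used in the checks are example values, not figures from the PR.

```rust
/// CNTV_CTL_EL0 control bits (same layout as CNTP_CTL_EL0).
pub const CNTV_CTL_ENABLE: u64 = 1 << 0;
pub const CNTV_CTL_IMASK: u64 = 1 << 1;

/// GIC INTID of the EL1 virtual timer PPI (the EL1 physical timer is 30).
pub const VIRT_TIMER_INTID: u32 = 27;

/// System-register access behind a trait so the logic runs off-target.
pub trait VirtTimer {
    /// Would be `msr cntv_tval_el0, <ticks>` on hardware.
    fn write_cntv_tval(&mut self, ticks: u64);
    /// Would be `msr cntv_ctl_el0, <value>` on hardware.
    fn write_cntv_ctl(&mut self, value: u64);
}

/// Arm the next tick: fire after cntfrq_hz / tick_hz counts of CNTVCT.
pub fn arm_tick<T: VirtTimer>(timer: &mut T, cntfrq_hz: u64, tick_hz: u64) {
    timer.write_cntv_tval(cntfrq_hz / tick_hz);
    // Enabled with IMASK clear: the PPI is asserted once the countdown expires.
    timer.write_cntv_ctl(CNTV_CTL_ENABLE);
}

/// CNTVCT_EL0 = CNTPCT_EL0 - CNTVOFF_EL2, so CNTVOFF = 0 gives identical counts.
pub fn virtual_count(physical: u64, cntvoff: u64) -> u64 {
    physical.wrapping_sub(cntvoff)
}

/// Test double that records the last register writes.
#[derive(Debug, Default)]
pub struct FakeTimer {
    pub tval: u64,
    pub ctl: u64,
}

impl VirtTimer for FakeTimer {
    fn write_cntv_tval(&mut self, ticks: u64) {
        self.tval = ticks;
    }
    fn write_cntv_ctl(&mut self, value: u64) {
        self.ctl = value;
    }
}
```

Keeping register access behind a trait lets the tick arithmetic be checked on a host build while the aarch64 build supplies the real `msr` accessors.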
Brings the aarch64 port from "renders only in TCG emulation" to booting and rendering the full Flutter shell on real ARM hardware (Apple HVF / UTM / bare metal / Raspberry Pi), crash-free, and ships a real bootable UEFI ISO.
Fixes (each root-caused + verified)
- Limine hands off at EL1t (SPSel=0), so `sp` aliased SP_EL0 and clobbered the userspace stack. Fix: `msr spsel, #1` at entry. The ISO now boots under edk2/UTM.
- `eret_to_el0*` reset SP before reading the resume image off the abandoned yield frame; an IRQ in that window corrupted it (~25% of long runs). Mask IRQs across the asm. Verified 0/12.
- `finish_cond_timedout_return` advanced the resume PC by a hardcoded 2 bytes (the x86 `syscall` length); aarch64 `svc` is 4 B, so the PC became misaligned and faulted with EC=0x22, deterministically ~280 frames in. Use the cfg'd `SYSCALL_INSN_LEN`. Verified 0/3 long boots (~2685 frames each; was 2/2 crashing).
- Thread entry SP was seeded at stack_top-8 (the x86 `RSP%16==8` convention). TCG tolerates it; real silicon faults with EC=0x26 instantly. 16-byte-align on aarch64. This is why it never ran under HVF/UTM.

Verified
- QEMU TCG (`-cpu cortex-a72`): renders, crash-free (~2685 frames/boot).
- HVF (`-cpu host`, real Apple Silicon, the UTM path): present_callback 2063 (`-kernel`) and 600+ (final UEFI ISO), crash=0.

CI / release
`release.yml` now also builds the aarch64 UEFI ISO (`build-iso-aarch64.sh`, `LIMINE_DIR=$HOME/limine`) and publishes `oscortex-aarch64-<tag>.iso` + `.sha256` alongside the raw kernel: the first UTM/VM/bare-metal-bootable ARM image. Merging to `main` cuts v0.0.6.