
feat: native engine port — both arches boot+render, USB HID, boot screen, pkg pipeline#14

Merged: squirelboy360 merged 121 commits into develop from feat/native-engine-port (Jun 14, 2026)


@squirelboy360


Merges the native-engine-port line into develop. OSCortex builds from one source
tree and boots + renders the full Flutter shell on both architectures.

Highlights

  • Both arches, one tree — x86_64 (Limine ISO) and aarch64 (.kernel + Limine ISO) boot and render the shell.
  • Package pipeline — streams app bundles over the kernel TCP/IP stack, SHA-256 + Ed25519-signature verified, cached, installed.
  • Microkernel — processes/threads/scheduler, per-process capability enforcement, app-crash auto-recovery (crash isolated, shell survives).
  • Input/UI — window manager + double-buffered compositor, software cursor, USB HID mouse/keyboard over xHCI on aarch64 (.kernel), and a dot-matrix boot screen (replaces log spam; F2 verbose).

This session

  • USB HID mouse/keyboard on aarch64 (PCIe ECAM, BAR self-assign, xHCI enumeration, HVF-safe).
  • Dot-matrix OSCORTEX boot screen on both arches.
  • arm64 ISO boots in UTM/HVF: fixed cross-arch initramfs /init contamination (build guard added), gated USB off the ISO build (works around the QEMU+HVF assert(isv) host bug on the edk2 path), and re-assigned the firmware's 512 GiB BAR into mappable space.

Verified

  • arm64 ISO: boots to the full shell live in UTM (HVF).
  • arm64 .kernel: boot screen + USB mouse + shell (HVF).
  • x86 ISO: boot screen + shell (TCG/UTM; shell confirmed headless).

Known issues (tracked)

  • arm64 ISO + USB together crash QEMU+HVF (host assert(isv)); USB works on .kernel/real HW. USB is gated off the ISO.
  • x86 app-launch needs the AOT toolchain; post-app-launch freeze under investigation.
  • CI must build each arch against a clean initramfs/ (guard now fails loudly on contamination).

- PS/2 IntelliMouse 4-byte scroll wiring (kernel) -> EV_SCROLL -> embedder
  FlutterPointerEvent (signal_kind=kScroll); direction+speed configurable.
- Settings screen in the shell: natural-scroll toggle + speed slider, live via
  config:* messages over the oscortex/shell channel.
- Kernel boot spinner drawn on the framebuffer during JIT warm-up
  (drivers/fb.rs::draw_boot_splash, driven by compositor::tick); clears when
  Flutter presents its first frame. Plus a 'Launching <app>' overlay in the shell.
- Adaptive embedder frame pump: gentle during warm-up, 60fps after input, ~8fps
  idle (was a 1ms tick, i.e. a 1000-per-sec flood); the req:frame ratio dropped from 115:1 to ~7:1.
- Revert MAX_FRAMES to 1<<20 (RAM above 4GiB is not yet safely mappable; the cap
  is load-bearing) with an explanatory comment.
- docs/arch.txt: document the real JIT execution model + runtime status.
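A minimal sketch of the adaptive pump policy described in the list above. The phase names, the input grace window, and the warm-up interval are assumptions for illustration; only the ~60 fps interactive and ~8 fps idle targets come from the text.

```rust
use std::time::Duration;

/// Pump phases modelled on the commit text (names are hypothetical).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum PumpPhase {
    WarmUp,
    Interactive,
    Idle,
}

/// Interval between frame requests for each phase.
pub fn frame_interval(phase: PumpPhase) -> Duration {
    match phase {
        PumpPhase::WarmUp => Duration::from_millis(250), // assumed "gentle" rate
        PumpPhase::Interactive => Duration::from_micros(16_667), // ~60 fps
        PumpPhase::Idle => Duration::from_millis(125), // ~8 fps
    }
}

/// Warm-up until the first frame is presented, interactive while input is
/// recent (within `input_grace`, an assumed knob), idle otherwise.
pub fn phase(first_frame_seen: bool, since_last_input: Duration, input_grace: Duration) -> PumpPhase {
    if !first_frame_seen {
        PumpPhase::WarmUp
    } else if since_last_input <= input_grace {
        PumpPhase::Interactive
    } else {
        PumpPhase::Idle
    }
}
```

Compared with a fixed 1 ms tick, every idle second now issues about 8 frame requests instead of 1000, which is where the drop in the req:frame ratio comes from.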
Build the Flutter engine from source as a first-class OSCortex AOT target
instead of shimming a Linux engine. Phases, port surface, build infra, and the
hack-removal checklist in docs/native-engine-port.md; direction recorded in
docs/arch.txt.

Draw the destination ahead of the work: three clean layers (kernel native ABI /
shared AOT engine / self-contained AOT app bundles with own PIDs), the shared-
framework property, and the hack-removal it enables. No Linux costume.

…ace map

Baseline libflutter_engine.so (377MB, x64, embedder API) builds from source in
the container — toolchain proven. Mapped the exact port surface: ~1,200 lines
(Dart VM os_oscortex/os_thread_oscortex ~1000, fml message_loop_oscortex+paths,
reuse posix) + GN glue. The fml message loop is the critical file — its emulated
equivalent is what livelocks rendering today, so the native port also fixes sync.

Capture the validated Phase 0 setup as one-command scripts so the repetitive,
heavy engine-build infra is reproducible, not tribal knowledge:
- setup-engine-build.sh: idempotent container + depot_tools + pinned gclient sync
  (encodes the name='.' fix and the flags that work).
- build-engine.sh: gn configure + ninja for baseline | oscortex targets.
- README.md: the contributor flow, the edit->build incremental loop, and the
  pitfalls already solved (gclient layout, emulation, disk, prebuilt-dart).
Engine checkout stays OUTSIDE the repo (22GB, never committed).

Add 'oscortex' as a --target-os. Since OSCortex has no sysroot/libc yet (runs
linux-ABI via emulation), it links against linux but sets a new is_oscortex GN
flag to select the OSCortex backend sources later; a true OSCortex toolchain is
a later sub-phase. gn configures cleanly (1714 targets; args.gn = target_os
linux + is_oscortex true).

- engine-port/patches/: the two tracked diffs (gn tool, BUILDCONFIG).
- engine-port/apply-port.sh: applies patches + (Phase 2) backend sources into a
  fresh checkout, idempotent.
Temp patch files cleaned from the workspace (no remnants).

This is a public-facing repo; debugging slop and dead-path artifacts don't belong.

Removed (all dead, none load-bearing for the working JIT build):
- scratch/ (192 files): the entire AOT-deser debugging tree — analyze_*/disasm_*/
  assemble_*/check_* scripts, .bin/.log artifacts, and vendored gen_snapshot/
  shader binaries. Referenced only by build-iso.sh's dead shell-AOT step.
- tools/flutter-engine/libflutter_engine.so.bak: 91MB orphaned 'pristine engine'
  backup, zero references.
- gen_help.txt, test_size, test_size.c, .DS_Store: stray one-offs.
- build-iso.sh: excised the dead [0.3/5] shell-AOT block (gen_snapshot -> libapp.so
  -> patch_libapp) — the shell runs JIT off kernel_blob.bin, not an AOT libapp.so;
  also dropped libapp.so from REQUIRED_FILES. Native AOT is done properly via the
  engine port (docs/native-engine-port.md).
- .gitignore: ignore /scratch/ and /qemu-pipe.* so they can't return.

Deliberately KEPT (still load-bearing for the current JIT build; removed in the
port's Phase 4 once the native engine works): engine_patch.py, patch_libapp.py
(still used by tools/build-flutter-osx.sh for the apps), the Linux-emulation shim.

Verified: build-iso.sh and build-kernel-iso-fast.sh both pass bash -n; the working
fast build references neither scratch/ nor libapp.so.

Add the OSCortex fml platform backend and wire is_oscortex selection. Verified:
fml.message_loop_oscortex.o + fml.paths_oscortex.o compile into libfml, the linux
message loop is excluded, and message_loop_impl.cc selects MessageLoopOscortex.

- engine-port/src/flutter/fml/platform/oscortex/: message_loop_oscortex.{cc,h}
  (epoll+timerfd loop — the sync-bug-fixer, starts as a clean clone to diverge to
  native primitives) + paths_oscortex.cc.
- patches/0003: build_config.h defines FML_OS_OSCORTEX (additive, gated on the
  FLUTTER_OSCORTEX define), message_loop_impl.cc selects MessageLoopOscortex first,
  fml/BUILD.gn swaps in the oscortex sources (excludes linux) + sets the define.
- apply-port.sh applies 0003 + copies src/.

Note: the backend uses the Linux-ABI calls (epoll/timerfd) that OSCortex natively
implements — OSCortex is a native kernel + own libc that is Linux-ABI-compatible
(like Fuchsia/Starnix), not Linux. These calls are the divergence seam if we ever
move to a fully custom ABI.

…t VM clone

- Verified a release/AOT oscortex build configures and gen_snapshot is buildable
  from our tree -> engine+snapshot version-matched, dissolving the old AOT
  dead-end (the '1247 base objects' mismatch). AOT is now a build step, not a
  research problem.
- Deferred the Dart VM os_oscortex clone: on path A os_linux.cc already works;
  cloning it byte-identically is busywork. Add a Dart VM backend only when a
  primitive must actually diverge.

libflutter_engine.so (377MB) links for the oscortex target and bakes in OUR
backend: 17 MessageLoopOscortex refs, 0 MessageLoopLinux. First Flutter engine
built from source FOR OSCortex.

Refined the gn target (patch 0001): oscortex now configures as a HOST linux-x64
build + is_oscortex flag (identical to the proven baseline, our platform backend
selected), instead of an explicit target_os that tripped ANGLE/wayland scope and
embedder-constructor mismatches. Runtime still renders software via kSoftware.

…ld) flow

Make the from-source build a one-time maintainer task, not something every dev
repeats — exactly how Flutter distributes its own engine.

- artifact.config: pins ARTIFACT_VERSION + Flutter rev + the R2 base URL.
- fetch-engine.sh: CONSUMER flow — downloads the pinned prebuilt engine
  (libflutter_engine.so + gen_snapshot + icudtl.dat) from Cloudflare R2, verifies
  sha256, stages it where the OS build expects. Seconds, no checkout.
- publish-engine.sh: MAINTAINER flow — packages the built artifacts and uploads
  to R2 (rclone or aws/S3 to the R2 endpoint). Run after build-engine.sh when the
  port/version changes; bump ARTIFACT_VERSION.
- README: two-tier model (consumers fetch, maintainers build+publish), R2 layout,
  multi-arch = one tarball per ISA.

Scaffolding is ready now; first publish happens once the release/AOT engine lands
(Phase 3). Set ARTIFACT_BASE_URL to your R2 URL to activate fetch.

- setup-r2.sh: create the R2 bucket + enable public r2.dev URL + auto-write
  ARTIFACT_BASE_URL into artifact.config. Idempotent. Needs Cloudflare auth first
  (npx wrangler login, or CLOUDFLARE_API_TOKEN).
- publish-engine.sh: upload via wrangler (npx, no install) as the primary R2
  backend, rclone/aws as fallbacks.
- README: one-time R2 host setup steps.

Bucket creation requires the account owner's Cloudflare auth, so this is the
single manual step; everything else is automated.

R2 needed dashboard activation; switched the artifact host to Google Cloud
Storage (gs://dotcorr-oscortex-engine, public read, billing active). Verified
end-to-end: gcloud upload -> anonymous https://storage.googleapis.com/... fetch.
- artifact.config: ARTIFACT_BASE_URL -> GCS public URL + GCS_BUCKET.
- publish-engine.sh: gcloud storage cp as the primary backend (wrangler/rclone/aws
  remain as fallbacks). fetch-engine.sh is unchanged (curl, host-agnostic).
The repo is public, so GitHub Release assets download anonymously — free, no
egress fees (vs GCS ~$0.12/GB), no separate cloud account, and gh is already
authed. Verified end-to-end: gh release upload -> anonymous
https://github.com/DotCorr/oscortex/releases/download/... fetch.

- artifact.config: ARTIFACT_BASE_URL -> GitHub releases download URL; GITHUB_REPO.
- publish-engine.sh: gh release create/upload as the PRIMARY backend (GCS/R2/S3
  remain as documented fallbacks). fetch-engine.sh unchanged (host-agnostic curl).
- README: GitHub Releases is the host; no bucket setup needed.
- Removed the unused GCS bucket (no remnants).

…erified

The release oscortex engine (33MB, vs 377MB debug — no JIT) and a version-matched
gen_snapshot (6.5MB) built from one tree. Published the first artifact to GitHub
Releases (oscortex-engine-1) and verified the full round trip:
  publish-engine.sh -> gh release  ->  fetch-engine.sh downloads+checksums+stages.
Engine + gen_snapshot from the same tree are version-matched, which dissolves the
old AOT dead-end (the '1247 base objects' mismatch).

Fix: publish-engine.sh derives the workspace from the container's /work mount
(robust to the workspace dir name) instead of assuming a fixed name.

shell app -> frontend_server -> AOT dill (24MB) -> our version-matched
gen_snapshot -> libapp.so (4.4MB native ELF). Verified a real AOT snapshot: all 4
_kDart*Snapshot{Data,Instructions} present with REAL native instructions (T/text,
not zeroed stubs), ELF64 x86-64. The multi-session 'Snapshot expects N base
objects, provided 0' blocker is dissolved because engine + gen_snapshot are built
from one tree (version-matched). compile-app-aot.sh makes it reproducible.

The native, from-source, AOT-compiled Flutter engine RENDERS the shell on
OSCortex: FlutterEngineRunsAOTCompiledDartCode -> libapp.so AOT path ->
present_callback 393 frames, ZERO JIT warmup (no kernel_blob, no codegen). The
UI comes up immediately, no 60-90s compile. This is the whole point of the port.

Fix to get here: the release/AOT engine rejects the JIT-era GC/heap dart-flags
(switches.cc:478 disallowed) — pass only the 5 AOT-safe engine args (argc=5)
when is_aot, dropping the old --old_gen_heap_size etc.

Known follow-up (separate, a regression from this session's IntelliMouse 4-byte
scroll wiring): mouse clicks/Y are mangled + pointer events not reaching the
engine; to fix next.

The 4-byte scroll-packet mode added this session corrupted pointer data: dy
mis-parsed (cursor pinned to the top, y=0) and the flags byte mis-parsed (buttons
stuck at 0), so clicks never registered and pointer events stopped reaching the
engine. Revert to the proven 3-byte packet mode (working clicks > broken wheel).
Verified: cursor Y tracks normally again (y=32 start, not pinned at 0). Scroll to
be re-added later with correct 4-byte parsing + resync that doesn't regress clicks.
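The packet formats involved can be sketched as a decoder that accepts either length. This is a minimal Rust sketch, not the kernel's driver: it follows the standard PS/2 layout (a flags byte, then X and Y as 9-bit two's complement with their sign bits in the flags byte), ignores the overflow bits, and uses the always-set bit 3 of the flags byte as the resync check. The failure mode in the commit appears when the parser's packet length disagrees with the device's: every byte after the first packet lands in the wrong field.

```rust
/// One decoded PS/2 mouse packet.
#[derive(Debug, PartialEq, Eq)]
pub struct MousePacket {
    pub left: bool,
    pub right: bool,
    pub middle: bool,
    pub dx: i16,
    pub dy: i16,
    /// Wheel delta; always 0 for a 3-byte packet.
    pub dz: i8,
}

/// Sign-extend a 9-bit movement value whose sign bit lives in the flags byte.
fn sign_extend9(low: u8, negative: bool) -> i16 {
    if negative {
        low as i16 - 256
    } else {
        low as i16
    }
}

/// Decode a 3-byte standard packet or a 4-byte IntelliMouse packet.
/// Returns None when bit 3 of the flags byte is clear: that bit is always 1
/// in a real flags byte, so a clear bit means the reader is off a packet
/// boundary and must resync instead of decoding garbage.
pub fn parse_packet(bytes: &[u8]) -> Option<MousePacket> {
    if bytes.len() != 3 && bytes.len() != 4 {
        return None;
    }
    let flags = bytes[0];
    if (flags & 0x08) == 0 {
        return None;
    }
    Some(MousePacket {
        left: (flags & 0x01) != 0,
        right: (flags & 0x02) != 0,
        middle: (flags & 0x04) != 0,
        dx: sign_extend9(bytes[1], (flags & 0x10) != 0),
        dy: sign_extend9(bytes[2], (flags & 0x20) != 0),
        // Basic IntelliMouse (device ID 3): byte 4 is a signed wheel delta.
        dz: if bytes.len() == 4 { bytes[3] as i8 } else { 0 },
    })
}
```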
Input is confirmed working under AOT (122 pointer events reached Flutter, hover +
clicks register, button presses detected, dropped_total=0 — the earlier '0' was
the broken IntelliMouse 4-byte mode + minimal interaction, fixed by the 3-byte
revert). Remove the temporary DIAG logging from the embedder + ipc_display.

engine-port/compile-app-aot.sh: per-app AOT compile (frontend_server -> dill ->
version-matched gen_snapshot -> libapp.so), no patch — used to give each app its
own AOT bundle so the AOT engine can run them.

Previously every launched app reused the shell's /system/flutter/libapp.so
(aot_va=0 JIT-era skip), so tapping a tile ran the shell snapshot in the
app host and stalled. Resolve the per-app snapshot by registry lookup:
build_app_libapp_path(app_id) -> /Applications/<name>.app/libapp.so, dlopen
it, and point the AOT snapshot loader at that path. Shell keeps its own path.

Each app is AOT-compiled to its own libapp.so and staged under its .app
bundle, so a launched host loads its own native snapshot, not the shell's.

pthread_cond_signal/broadcast with no waiter parked is a no-op by contract:
the predicate is mutex-protected, so a thread that hasn't yet entered
cond_wait observes it under the lock and never waits. The kernel was instead
recording such signals in COND_PENDING_SIGNALS and letting the next cond_wait
consume one as an immediate (spurious) return. A hot mutator (pid 2) that
signals its own monitor with woke=0 then re-waits would consume its own fake
pending, return 0, find its predicate still false, and re-wait forever — the
cond-pending-consume livelock that froze render at a fixed frame count.

Remove the mechanism end to end (consume in cond_wait, posts in
cond_signal/cond_broadcast, the state map, the import). cond_wait now relies
solely on the seq protocol, which delivers every real signal race-free under
cooperative single-core scheduling (the value-check and park cannot interleave
with a signaler). No remnants.
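The difference between the two mechanisms can be modelled in a few lines. This is a single-threaded sketch under the commit's stated assumption (cooperative single-core scheduling, so sampling `seq` and parking cannot interleave with a signaler); the type and method names are hypothetical, not the kernel's.

```rust
/// Seq protocol: a waiter samples `seq` under the mutex, then parks until
/// `seq` moves past that sample. A signal with no parked waiter just bumps
/// `seq`; nothing is banked for a later waiter.
#[derive(Default)]
pub struct SeqCond {
    seq: u64,
}

impl SeqCond {
    /// Called with the mutex held, before releasing it and parking.
    pub fn begin_wait(&self) -> u64 {
        self.seq
    }

    /// Signal or broadcast: advance the sequence number.
    pub fn signal(&mut self) {
        self.seq = self.seq.wrapping_add(1);
    }

    /// A parked waiter may return only once `seq` differs from its sample.
    pub fn may_wake(&self, sampled: u64) -> bool {
        self.seq != sampled
    }
}

/// The removed mechanism, for contrast: signals with no waiter were banked,
/// and the next wait consumed one as an immediate (spurious) return.
#[derive(Default)]
pub struct PendingCond {
    pending: u32,
}

impl PendingCond {
    pub fn signal_no_waiter(&mut self) {
        self.pending += 1;
    }

    /// True if this wait returns at once without any real signal.
    pub fn wait_returns_immediately(&mut self) -> bool {
        if self.pending > 0 {
            self.pending -= 1;
            true
        } else {
            false
        }
    }
}
```

With `PendingCond`, a thread that signals its own monitor and then waits gets an immediate return, finds its predicate still false, and loops; with `SeqCond`, the earlier signal has no effect on a wait that samples afterwards.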
sys::dlopen forwards path.len() to the kernel as the path length, so a
NUL-terminated byte string makes the kernel read the trailing \0 as part of
the filename and the open fails. The per-app and shell AOT paths were passing
"...libapp.so\0"; strip the NUL at the dlopen call. The AOT snapshot itself
loads via aot_snapshot_load (which maps it executable), so this only silenced
a spurious failure path, but it is a real bug and removes the warning.
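The fix amounts to trimming one trailing NUL before a (pointer, length) syscall. A minimal sketch; `path_for_len_syscall` is a hypothetical helper name, not the embedder's.

```rust
/// Drop exactly one trailing NUL so a syscall that takes (ptr, len) sees only
/// the filename bytes. Paths without a trailing NUL pass through unchanged.
pub fn path_for_len_syscall(path: &[u8]) -> &[u8] {
    path.strip_suffix(b"\0").unwrap_or(path)
}
```

Passing `path.len()` of the trimmed slice means the kernel never reads the terminator as part of the filename.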
The Debug-level hot per-syscall traces (epoll_ctl, mprotect, cond-signal, every
keypress) each block on a synchronous COM1 UART write AND re-render the
framebuffer text console, with interrupts disabled. Measured: ~14000 log lines
during a single boot+app-launch. Dropping to Error cut serial volume ~12x
(14000 -> 1185 lines), removing tens of seconds of emulated-bare-metal warmup.
Raise the one line in logger::init back to Debug for deep tracing.

Root cause of the sporadic render crashes. The user GPR snapshot lives in a
PER-CPU scratch (gs:[..]) written by the syscall entry stub and shared by every
thread on the core. save_full_user_gprs read it LAZILY at yield time — but the
wait loops do sti;hlt and the timer ISR can switch threads mid-handler, so
another thread's syscall entry overwrites the per-CPU snapshot before our save
runs. We then stored the OTHER thread's callee-saved regs (rbx/rbp/r12-r15) into
our context; on resume rbx was garbage.

Pinpointed via addr2line/objdump: fml::MessageLoopOscortex::Run() resuming from
epoll_wait wrote running_ through this=0x1400 (=1280*4, a Skia row stride leaked
from another thread) -> SIGSEGV pid=3. This is the whole 0/16/140/489-frames-
across-identical-boots sporadicity.

Fix: capture_user_gprs_at_entry() snapshots the GPRs into the thread's own
PTABLE slot at dispatch_fast entry, while the per-CPU snapshot is still fresh
for this thread (before any handler/yield/interrupt window). A per-CPU
'captured' flag makes later yield-time save_full_user_gprs calls no-ops, so a
clobbered shared snapshot can't leak in.

Validation: 4/4 headless boots render cleanly (present 98-105) with ZERO engine
SIGSEGV, vs the prior sporadic crashes. Render is now reliable.
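The race and the fix can be modelled in single-threaded Rust. This is a model, not kernel code: the struct and field names are hypothetical stand-ins for the per-CPU scratch, the PTABLE slot, and the per-CPU "captured" flag, and clearing the flag on syscall entry is an assumption of the model.

```rust
/// Callee-saved user registers named in the commit (rbx, rbp, r12-r15).
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct CalleeSaved {
    pub rbx: u64,
    pub rbp: u64,
    pub r12: u64,
    pub r13: u64,
    pub r14: u64,
    pub r15: u64,
}

/// Per-CPU state: the scratch the entry stub writes (shared by every thread
/// on the core) plus the "captured" flag.
#[derive(Default)]
pub struct PerCpu {
    pub scratch: CalleeSaved,
    pub captured: bool,
}

/// Stand-in for a thread's PTABLE slot.
#[derive(Default)]
pub struct ThreadSlot {
    pub saved: CalleeSaved,
}

/// Syscall entry stub: overwrites the shared scratch.
pub fn syscall_entry(cpu: &mut PerCpu, regs: CalleeSaved) {
    cpu.scratch = regs;
    cpu.captured = false;
}

/// The fix: copy into the thread's own slot at dispatch entry, before any
/// yield or interrupt window can let another thread reuse the scratch.
pub fn capture_user_gprs_at_entry(cpu: &mut PerCpu, slot: &mut ThreadSlot) {
    slot.saved = cpu.scratch;
    cpu.captured = true;
}

/// Yield-time save: a no-op once captured; otherwise it reads the scratch
/// lazily, which is only correct if no other thread entered in between.
pub fn save_full_user_gprs(cpu: &PerCpu, slot: &mut ThreadSlot) {
    if !cpu.captured {
        slot.saved = cpu.scratch;
    }
}
```

In the lazy path, thread A can end up holding thread B's rbx (for example the 0x1400 row stride from the commit); with the capture at entry, each thread's slot keeps its own values.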
The frame pump only called FlutterEngineScheduleFrame in the no-event branch,
so a hover/click/scroll waited for the event queue to drain plus a wm_event_wait
timeout (up to ~16ms) before the repaint was even requested. Request the frame
in the same iteration input is received; the engine coalesces redundant requests
and Flutter flushes the batched pointer packet at BeginFrame, so the scheduled
frame reflects the event. Removes the scheduling-side input latency (most
visible on real hardware; under cross-arch QEMU TCG the emulation dominates).

Each submitted frame did up to 5 full 1M-pixel passes: a strided re-pack into a
freshly allocated 4MB Vec, a byte-indexed B<->R swap, a full-screen fill_rect
clear, blit_rgba32, and swap_buffers. Measured ~65ms median (wildly variable
10-92ms) -> a ~15fps ceiling from the blit alone, which read as a low refresh
rate / laggy feedback.

- Fuse the strided re-pack into the swap: submit_bgra_impl reads the source
  stride directly and packs in ONE u32-wise pass (read BGRA as u32, swap bytes
  0<->2), eliminating the intermediate buffer + the 4MB/frame allocation.
- Skip the full-screen fill_rect clear when a presented surface already covers
  the screen (the common case: one full-screen Flutter surface).

Measured after: blit median 5.4ms, steady 5.0-7.6ms (~12x faster, variance
gone). Render verified correct + crash-free; colors unchanged.
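The fused pass can be sketched as follows: read each source pixel at its strided offset as a little-endian u32 and swap bytes 0 and 2 while writing straight into the destination, with no intermediate buffer. The function name and the packed destination layout are assumptions for illustration, not `submit_bgra_impl` itself.

```rust
/// Convert a BGRA frame whose rows are `src_stride` bytes apart into a packed
/// destination with R and B exchanged, in one pass and without allocating.
pub fn pack_bgra_to_rgba(src: &[u8], src_stride: usize, width: usize, height: usize, dst: &mut [u32]) {
    assert!(src_stride >= width * 4, "stride shorter than a row");
    assert!(dst.len() >= width * height, "destination too small");
    for y in 0..height {
        let start = y * src_stride;
        let row = &src[start..start + width * 4];
        let out = &mut dst[y * width..(y + 1) * width];
        for (px, bytes) in out.iter_mut().zip(row.chunks_exact(4)) {
            // Little-endian load: byte 0 (B) in bits 0-7, byte 2 (R) in bits 16-23.
            let v = u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
            // Keep G and A, move R down to byte 0 and B up to byte 2.
            *px = (v & 0xFF00_FF00) | ((v & 0x0000_00FF) << 16) | ((v >> 16) & 0x0000_00FF);
        }
    }
}
```

Each pixel is touched once, and the padding between rows is skipped by slicing rather than copied.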
next_runnable_pid_locked computed the foreground-exclusive group AFTER the
input-target and embedder-baton shortcuts, so a due shell (pid 1) baton could
schedule the shell engine even while an app is foreground — two heavy Flutter
VMs on one cooperative core, the documented cause of the launched app's crash.
Compute fg/exclusive FIRST and gate both shortcuts: suppress the pid-1 baton
when an app is exclusive, and only honour the input shortcut for a target in the
foreground group. No behavioural change when the shell is foreground (the common
case): exclusive=false short-circuits both gates.

Closes the baton concurrency hole; full app-launch validation still pending an
interactive tile-tap (HMP mouse_button injection does not produce reliable
clicks headless).
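The reordered shortcut logic can be sketched as a pure function. The struct, field, and enum names are hypothetical; what the sketch shows is the ordering the commit describes: exclusivity is computed first and gates both the input shortcut and the shell baton.

```rust
/// Inputs to the shortcut part of the scheduling decision.
pub struct SchedView {
    pub shell_pid: u32,
    /// Leader pid of the foreground group, if any.
    pub foreground_leader: Option<u32>,
    /// The shell's embedder baton is due.
    pub shell_baton_due: bool,
    /// (pid, group leader) of the current input target, if any.
    pub input_target: Option<(u32, u32)>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Shortcut {
    InputTarget(u32),
    ShellBaton,
    Fallthrough,
}

pub fn pick_shortcut(v: &SchedView) -> Shortcut {
    // Compute exclusivity FIRST so both shortcuts can be gated on it.
    let exclusive = matches!(v.foreground_leader, Some(l) if l != v.shell_pid);
    if let Some((pid, leader)) = v.input_target {
        // Honour the input shortcut only for a target in the foreground group.
        if !exclusive || v.foreground_leader == Some(leader) {
            return Shortcut::InputTarget(pid);
        }
    }
    // Suppress the shell baton while an app is exclusive.
    if v.shell_baton_due && !exclusive {
        return Shortcut::ShellBaton;
    }
    Shortcut::Fallthrough
}
```

When the shell is foreground, `exclusive` is false and both gates pass through unchanged, matching the "no behavioural change" claim.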
Replace the generic reply-null catch-all in platform_message_callback with
a real channel dispatcher (match on channel name). Every channel the
framework reaches for is now routed to a concrete handler that does the
platform work or returns a codec-correct typed ack; the final catch-all
logs the channel name ([embedder/chan] unbound: <name>) so nothing stays an
invisible stub.
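A minimal sketch of name-based routing with codec-correct null acks and a logged catch-all. The routing table is abbreviated and the function names are hypothetical. The null encodings follow Flutter's codecs: a JSONMethodCodec success envelope is a one-element JSON array, and StandardMessageCodec writes null as the single type byte 0x00 (the encoding a later commit switches flutter/accessibility to).

```rust
/// Outcome of routing one platform message.
#[derive(Debug, PartialEq, Eq)]
pub enum Reply {
    /// Encoded reply bytes to send back to the framework.
    Bytes(Vec<u8>),
    /// No handler; the channel name was logged.
    Unbound,
}

/// JSONMethodCodec success envelope with a null result: the JSON text `[null]`.
fn json_typed_null() -> Vec<u8> {
    b"[null]".to_vec()
}

/// StandardMessageCodec encodes null as the single type byte 0x00.
fn standard_null() -> Vec<u8> {
    vec![0x00]
}

/// Route by channel name. Real handlers would decode the method call and do
/// the platform work; this sketch only shows routing and the acks.
pub fn dispatch(channel: &str, log: &mut Vec<String>) -> Reply {
    match channel {
        "flutter/textinput" | "flutter/platform" | "flutter/navigation" => {
            Reply::Bytes(json_typed_null())
        }
        "flutter/accessibility" => Reply::Bytes(standard_null()),
        other => {
            log.push(format!("[embedder/chan] unbound: {other}"));
            Reply::Unbound
        }
    }
}
```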

Bound channels:
- flutter/textinput (JSONMethodCodec): setClient/setEditingState/show/hide/
  clearClient. Editing state (text + selection) is maintained in the embedder.
  PS/2 set-1 scancodes are now mapped to characters (shift/caps, backspace,
  enter, tab, space, arrows, home/end, delete); on a key press with an active
  text client the stored editing state is mutated and TextInputClient.
  updateEditingState is pushed back over flutter/textinput. Adds a small
  inline JSON reader/writer (no_std, no crates) for the editing-state maps.
- flutter/mousecursor: parse activateSystemCursor kind and ack (no kernel
  set-cursor-shape syscall exists yet; logged + acked).
- flutter/platform: Clipboard.setData/getData/hasStrings backed by an
  in-embedder buffer; SystemNavigator.pop; SystemSound/HapticFeedback/
  SystemChrome acked.
- flutter/navigation, system, accessibility, spellcheck, processtext, menu,
  contextmenu, scribe, restoration, keyevent, platform_views, isolate,
  lifecycle: explicit JSON typed-null ack.
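The scancode-to-character step and the editing-state mutation can be sketched together. This covers only a subset (US layout letters, the digit row, space) and an ASCII-only editing state, so byte offsets equal the UTF-16 offsets Flutter uses for selections. The function and type names are hypothetical; the set-1 make codes are the standard ones, and break codes have bit 7 set.

```rust
/// Map a scancode set 1 make code to a character; break codes return None.
pub fn set1_to_char(code: u8, shift: bool, caps: bool) -> Option<char> {
    if (code & 0x80) != 0 {
        return None;
    }
    const LETTERS: [(u8, char); 26] = [
        (0x10, 'q'), (0x11, 'w'), (0x12, 'e'), (0x13, 'r'), (0x14, 't'), (0x15, 'y'),
        (0x16, 'u'), (0x17, 'i'), (0x18, 'o'), (0x19, 'p'), (0x1E, 'a'), (0x1F, 's'),
        (0x20, 'd'), (0x21, 'f'), (0x22, 'g'), (0x23, 'h'), (0x24, 'j'), (0x25, 'k'),
        (0x26, 'l'), (0x2C, 'z'), (0x2D, 'x'), (0x2E, 'c'), (0x2F, 'v'), (0x30, 'b'),
        (0x31, 'n'), (0x32, 'm'),
    ];
    if let Some(&(_, c)) = LETTERS.iter().find(|&&(k, _)| k == code) {
        // Shift and Caps Lock cancel each other out for letters.
        return Some(if shift != caps { c.to_ascii_uppercase() } else { c });
    }
    match code {
        // Digit row: 0x02 is '1' ... 0x0B is '0'.
        0x02..=0x0B => {
            let i = (code - 0x02) as usize;
            let row = if shift { b"!@#$%^&*()" } else { b"1234567890" };
            Some(row[i] as char)
        }
        0x39 => Some(' '),
        _ => None,
    }
}

/// Embedder-side editing state with a collapsed selection (caret).
#[derive(Debug, Default)]
pub struct EditingState {
    pub text: String,
    pub caret: usize,
}

impl EditingState {
    pub fn insert(&mut self, c: char) {
        self.text.insert(self.caret, c);
        self.caret += c.len_utf8();
    }

    pub fn backspace(&mut self) {
        if self.caret > 0 {
            self.caret -= 1;
            self.text.remove(self.caret);
        }
    }
}
```

After each mutation the embedder would serialise `text` and the caret as the editing-state map and push `TextInputClient.updateEditingState`.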
Replace the ack-only stubs from the platform-channel contract with actual
OS-provided capabilities. OSCortex is the platform under the stock engine, so
where Flutter needs a platform service the kernel now implements and binds it.

Mouse cursor shape (flutter/mousecursor.activateSystemCursor):
- compositor: ACTIVE_CURSOR_SHAPE atomic + vector cursor sprites (arrow, I-beam,
  hand/link, forbidden, grab, horizontal/vertical resize, hidden). draw_software_cursor
  dispatches on the active shape; set_cursor_shape() repaints immediately.
- new syscall SYS_CURSOR_SHAPE_SET (0x4B2). Embedder maps the Flutter cursor kind
  string to a CURSOR_SHAPE_* and calls it, so hovering a link shows a hand, a text
  field shows an I-beam, etc.
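The kind-string mapping can be sketched as a single match. The numeric `CURSOR_SHAPE_*` values below are hypothetical placeholders (the real values are not given here); the kind strings are Flutter system-cursor kinds, and anything without a dedicated sprite falls back to the arrow.

```rust
pub const CURSOR_SHAPE_ARROW: u32 = 0;
pub const CURSOR_SHAPE_IBEAM: u32 = 1;
pub const CURSOR_SHAPE_HAND: u32 = 2;
pub const CURSOR_SHAPE_FORBIDDEN: u32 = 3;
pub const CURSOR_SHAPE_GRAB: u32 = 4;
pub const CURSOR_SHAPE_RESIZE_H: u32 = 5;
pub const CURSOR_SHAPE_RESIZE_V: u32 = 6;
pub const CURSOR_SHAPE_HIDDEN: u32 = 7;

/// Map the `kind` from activateSystemCursor to a sprite id that would be
/// passed to SYS_CURSOR_SHAPE_SET.
pub fn cursor_shape_for_kind(kind: &str) -> u32 {
    match kind {
        "text" => CURSOR_SHAPE_IBEAM,
        "click" => CURSOR_SHAPE_HAND,
        "forbidden" => CURSOR_SHAPE_FORBIDDEN,
        "grab" | "grabbing" => CURSOR_SHAPE_GRAB,
        "resizeLeftRight" => CURSOR_SHAPE_RESIZE_H,
        "resizeUpDown" => CURSOR_SHAPE_RESIZE_V,
        "none" => CURSOR_SHAPE_HIDDEN,
        _ => CURSOR_SHAPE_ARROW, // "basic" and everything else
    }
}
```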

Semantics / accessibility:
- embedder wires update_semantics_callback2 (FlutterProjectArgs off 280) and calls
  FlutterEngineUpdateSemanticsEnabled(engine, true) after run. The callback receives
  the FlutterSemanticsUpdate2 tree and stores each node (id, label, rect, flags,
  actions) in a live embedder structure for a11y / automation consumers. flutter/
  accessibility now replies with the correct StandardMessageCodec null (0x00), not JSON.

System clipboard (flutter/platform Clipboard.*):
- kernel-global clipboard buffer (embedder::clipboard) shared across every app/host,
  with SYS_CLIPBOARD_SET (0x4B3) / SYS_CLIPBOARD_GET (0x4B4). The embedder routes
  setData/getData/hasStrings to the kernel, so clipboard survives across apps.

SystemNavigator.pop (flutter/platform):
- SYS_APP_CLOSE_FOREGROUND (0x4B5): refocuses the shell (pid 1) and wakes it. The
  app embedder, when it is a launched host, flushes its reply, calls the syscall and
  exits so focus returns to the shell.

SystemSound.play (flutter/platform):
- PC-speaker beep driver (drivers::beep) via PIT channel 2 + port 0x61, exposed as
  SYS_BEEP (0x4B6). click vs alert play distinct short tones.
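The PIT/speaker sequence behind a driver like this can be sketched as a list of port writes. The function names are hypothetical and the sketch returns (port, value) pairs instead of performing I/O; the ports, the 0xB6 mode byte, and the 1,193,182 Hz PIT clock are the standard PC values.

```rust
/// PIT input clock in Hz.
pub const PIT_HZ: u32 = 1_193_182;

/// Port writes that start a square-wave tone on PIT channel 2 and connect it
/// to the speaker. `port61` is the value read back from port 0x61; its other
/// bits are preserved. A driver would `out` these in order.
pub fn beep_start_writes(freq_hz: u32, port61: u8) -> Vec<(u16, u8)> {
    // 19 Hz is the lowest frequency whose divisor still fits in 16 bits.
    let f = freq_hz.clamp(19, PIT_HZ);
    let divisor = (PIT_HZ / f) as u16;
    vec![
        (0x43, 0xB6),                   // channel 2, lobyte/hibyte, mode 3 (square wave), binary
        (0x42, (divisor & 0xFF) as u8), // divisor low byte
        (0x42, (divisor >> 8) as u8),   // divisor high byte
        (0x61, port61 | 0x03),          // bit 0: gate channel 2, bit 1: speaker data on
    ]
}

/// Write that silences the speaker again.
pub fn beep_stop_write(port61: u8) -> (u16, u8) {
    (0x61, port61 & !0x03)
}
```

Distinct click and alert tones then come down to different `freq_hz` values and durations between the start and stop writes.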

Deliberate no-ops (no such hardware), acked with the correct codec:
- HapticFeedback.* (no vibration motor), SystemChrome.* (single full-screen
  compositor surface, no system UI overlays / orientation).

Brings in the embedder channel dispatcher, text input (flutter/textinput),
and real OS-backed capabilities: cursor-shape sprites + SYS_CURSOR_SHAPE_SET,
live semantics, kernel-global clipboard, SystemNavigator.pop return-to-shell,
and a PC-speaker beep driver. Builds green (embedder + kernel).

…% resume crash

eret_to_el0/eret_to_el0_fp reset SP_EL1 then read the img/fp resume arrays that
still live on the abandoned cooperative-yield frame. A wait loop that yields
with IRQs unmasked leaves them unmasked here; a timer IRQ pushes a TrapFrame
over img/fp → corrupted resume image (intermittent data abort in eret_to_el0_fp
at the SPSR load with x18=0). Fix: msr daifset, #0xf before the SP reset; the
eret restores SPSR_EL0T so EL0 runs with interrupts enabled. 12/12 -kernel
boots clean (was ~2/8).

…UEFI ISO in CI

Kernel: finish_cond_timedout_return advanced the resume PC by a hardcoded 2
bytes — correct for x86's 2-byte `syscall`, but on aarch64 `svc #0` is 4 bytes,
so the parked cond_timedwait waiter resumed at svc+2 (a misaligned PC) →
EC=0x22 PC-alignment fault, deterministically ~280 frames into a run (every
cond_timedwait timeout was a live grenade). Use the cfg'd SYSCALL_INSN_LEN
(4 on aarch64) and set aarch64_ret_in_x0 so ETIMEDOUT actually reaches x0.
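A sketch of the per-arch instruction length and the resume-PC arithmetic. It assumes, as the commit's "svc+2" implies, that the saved PC points at the syscall instruction itself; the helper names are hypothetical.

```rust
#[derive(Clone, Copy, Debug)]
pub enum Arch {
    X86_64,
    Aarch64,
}

/// x86-64 `syscall` is 2 bytes (0F 05); AArch64 `svc #0` is a fixed 4 bytes.
pub const fn syscall_insn_len(arch: Arch) -> u64 {
    match arch {
        Arch::X86_64 => 2,
        Arch::Aarch64 => 4,
    }
}

/// Build-time constant in the style of a cfg'd SYSCALL_INSN_LEN.
#[cfg(target_arch = "aarch64")]
pub const SYSCALL_INSN_LEN: u64 = syscall_insn_len(Arch::Aarch64);
#[cfg(not(target_arch = "aarch64"))]
pub const SYSCALL_INSN_LEN: u64 = syscall_insn_len(Arch::X86_64);

/// Resume PC for a timed-out waiter parked at `insn_pc`.
pub fn resume_pc(insn_pc: u64, arch: Arch) -> u64 {
    insn_pc + syscall_insn_len(arch)
}
```

On aarch64 every instruction address is a multiple of 4, so the old hardcoded +2 always produced a misaligned PC.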

CI: release.yml now builds the aarch64 UEFI/Limine ISO via build-iso-aarch64.sh
(LIMINE_DIR=$HOME/limine) and publishes oscortex-aarch64-<tag>.iso + .sha256
alongside the raw -kernel ELF — a real UTM/VM/bare-metal-bootable ARM image.
build-iso-aarch64.sh: LIMINE_DIR is now overridable for CI.

…e (HVF/bare-metal) SP-alignment fault

The x86 SysV convention seeds entry SP at stack_top-8 (RSP%16==8, return addr
on the stack). AArch64 is the opposite: SP must be 16-byte aligned at ALL times
(SCTLR_EL1.SA, hardware-enforced) and the return address lives in x30/LR (already
seeded via p.user_lr=thread_return_trampoline_va). TCG doesn't enforce SP
alignment so it silently worked; real silicon (HVF / UTM / Raspberry Pi) faults
EC=0x26 the instant a spawned engine thread touches its stack — which is why it
rendered headless but never on real hardware. cfg-split spawn_thread and
spawn_with_bootstrap to align down to 16 on aarch64; x86 keeps the -8.

Verified under HVF (cpu=host, real Apple Silicon): engine now spawns all worker
threads with no SP fault and reaches the render event loop.
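The cfg split reduces to one small function. A sketch with a hypothetical name; like the commit's stack_top-8, the x86-64 branch assumes `stack_top` is already 16-byte aligned.

```rust
#[derive(Clone, Copy, Debug)]
pub enum TargetArch {
    X86_64,
    Aarch64,
}

/// Initial user SP for a spawned thread.
/// x86-64 SysV: entry looks as if a `call` just pushed a return address,
/// so RSP % 16 == 8.
/// AArch64: SP must stay 16-byte aligned at all times (hardware-checked);
/// the return address is in x30/LR, not on the stack.
pub fn entry_sp(stack_top: u64, arch: TargetArch) -> u64 {
    match arch {
        TargetArch::X86_64 => stack_top - 8,
        TargetArch::Aarch64 => stack_top & !0xF,
    }
}
```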
…PI27) — renders on real hardware

The kernel ticked the EL1 PHYSICAL timer (CNTP_CTL_EL0, PPI 30). On QEMU -M virt
TCG that interrupt is delivered, so it worked headless. But OSCortex runs at EL1
and under a hypervisor — Apple's HVF (what UTM uses), KVM, Xen — EL2 owns the
physical timer: EL1 CNTP accesses are trapped and PPI 30 is NOT delivered to the
guest. So under HVF the tick never fired → no vsync baton → the engine reached
its event loop, scheduled frames, and stalled with present=0 (nothing ever
rendered on real silicon, only in TCG emulation).

Switch to the architected EL1-guest VIRTUAL timer (CNTV_CTL_EL0 / CNTV_TVAL_EL0 /
CNTVCT_EL0, PPI 27). It is delivered under HVF, on bare metal (CNTVOFF=0 → virtual
time == physical time), AND on TCG virt — strictly more portable. vsync_due
already measured cadence against CNTVCT, so this is now consistent.

Verified under HVF (cpu=host, real Apple Silicon): present_callback reaches 2063
frames, crash=0 — the full Flutter shell renders on real hardware.
…ld + unblocks the package pipeline

The on-demand package syscalls (SYS_PKG_RESOLVE/CATALOG/SET_SERVER/EVICT) were
assigned 0x390-0x393 in embedder/abi.rs — the SAME numbers as the Phase 53-55
sched_yield/get_cpu_time/fork/kill_signal syscalls. In dispatch.rs the pkg match
arms precede the literal 0x390-0x393 arms, so 0x390 dispatched to pkg_resolve and
sched_yield was DEAD CODE (4 unreachable-pattern warnings). Every embedder yield
silently called pkg_resolve(garbage); rendering survived only because the yield's
kernel round-trip happens regardless of handler.

Move the pkg syscalls to a free block (0x4C0-0x4C3). Now 0x390 cleanly dispatches
to sched_yield and the pkg numbers are distinct + reachable — the prerequisite for
wiring the (already-built: HTTP+SHA256+LRU) package pipeline through to the shell.
x86 kernel compiles clean, 0 unreachable-pattern warnings (was 4).
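One way to make this class of collision a build error rather than an unreachable-pattern warning is a const uniqueness check over a syscall table. This is a sketch, not the kernel's dispatch table: sched_yield at 0x390 and the pkg/cursor/clipboard/app-close/beep numbers come from the text, while mapping get_cpu_time/fork/kill_signal onto 0x391-0x393 in listed order is an assumption.

```rust
/// (name, number) pairs after the renumber.
pub const SYSCALLS: &[(&str, u32)] = &[
    ("sched_yield", 0x390),
    ("get_cpu_time", 0x391),
    ("fork", 0x392),
    ("kill_signal", 0x393),
    ("cursor_shape_set", 0x4B2),
    ("clipboard_set", 0x4B3),
    ("clipboard_get", 0x4B4),
    ("app_close_foreground", 0x4B5),
    ("beep", 0x4B6),
    ("pkg_resolve", 0x4C0),
    ("pkg_catalog", 0x4C1),
    ("pkg_set_server", 0x4C2),
    ("pkg_evict", 0x4C3),
];

/// True if no two entries share a number (O(n^2), fine for a const check).
pub const fn all_unique(table: &[(&str, u32)]) -> bool {
    let mut i = 0;
    while i < table.len() {
        let mut j = i + 1;
        while j < table.len() {
            if table[i].1 == table[j].1 {
                return false;
            }
            j += 1;
        }
        i += 1;
    }
    true
}

// Fails compilation if two syscalls are ever given the same number.
const _: () = assert!(all_unique(SYSCALLS));
```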
The kernel pipeline (HTTP/1.1 over smoltcp → SHA-256 verify → LRU cache →
install) and the Dart shell calls (pkg_catalog / pkg_resolve:$name /
pkg_set_server:$ip:$port in shell_service.dart) both existed, but the embedder
never routed them — the messages fell through unhandled, so the whole
on-demand-package feature was dead at the seam.

sys.rs: SYS_PKG_RESOLVE/CATALOG/SET_SERVER consts (0x4C0-0x4C2, the post-
collision numbers from b4dcdb5) + syscall wrappers.
main.rs: three oscortex/shell handlers next to install:/uninstall: —
  pkg_catalog      → {"packages":[{name,version,size}…]} (parses the kernel's
                     128-byte PkgManifest records: name[0..64], version[64..80],
                     size_bytes u32 @112)
  pkg_resolve:name → {"app_id":N} (kernel runs fetch→verify→cache→install)
  pkg_set_server:ip:port → {"ok":…} (dotted-IP parser → packed BE u32)

Verified: x86 ISO renders (present=206, crash=0) — no regression on the
render-critical host binary.
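The two parsers named in the handler list can be sketched as follows. The record offsets (name[0..64], version[64..80], size_bytes at 112) come from the text; NUL-padded strings, a little-endian size, and a buffer that is a bare array of 128-byte records are assumptions of this sketch.

```rust
/// One catalog entry decoded from a 128-byte PkgManifest record.
#[derive(Debug, PartialEq, Eq)]
pub struct PkgEntry {
    pub name: String,
    pub version: String,
    pub size: u32,
}

const RECORD: usize = 128;

/// Read a NUL-padded fixed-width string field.
fn c_str(field: &[u8]) -> String {
    let end = field.iter().position(|&b| b == 0).unwrap_or(field.len());
    String::from_utf8_lossy(&field[..end]).into_owned()
}

/// Decode every complete record in `buf`; a trailing partial record is ignored.
pub fn parse_catalog(buf: &[u8]) -> Vec<PkgEntry> {
    buf.chunks_exact(RECORD)
        .map(|r| PkgEntry {
            name: c_str(&r[0..64]),
            version: c_str(&r[64..80]),
            size: u32::from_le_bytes([r[112], r[113], r[114], r[115]]),
        })
        .collect()
}

/// Dotted quad to a u32 whose most significant byte is the first octet.
pub fn parse_ipv4_be(s: &str) -> Option<u32> {
    let mut octets = [0u8; 4];
    let mut parts = s.split('.');
    for o in octets.iter_mut() {
        *o = parts.next()?.parse().ok()?;
    }
    if parts.next().is_some() {
        return None; // more than four parts
    }
    Some(u32::from_be_bytes(octets))
}
```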
…e first time

pkg/http.rs waited for an outbound connection by zero-byte tcp_read probing,
but tcp_read returns EAGAIN whenever can_recv() is false — which is exactly the
state of an ESTABLISHED socket whose peer hasn't sent anything (an HTTP server
says nothing until it receives the request). The probe could never succeed, so
EVERY kernel HTTP fetch failed with Connect since the code was written — the
on-demand package pipeline silently booted into "offline mode" on every run
([pkg] catalog fetch failed: Connect, invisible at the Error log level).

Fix: net/tcp.rs gains tcp_is_established(fd) — may_send() is the correct
handshake-complete signal, with is_open()==false reported as ECONNREFUSED for
refused/reset connects. http.rs polls that instead (240 polls × 50k spins each,
a wider window for TCG).

Verified live: boot with virtio-net + pkg-server on host :8080 → serial
"[pkg] parsed catalog: 2 packages / catalog refreshed" and the server access
log shows the guest's GET /catalog.bin → 200 (260 bytes). Combined with the
embedder seam (463e689) and the syscall renumber (b4dcdb5), the on-demand
package pipeline is now reachable end-to-end: boot → HTTP catalog fetch →
shell sees remote packages → pkg_resolve streams + SHA-256-verifies + caches.
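The connect-polling decision can be modelled without smoltcp. This sketch uses its own state enum and mirrors the semantics the commit relies on (handshake complete once the socket may send; not open once closed); it is not smoltcp's API, and the function names are hypothetical. It also shows why the old probe failed: an established socket with nothing received has no readable data, so a zero-byte read keeps reporting EAGAIN.

```rust
/// Subset of TCP states relevant to connect polling (RFC 793 names).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum TcpState {
    Closed,
    SynSent,
    Established,
    CloseWait,
    TimeWait,
}

/// Handshake complete: data may be sent.
fn may_send(s: TcpState) -> bool {
    matches!(s, TcpState::Established | TcpState::CloseWait)
}

/// The connection has not been closed (refused, reset, or finished).
fn is_open(s: TcpState) -> bool {
    !matches!(s, TcpState::Closed | TcpState::TimeWait)
}

#[derive(Debug, PartialEq, Eq)]
pub enum ConnectPoll {
    Ready,
    Pending,
    Refused,
}

pub fn connect_status(s: TcpState) -> ConnectPoll {
    if may_send(s) {
        ConnectPoll::Ready
    } else if !is_open(s) {
        ConnectPoll::Refused // would map to ECONNREFUSED
    } else {
        ConnectPoll::Pending
    }
}

/// Bounded poll: `next_state` reports the socket state on each attempt.
pub fn wait_established(mut next_state: impl FnMut() -> TcpState, attempts: u32) -> ConnectPoll {
    for _ in 0..attempts {
        match connect_status(next_state()) {
            ConnectPoll::Pending => continue,
            done => return done,
        }
    }
    ConnectPoll::Pending
}
```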
…ch goes from broken to working

Streaming a 5.4 MB .osx bundle through the kernel's HTTP client failed or
crawled at a constant ~28 KB/s regardless of buffer sizes. Four stacked causes,
each found by test iteration:

1. tcp.rs: smoltcp's now() was the COMPOSITOR FRAME COUNTER ×17 ms. Whenever
   frames aren't presenting (early boot, headless), smoltcp time crawled ~15×
   slower than reality, so ACK-delay/retransmit/window timers stretched 15× and
   paced every bulk transfer at ~28 KB/s (a 5.4 MB fetch took a constant 193 s
   and starved the reader into timeouts). Use the rdtsc-based monotonic_ns()
   (the same source timerfds use). syscall::poll made pub(crate) for access.

2. tcp.rs: tcp_connect derived the local port from the SLOT index, so two
   back-to-back connections to the same server reused the identical 4-tuple —
   the peer's TIME_WAIT swallowed the second SYN (catalog fetch worked, the
   immediately-following bundle fetch always died). Monotonic ephemeral-port
   counter (49152 + n % 16384).

3. http.rs: the read loop stopped after 300 consecutive EAGAINs (~no more
   data heuristic), silently truncating long transfers; parse_response then
   min()'d the short body and the SHA-256 check failed confusingly downstream.
   Content-Length-aware reading (don't give up while bytes are outstanding)
   + short-body now errors Malformed instead of returning clipped bytes.

4. tcp.rs/virtio_net.rs: 4 KiB socket buffers (= 4 KiB TCP window) and only 8
   posted RX descriptors (~12 KB in flight) forced stop-and-wait behaviour and
   ring overflows. 64 KiB buffers, 64 RX slots.
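Fixes 2 and 3 above can be sketched in std-only Rust. The port formula is the one given; the read loop is a hypothetical stand-in for the http.rs logic: it keeps reading while Content-Length bytes are outstanding, resets its EAGAIN budget whenever data arrives, and reports an early close or an exhausted budget as an error instead of returning a clipped body. The idle-budget policy and the names are assumptions.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

static NEXT_EPHEMERAL: AtomicU32 = AtomicU32::new(0);

/// Monotonic ephemeral local port in 49152..=65535.
pub fn next_ephemeral_port() -> u16 {
    let n = NEXT_EPHEMERAL.fetch_add(1, Ordering::Relaxed);
    (49152 + n % 16384) as u16
}

#[derive(Debug, PartialEq, Eq)]
pub enum ReadError {
    /// Peer closed before Content-Length bytes arrived.
    Malformed,
    /// No progress for `max_idle_polls` consecutive EAGAINs.
    TimedOut,
}

/// `read` returns Ok(n) bytes (0 = peer closed) or Err(()) for EAGAIN.
pub fn read_body(
    mut read: impl FnMut(&mut [u8]) -> Result<usize, ()>,
    content_length: usize,
    max_idle_polls: u32,
) -> Result<Vec<u8>, ReadError> {
    let mut body = Vec::with_capacity(content_length);
    let mut buf = [0u8; 4096];
    let mut idle = 0;
    while body.len() < content_length {
        let want = (content_length - body.len()).min(buf.len());
        match read(&mut buf[..want]) {
            Ok(0) => return Err(ReadError::Malformed),
            Ok(n) => {
                body.extend_from_slice(&buf[..n]);
                idle = 0; // progress: reset the EAGAIN budget
            }
            Err(()) => {
                idle += 1;
                if idle >= max_idle_polls {
                    return Err(ReadError::TimedOut);
                }
            }
        }
    }
    Ok(body)
}
```

A short body now fails at the HTTP layer with a clear error, rather than surfacing later as a confusing SHA-256 mismatch.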

Verified end-to-end: boot → catalog fetch → resolve streams the 5.4 MB Demo
bundle over virtio-net (65 s under TCG, was 193 s + failing), SHA-256 verifies,
LRU-caches, installs (app_id assigned). The on-demand package pipeline —
fetch → verify → cache → install — now works from a cold boot.
… dead foreground group no longer wedges the scheduler

Makes the "one crashes, the kernel recovers it in milliseconds" promise real.
Before: a launched app that faulted was reaped to a zombie and focus returned
to the shell, but the app was never relaunched — and a subtler bug, a dead
foreground group could WEDGE the scheduler (foreground-exclusive filtering
kept skipping every other process, including the shell, because the dead group
was still 'focused').

Five parts:
1. process::next_runnable_pid_locked — if the foreground group's leader is a
   zombie/dead, exclusivity lapses so the shell (and recovery) get the CPU
   again instead of the whole system spinning on a dead group.
2. process::exit → app_registry::note_thread_exit(leader, pid, code), resolved
   under the PTABLE lock then dispatched lock-free (try-lock, ISR-safe).
3. process::kill_group(leader) — tears down a crashed app's surviving threads
   before relaunch (two engine instances of one app corrupt each other under
   the cooperative scheduler).
4. app_registry RUNNING/PENDING/HISTORY tables + drain_relaunches(): on an
   abnormal (nonzero/signal) group death, tear down survivors, refocus the
   shell, and relaunch — capped at 3 relaunches / 30 s, after which the app is
   left closed (degrade to shell rather than crash-loop).
5. drain hook in sys_wm_event_wait (the shell's pid-1 event pump) — a
   continuously-polled normal syscall context, the safe place to spawn.
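The relaunch cap in part 4 ("3 relaunches / 30 s") can be read as a sliding window per app. This sketch assumes that interpretation; RelaunchHistory and try_relaunch are hypothetical names, since the log names a HISTORY table but not its layout.

```rust
/// Relaunch budget from the commit: at most 3 relaunches per 30 s.
const MAX_RELAUNCHES: usize = 3;
const WINDOW_NS: u64 = 30_000_000_000;

/// Hypothetical per-app HISTORY entry: timestamps of recent relaunches.
pub struct RelaunchHistory {
    stamps: [u64; MAX_RELAUNCHES],
    len: usize,
}

impl RelaunchHistory {
    pub const fn new() -> Self {
        Self { stamps: [0; MAX_RELAUNCHES], len: 0 }
    }

    /// Called on an abnormal (nonzero/signal) group death at `now_ns`.
    /// Returns Some(attempt) if the app may be relaunched, or None once the budget
    /// is spent, in which case the app stays closed and the shell keeps focus.
    pub fn try_relaunch(&mut self, now_ns: u64) -> Option<usize> {
        // Forget relaunches that have aged out of the 30 s window.
        let mut kept = 0;
        for i in 0..self.len {
            if now_ns.saturating_sub(self.stamps[i]) < WINDOW_NS {
                self.stamps[kept] = self.stamps[i];
                kept += 1;
            }
        }
        self.len = kept;
        if self.len == MAX_RELAUNCHES {
            return None; // degrade to the shell rather than crash-loop
        }
        self.stamps[self.len] = now_ns;
        self.len += 1;
        Some(self.len)
    }
}
```

A denied attempt is not recorded, so an app that keeps crashing becomes eligible again once its oldest relaunch is more than 30 s old.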

Verified: launch app → kill its leader from a sibling thread → group torn down
(6 survivors reaped) → shell refocused → relaunched (attempt 1/3) → fresh
instance boots, panics=0, shell keeps rendering throughout.
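Part 1's fix (foreground exclusivity lapses when the group leader is dead) can be illustrated with a round-robin picker. The Proc/State types, the next_runnable signature and the assumption that a group id equals its leader's pid are hypothetical; the real process::next_runnable_pid_locked works on the PTABLE under its lock.

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
pub enum State {
    Ready,
    Blocked,
    Zombie,
}

pub struct Proc {
    pub pid: u32,
    pub group: u32, // assumed: group id == leader pid
    pub state: State,
}

/// Picks the next Ready pid after index `current`, round-robin.
/// While the foreground leader is alive, only its group is eligible; once the
/// leader is a zombie (or gone), exclusivity lapses so the shell can run again.
pub fn next_runnable(procs: &[Proc], current: usize, fg_leader: Option<u32>) -> Option<u32> {
    let exclusive = fg_leader.filter(|&leader| {
        procs.iter().any(|p| p.pid == leader && p.state != State::Zombie)
    });
    let n = procs.len();
    for step in 1..=n {
        let p = &procs[(current + step) % n];
        if p.state != State::Ready {
            continue;
        }
        match exclusive {
            Some(leader) if p.group != leader => continue, // not foreground work
            _ => return Some(p.pid),
        }
    }
    None // nothing runnable: the caller idles instead of spinning on a dead group
}
```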
A temporary signing-test hook (force-resolve "Demo" every boot + log::error
spam) leaked into pkg::init() when the signing work was committed (0cf5aa8).
Restore the normal best-effort catalog refresh with quiet log::info.
…tored caps, not PID

The capability model (security::Capabilities) existed but was never stored or
enforced — privileged access was gated by hardcoded PID checks ("only pid 1"),
so "capability-secured" was aspirational.

Now each PCB carries a caps set:
- spawn_with_bootstrap grants HOST_MODE_SHELL the full set and launched apps
  (HOST_MODE_APP) none; spawn_thread and fork inherit their creator's caps.
- caps_of(pid) / current_has_caps(req) read the per-PCB set.
- The syscall dispatcher gates each privileged call via required_cap(): a
  caller missing the capability gets EPERM (the kernel idle task, pid 0, is
  exempt). Enforcement is deliberately conservative — only raw networking
  (SYS_NET_* + tcp/dhcp) is gated for now; render/WM/VFS-read/POSIX (everything
  an app uses) is unprivileged and never gated, so this is safe to switch on.
- The PID-0 Cortex admin API now requires CAP_CORTEX (was pid==1).
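The dispatcher gate described above can be sketched as a lookup plus a subset check. The Caps bit values, the Syscall enum and check() are hypothetical; only the rules come from the commit: app-facing calls are never gated, raw networking and the Cortex admin API need a capability, pid 0 is exempt, and denial returns EPERM.

```rust
/// Hypothetical capability bits; the real security::Capabilities layout is not shown.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct Caps(u32);

impl Caps {
    pub const NONE: Caps = Caps(0);
    pub const NET: Caps = Caps(1 << 0);
    pub const CORTEX: Caps = Caps(1 << 1);
    pub const ALL: Caps = Caps(Self::NET.0 | Self::CORTEX.0);

    pub fn contains(self, req: Caps) -> bool {
        (self.0 & req.0) == req.0
    }
}

const EPERM: isize = 1;

/// Hypothetical syscall identifiers for illustration.
#[derive(Clone, Copy)]
pub enum Syscall {
    NetSend,
    TcpConnect,
    DhcpRenew,
    WmEventWait,
    VfsRead,
    CortexAdmin,
}

/// Only privileged calls map to a capability; everything an app uses returns None.
fn required_cap(nr: Syscall) -> Option<Caps> {
    match nr {
        Syscall::NetSend | Syscall::TcpConnect | Syscall::DhcpRenew => Some(Caps::NET),
        Syscall::CortexAdmin => Some(Caps::CORTEX),
        Syscall::WmEventWait | Syscall::VfsRead => None,
    }
}

/// Returns Err(-EPERM) if `pid` lacks the capability `nr` requires.
pub fn check(pid: u32, caps: Caps, nr: Syscall) -> Result<(), isize> {
    if pid == 0 {
        return Ok(()); // the kernel idle task is exempt
    }
    match required_cap(nr) {
        Some(req) if !caps.contains(req) => Err(-EPERM),
        _ => Ok(()),
    }
}
```

Keeping the table default-open (None means ungated) is what makes the conservative rollout safe: a syscall only becomes privileged when someone adds it to required_cap.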

Verified: x86 + aarch64 compile; boot renders (present=372, crash=0) with zero
spurious denials; the shell (full caps) still does its signed-package
networking; serial confirms pid 1 spawns with the full capability set.
# Conflicts:
#	Cargo.lock
#	README.md
#	kernel/src/app_registry/mod.rs
#	kernel/src/arch/aarch64/apic.rs
#	kernel/src/arch/aarch64/boot.rs
#	kernel/src/arch/aarch64/cpu.rs
#	kernel/src/arch/aarch64/enter_user.rs
#	kernel/src/arch/aarch64/mmu.rs
#	kernel/src/arch/aarch64/mod.rs
#	kernel/src/arch/aarch64/syscall.rs
#	kernel/src/arch/aarch64/timer.rs
#	kernel/src/arch/aarch64/vectors.rs
#	kernel/src/arch/mod.rs
#	kernel/src/cortex/pid0.rs
#	kernel/src/embedder/abi.rs
#	kernel/src/main.rs
#	kernel/src/mm/mod.rs
#	kernel/src/pkg/http.rs
#	kernel/src/pkg/resolver.rs
#	kernel/src/process/dl.rs
#	kernel/src/process/mod.rs
#	kernel/src/syscall/dispatch.rs
#	kernel/src/syscall/handlers/engine.rs
#	kernel/src/syscall/handlers/fd.rs
#	kernel/src/syscall/handlers/futex.rs
#	kernel/src/syscall/mod.rs
#	kernel/src/syscall/poll.rs
#	kernel/src/syscall/posix.rs
#	kernel/src/wm/mod.rs
#	scripts/build-aarch64-shell.sh
#	scripts/build-iso.sh
#	scripts/run-aarch64.sh
#	tools/build-flutter-osx.sh
#	tools/flutter-embedder/src/main.rs
#	tools/flutter-embedder/src/sys.rs
#	tools/pkg-server/Cargo.toml
#	tools/pkg-server/src/main.rs
… dir) + gitignore

These regenerable build outputs / debug traces were tracked and triggered
GitHub's large-file warning. Untrack from HEAD and ignore going forward.
(History blobs remain — a separate coordinated rewrite if ever needed.)
dc37ba4

The merge conflict resolution duplicated try_claim_cpu_for; the dedup was applied
in the worktree (so build verification passed there) but was never staged before
the merge commit, so dc37ba4 as committed did not compile. Remove the duplicate.
Verified: the x86 and aarch64 kernels both build with 0 errors.
Replace the warm-up loading spinner (dark-navy bg + rotating teal dots) with
the OSCortex/DotCorr logo mark: white, centered, scaled to 1/5 of the screen's
shorter side — the same proportion as the Apple boot mark, resolution-
independent on any display. Black background (power-friendly; the mark is
already white).

- fb.rs: draw_boot_splash() now nearest-neighbour blits a 256² 8-bit alpha
  mask (kernel/assets/logo_mask.bin, rasterised from the white logo SVG and
  cropped tight to the glyph) as solid white where coverage >= ~43%.
- compositor: warm-up fill is now black instead of 0x0c1c26.
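The splash blit above can be sketched as a nearest-neighbour scale of the 256x256 mask into an XRGB buffer. The function name, the row-major mask layout, and the threshold of 110/255 (about 43.1%) are assumptions chosen to match "coverage >= ~43%"; this sketch also clears the buffer to black itself, whereas in the kernel the compositor handles the fill.

```rust
/// Side length of the embedded alpha mask (256x256, 8-bit coverage, row-major).
const MASK_DIM: usize = 256;
/// Assumed cutoff: 110 / 255 is roughly 43% coverage.
const COVERAGE_THRESHOLD: u8 = 110;

/// Draws the mask centred, scaled to 1/5 of the shorter screen side, as solid white.
pub fn draw_splash(fb: &mut [u32], width: usize, height: usize, mask: &[u8]) {
    assert_eq!(mask.len(), MASK_DIM * MASK_DIM);
    assert_eq!(fb.len(), width * height);
    fb.fill(0x0000_0000); // black background
    let size = width.min(height) / 5;
    if size == 0 {
        return;
    }
    let x0 = (width - size) / 2;
    let y0 = (height - size) / 2;
    for dy in 0..size {
        // Nearest-neighbour: map each destination row/column back to one mask texel.
        let sy = dy * MASK_DIM / size;
        for dx in 0..size {
            let sx = dx * MASK_DIM / size;
            if mask[sy * MASK_DIM + sx] >= COVERAGE_THRESHOLD {
                fb[(y0 + dy) * width + (x0 + dx)] = 0x00FF_FFFF; // solid white
            }
        }
    }
}
```

Thresholding instead of alpha-blending keeps the edge hard, which is fine on a pure black background and avoids any per-pixel blend math during warm-up.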

Shown by the compositor during the Flutter engine JIT warm-up, before the
shell presents its first frame. Verified rendering live under HVF (-kernel):
white mark centered on black, both arches build clean.
…M guidance

build-iso.sh: add SKIP_CORE_APPS guard. The core-app rebuild drives
build-flutter-osx.sh, whose AOT step needs the (currently purged) oscx-engine
Docker image. With SKIP_CORE_APPS=1 the build reuses the app assets already
staged in initramfs/Applications — the apps run JIT off kernel_blob.bin
(arch-independent), so their launcher tiles still render. This lets a
full-render x86 shell ISO build on a plain macOS host (no Docker).

release.yml: fix the release notes. UTM's bundled UEFI firmware does not boot
the Limine ISO, so the UTM-bootable artifact is the aarch64 .kernel via
Linux → Boot from kernel image, with Display = ramfb and CPU Cores = 1.
Documented the exact UTM steps; the ISO is for QEMU/other UEFI VMs + bare metal.
…edding

The x86_64 shell runs JIT off kernel_blob.bin. A leftover AOT snapshot in
system/flutter (notably an arm64 libapp.so staged by a prior aarch64 build)
makes the embedder pick AOT, and a cross-arch snapshot aborts the Dart VM
("snapshot requires arm64 but the VM has x64"). Strip libapp.so/app.aot before
the kernel embeds initramfs so the embedder always uses the arch-independent
JIT blob. Verified: x86 ISO boots, framebuffer + compositor + boot splash
render at 1280x800, engine initializes with no snapshot mismatch.
draw_software_cursor() was gated on the x86-only ps2::PS2_READY flag and read
position/buttons/activity from the ps2 driver — so on aarch64 (virtio-input,
no PS/2) the cursor never drew: hover/click worked but no pointer was visible.

Unify cursor state in wm (CURSOR_X/Y/BUTTONS/SEEN/LAST_ACT_NS), updated in
push_pointer, which BOTH the x86 PS/2 driver and the aarch64 virtio-input driver
already funnel through. draw_software_cursor now reads wm::cursor_* and uses the
arch-neutral rdtsc_ns() clock for the 3s idle auto-hide. Verified: the cursor
renders at the pointer position on aarch64 under HVF (was invisible before);
x86 path unchanged (ps2 still feeds push_pointer).
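A minimal sketch of the unified cursor state, reusing the static names from the commit. The push_pointer signature and the cursor_pos/cursor_visible helpers are hypothetical, and the timestamp is passed in for testability where the kernel would read rdtsc_ns() itself.

```rust
use core::sync::atomic::{AtomicBool, AtomicI32, AtomicU64, AtomicU8, Ordering};

/// Arch-neutral cursor state shared by every pointer driver.
pub static CURSOR_X: AtomicI32 = AtomicI32::new(0);
pub static CURSOR_Y: AtomicI32 = AtomicI32::new(0);
pub static CURSOR_BUTTONS: AtomicU8 = AtomicU8::new(0);
pub static CURSOR_SEEN: AtomicBool = AtomicBool::new(false);
pub static LAST_ACT_NS: AtomicU64 = AtomicU64::new(0);

/// Idle time after which the software cursor auto-hides (3 s).
const IDLE_HIDE_NS: u64 = 3_000_000_000;

/// Single funnel for pointer input: PS/2 (x86) and virtio-input (aarch64) both call this.
pub fn push_pointer(x: i32, y: i32, buttons: u8, now_ns: u64) {
    CURSOR_X.store(x, Ordering::Relaxed);
    CURSOR_Y.store(y, Ordering::Relaxed);
    CURSOR_BUTTONS.store(buttons, Ordering::Relaxed);
    LAST_ACT_NS.store(now_ns, Ordering::Relaxed);
    // Release pairs with the Acquire in cursor_visible for the first report.
    CURSOR_SEEN.store(true, Ordering::Release);
}

/// Individual fields may briefly mix two events; acceptable for a cursor sprite.
pub fn cursor_pos() -> (i32, i32) {
    (CURSOR_X.load(Ordering::Relaxed), CURSOR_Y.load(Ordering::Relaxed))
}

/// Draw only after some device has reported, and hide after 3 s without activity.
pub fn cursor_visible(now_ns: u64) -> bool {
    CURSOR_SEEN.load(Ordering::Acquire)
        && now_ns.saturating_sub(LAST_ACT_NS.load(Ordering::Relaxed)) < IDLE_HIDE_NS
}
```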
Make the vsync-baton + cooperative-scheduler path per-engine instead of
hardcoded to the shell (pid 1): embedder_baton_due(pid), set_vsync_baton(pid,
baton) delivers to + wakes the posting engine (sys_engine_vsync_baton_post uses
the caller pid), prefer_embedder_if_baton_due tries the foreground app then the
shell, and the apic/idt timer paths wake whichever engine's baton is due. This
is the correct design for running a second Flutter engine (a launched app)
alongside the shell.
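The per-engine baton routing described above can be modelled as a small table keyed by pid. The Batons table, the Slot layout, MAX_ENGINES and take() are hypothetical; the function names embedder_baton_due, set_vsync_baton and prefer_embedder_if_baton_due come from the commit, but their bodies here are only a sketch, and waking the engine thread is omitted.

```rust
/// Hypothetical upper bound on concurrently running Flutter engines.
const MAX_ENGINES: usize = 4;
/// The shell's pid, per the log.
const SHELL_PID: u32 = 1;

#[derive(Clone, Copy)]
struct Slot {
    pid: u32,   // 0 = free (pid 0 is the idle task, never an engine)
    baton: u64, // opaque vsync baton handed back to the engine
    due: bool,  // set when posted, cleared when delivered
}

/// Hypothetical per-engine baton table replacing a single shell-only slot.
pub struct Batons {
    slots: [Slot; MAX_ENGINES],
}

impl Batons {
    pub const fn new() -> Self {
        Self { slots: [Slot { pid: 0, baton: 0, due: false }; MAX_ENGINES] }
    }

    /// Records a baton for the posting engine `pid`. Returns false if the table is full.
    pub fn set_vsync_baton(&mut self, pid: u32, baton: u64) -> bool {
        let idx = self
            .slots
            .iter()
            .position(|s| s.pid == pid)
            .or_else(|| self.slots.iter().position(|s| s.pid == 0));
        match idx {
            Some(i) => {
                self.slots[i] = Slot { pid, baton, due: true };
                true
            }
            None => false,
        }
    }

    pub fn embedder_baton_due(&self, pid: u32) -> bool {
        self.slots.iter().any(|s| s.pid == pid && s.due)
    }

    /// Delivers the pending baton to `pid`'s engine and clears its due flag.
    pub fn take(&mut self, pid: u32) -> Option<u64> {
        let s = self.slots.iter_mut().find(|s| s.pid == pid && s.due)?;
        s.due = false;
        Some(s.baton)
    }

    /// Chooses which engine to run: the foreground app first, then the shell.
    pub fn prefer_embedder_if_baton_due(&self, fg_app: Option<u32>) -> Option<u32> {
        if let Some(app) = fg_app.filter(|&p| self.embedder_baton_due(p)) {
            return Some(app);
        }
        if self.embedder_baton_due(SHELL_PID) {
            Some(SHELL_PID)
        } else {
            None
        }
    }
}
```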

No regression: shell-only still renders (present advances) and apps still
launch. NOTE: this does NOT by itself resolve the post-launch UI freeze — that
is a separate single-core scheduling / embedder-loop issue (a backgrounded
engine busy-spins in schedule_frame and starves the foreground app); tracked
for a follow-up. Committed as the foundation that fix will build on.
…reen)

The framebuffer driver assumed 32bpp XRGB and ignored the firmware's channel
masks. On real UEFI GOP framebuffers (e.g. many Intel Macs report RGB /
red_shift=0) this rendered the desktop with red/blue swapped — a blue-cast
"wrong colors" screen — and on non-32bpp it bailed entirely (FB_READY never set
→ black, no UI; the boot logs you still see are the independent serial mirror).

Now fb::init reads red/green/blue_mask_shift from Limine and logs the full GOP
format (bpp/pitch/model/shifts) to serial for diagnosis. Pixels are repacked
into the firmware's channel order at every real-framebuffer write (set_pixel,
fill_rect, blit_rgba32, swap_buffers, glyph/clear), gated by an XRGB fast-path
so the working case (ARM ramfb, x86 std-vga) is byte-for-byte unchanged — no
regression (ARM present=143, crash=0). Non-32bpp is now logged loudly (still a
TODO to byte-pack). Verified: XRGB = identity, BGR = correct R/B swap.
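The repack can be sketched as a shift-based conversion from the compositor's internal 0x00RRGGBB pixels into the firmware's channel order, with the XRGB fast path returning the pixel untouched. This assumes 8-bit channel masks at 32 bpp; PixelFormat and pack are hypothetical names, though the shift fields mirror what Limine reports.

```rust
/// Channel positions reported by the firmware (8-bit masks assumed, 32 bpp).
#[derive(Clone, Copy)]
pub struct PixelFormat {
    pub red_shift: u8,
    pub green_shift: u8,
    pub blue_shift: u8,
}

impl PixelFormat {
    /// True when the firmware layout is already XRGB8888.
    pub fn is_xrgb(self) -> bool {
        self.red_shift == 16 && self.green_shift == 8 && self.blue_shift == 0
    }

    /// Repacks an internal 0x00RRGGBB pixel into the firmware's channel order.
    #[inline]
    pub fn pack(self, xrgb: u32) -> u32 {
        if self.is_xrgb() {
            return xrgb; // fast path: byte-for-byte identical output
        }
        let r = (xrgb >> 16) & 0xFF;
        let g = (xrgb >> 8) & 0xFF;
        let b = xrgb & 0xFF;
        (r << self.red_shift) | (g << self.green_shift) | (b << self.blue_shift)
    }
}
```

In a real driver is_xrgb() would be computed once at init and cached, so the hot blit loop only pays a predictable branch.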
…"no UI")

x86 "boot logs but no UI" had two stacked causes:
  1. fb.rs ignored the firmware framebuffer channel order (fixed in c710176).
  2. The build staged the AOT-only ('product') x86 engine but the matching
     'product' gen_snapshot is unavailable (purged), so the shell's Dart snapshot
     could never load ("snapshot requires release/product..." mismatch) → the
     engine ran with no Dart code → blank screen.

The x86 path runs the shell via the JIT engine off kernel_blob.bin (no snapshot
needed). build-iso.sh now strips any AOT snapshot to force the JIT path, and the
staged engine must be the JIT/debug engine (contains the Dart kernel compiler).
build-flutter-osx.sh prefers the gen_snapshot that matches the shipped engine
(for the AOT path, when that toolchain is restored).

Verified under OVMF/UEFI: the full shell desktop renders with correct colors
(present climbs to 284, nonblack=18557 — matches the ARM desktop). NOTE: the
JIT engine binary (gitignored, ~91MB) must be staged at
tools/flutter-engine/libflutter_engine.so; CI needs it provisioned (task #6).
Bring up USB HID input on aarch64 so the pointer works out-of-the-box in
UTM/QEMU, where USB is the default pointer device (virtio-input previously
needed manual VM config). Full xHCI bring-up from a -kernel boot with no
firmware to lean on:

- PCIe ECAM config access + on-demand 1 GiB device mapping
  (arch/aarch64/pci.rs, mmu::map_device_1gib).
- Self-assign PCIe BARs: a -kernel boot has no firmware to program them, so
  size them and bump-allocate out of the QEMU virt 32-bit MMIO window.
- Work around a QEMU+HVF vCPU hang: programming/enabling a BAR makes QEMU
  rebuild the guest memory map asynchronously, and a racing timer-ISR MMIO
  access wedges the core. Mask interrupts + settle across the BAR write and the
  memory-decode enable.
- Full enumeration: enable-slot -> address-device -> GET_DESCRIPTOR (detect
  mouse vs keyboard via bInterfaceProtocol) -> SET_CONFIGURATION -> configure
  interrupt-IN endpoint -> SET_PROTOCOL(boot)/SET_IDLE, with an EP0 control
  ring and link TRBs for ring wrap.
- Fix event dispatch: the TRB-type mask was 0x3FF, which spilled into the
  Transfer Event's Endpoint ID field, so EP0/HID transfer events (ep_id != 0)
  were misclassified and dropped. Mask the 6-bit type field. Also fix
  HCSPARAMS1/DBOFF/RTSOFF offsets (were read relative to CAPLENGTH instead of
  the capability base) and set EP0 max-packet-size by bus speed.
- Boot-mouse report -> absolute cursor (clamped to the framebuffer) ->
  wm::push_pointer + push_scroll; poll the runtime on the vsync tick, mirroring
  the x86 APIC ISR.
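The event-dispatch fix comes down to field widths in the event TRB's control dword. Per the xHCI specification, TRB Type is bits 15:10 (6 bits), a Transfer Event's Endpoint ID is bits 20:16 and Slot ID is bits 31:24; type codes 32, 33 and 34 are Transfer, Command Completion and Port Status Change events. The helpers and the Event enum below are a hypothetical sketch of the decode, with the old 0x3FF mask kept for comparison.

```rust
const TRB_TRANSFER_EVENT: u32 = 32;
const TRB_COMMAND_COMPLETION: u32 = 33;
const TRB_PORT_STATUS_CHANGE: u32 = 34;

/// TRB Type: bits 15:10 of the control dword (6 bits).
pub fn trb_type(control: u32) -> u32 {
    (control >> 10) & 0x3F
}

/// The old decode: 10 bits also pull in bits 19:16 (Endpoint ID on Transfer Events).
pub fn buggy_trb_type(control: u32) -> u32 {
    (control >> 10) & 0x3FF
}

#[derive(Debug, PartialEq)]
pub enum Event {
    Transfer { slot: u32, ep_id: u32 },
    CommandCompletion { slot: u32 },
    PortStatusChange,
    Other(u32),
}

pub fn classify(control: u32) -> Event {
    let slot = control >> 24; // Slot ID: bits 31:24
    match trb_type(control) {
        TRB_TRANSFER_EVENT => Event::Transfer { slot, ep_id: (control >> 16) & 0x1F },
        TRB_COMMAND_COMPLETION => Event::CommandCompletion { slot },
        TRB_PORT_STATUS_CHANGE => Event::PortStatusChange,
        t => Event::Other(t),
    }
}
```

With the 0x3FF mask, any Transfer Event whose Endpoint ID is nonzero decodes to a type above 63, matches none of the known codes and is dropped, which is exactly the symptom described.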

Verified end-to-end under TCG and HVF: probe -> enumerate (protocol=2 mouse) ->
injected relative movements track the cursor and buttons register.
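The boot-mouse handling can be sketched from the HID boot protocol: byte 0 carries the buttons (bits 0 to 2), and bytes 1 and 2 carry signed relative X and Y. Treating an optional fourth byte as a signed wheel delta is an assumption (many devices send it, but the boot protocol only guarantees three bytes). Cursor, PointerUpdate and apply_report are hypothetical names; the result would feed wm::push_pointer and push_scroll.

```rust
pub struct Cursor {
    pub x: i32,
    pub y: i32,
    pub buttons: u8,
}

pub struct PointerUpdate {
    pub x: i32,
    pub y: i32,
    pub buttons: u8,
    pub scroll: i32,
}

impl Cursor {
    /// Applies one boot-protocol report and clamps the result to the framebuffer.
    pub fn apply_report(&mut self, report: &[u8], fb_w: i32, fb_h: i32) -> Option<PointerUpdate> {
        if report.len() < 3 {
            return None; // too short to be a boot-protocol mouse report
        }
        let dx = report[1] as i8 as i32; // reinterpret as signed, then widen
        let dy = report[2] as i8 as i32; // HID +Y is down, same as screen space
        let wheel = report.get(3).map_or(0, |&w| w as i8 as i32);
        self.buttons = report[0] & 0x07; // left, right, middle
        self.x = (self.x + dx).clamp(0, fb_w - 1);
        self.y = (self.y + dy).clamp(0, fb_h - 1);
        Some(PointerUpdate { x: self.x, y: self.y, buttons: self.buttons, scroll: wheel })
    }
}
```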
Replace the kernel-log boot output with a product-grade boot screen (DotCorr
"Doto" look): a dot-matrix OSCORTEX wordmark, an "<elapsed>s cold boot" caption
with an accent dot, a blue progress bar that ramps as boot proceeds, and a
green "cortex::<stage> · <elapsed>s" status line. Drawn by the kernel from the
first frame on every boot path and animated by the compositor during the engine
warm-up; snaps to boot_completed when the shell presents.

- New drivers/bootscreen.rs. fb.rs gains a dot-matrix text renderer, public
  left-aligned text helpers, and enable_fb_logging().
- Framebuffer log mirroring is silenced by default (the serial console keeps
  everything). F2 toggles an on-screen verbose overlay backed by a small
  in-memory log ring in logger.rs; serial COM1 stays the authoritative log.
- The panic handler re-enables framebuffer logging so a failure is never hidden
  behind the boot screen.
- Wired into all boot paths: x86 Limine, aarch64 -kernel (ramfb), aarch64
  Limine.
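The in-memory log ring backing the F2 overlay can be as simple as a fixed array of fixed-width lines that overwrites the oldest entry. LogRing, its sizes and its methods are hypothetical; the log only says logger.rs keeps "a small in-memory log ring".

```rust
/// Hypothetical sizes: 64 lines of up to 96 bytes each (6 KiB, no heap needed).
const RING_LINES: usize = 64;
const LINE_BYTES: usize = 96;

pub struct LogRing {
    lines: [[u8; LINE_BYTES]; RING_LINES],
    lens: [usize; RING_LINES],
    next: usize,  // slot the next line is written to
    count: usize, // number of valid lines (saturates at RING_LINES)
}

impl LogRing {
    pub const fn new() -> Self {
        Self { lines: [[0; LINE_BYTES]; RING_LINES], lens: [0; RING_LINES], next: 0, count: 0 }
    }

    /// Appends a line, truncating at a UTF-8 boundary and overwriting the oldest when full.
    pub fn push(&mut self, line: &str) {
        let mut n = line.len().min(LINE_BYTES);
        while !line.is_char_boundary(n) {
            n -= 1; // never split a multi-byte character
        }
        self.lines[self.next][..n].copy_from_slice(&line.as_bytes()[..n]);
        self.lens[self.next] = n;
        self.next = (self.next + 1) % RING_LINES;
        self.count = (self.count + 1).min(RING_LINES);
    }

    /// Iterates stored lines oldest first, for drawing the overlay top to bottom.
    pub fn iter(&self) -> impl Iterator<Item = &str> + '_ {
        let start = (self.next + RING_LINES - self.count) % RING_LINES;
        (0..self.count).map(move |i| {
            let idx = (start + i) % RING_LINES;
            core::str::from_utf8(&self.lines[idx][..self.lens[idx]]).unwrap_or("")
        })
    }
}
```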

Verified on aarch64 (TCG): the boot screen renders as designed and the shell
still comes up cleanly afterwards (no regression).
The arm64 Limine ISO crashed/hung in UTM. Three independent fixes:

- usb: skip the xHCI probe on the aarch64 UEFI/Limine ISO build. Reading the
  firmware-assigned xHCI BAR config under QEMU+HVF trips a QEMU host-side
  assert(isv) (hvf.c) that kills the VM — a QEMU+HVF host bug on the edk2 boot
  path, absent on real hardware, under TCG, and on the bare -kernel boot (the
  supported UTM arm64 path, which self-assigns BARs and keeps USB HID). x86
  keeps USB too.

- pci: re-assign firmware-placed 64-bit BARs that land in the high PCIe MMIO
  window at 512 GiB (e.g. 0x80_0000_8000) into the identity-mapped low 32-bit
  window. Such a BAR is outside our 39-bit address space and unreachable —
  touching it faults on real hardware as well, not just QEMU.

- build: guard build-iso-aarch64.sh against cross-arch initramfs contamination.
  The kernel embeds the shared initramfs/ (incl. /init = the aarch64 shell host);
  running the x86 ISO build first overwrites it with x86 binaries, so the arm64
  ISO would package an x86 /init and hang ("Failed to spawn /init: wrong ELF
  machine"). The build now fails loudly instead of shipping a broken ISO.
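The guard itself lives in build-iso-aarch64.sh, but the check it needs is small enough to show in Rust: read e_machine (a 16-bit field at offset 18 of the ELF header, little-endian on both targets) and compare it with the target's value. EM_X86_64 = 62 and EM_AARCH64 = 183 are the standard ELF machine codes; InitCheck and check_init_machine are hypothetical names.

```rust
pub const EM_X86_64: u16 = 62;
pub const EM_AARCH64: u16 = 183;

#[derive(Debug, PartialEq)]
pub enum InitCheck {
    Matches,
    NotElf,
    WrongMachine(u16),
}

/// Verifies that an /init image is an ELF for `expected` before it is packaged.
pub fn check_init_machine(bytes: &[u8], expected: u16) -> InitCheck {
    // 16-byte e_ident + e_type (2 bytes) precede e_machine at offset 18.
    if bytes.len() < 20 || !bytes.starts_with(b"\x7fELF") {
        return InitCheck::NotElf;
    }
    if bytes[5] != 1 {
        return InitCheck::NotElf; // EI_DATA must be 1 (little-endian) for either target
    }
    let machine = u16::from_le_bytes([bytes[18], bytes[19]]);
    if machine == expected {
        InitCheck::Matches
    } else {
        InitCheck::WrongMachine(machine)
    }
}
```

Failing at build time turns the runtime "wrong ELF machine" hang into an immediate, explained error.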

Verified live in UTM (HVF): the arm64 ISO boots to the full shell. x86 ISO
boots (shell verified headless under TCG).
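Both BAR changes in this PR (self-assigning BARs on the -kernel boot, and moving a firmware BAR out of the 512 GiB window) rest on the standard sizing probe: write all ones, read back, clear the flag bits, and take the two's complement. A minimal sketch, assuming 39-bit identity mapping as the log describes; bar_size, BumpWindow and reachable are hypothetical names, and config-space access, the HVF interrupt masking and the memory-decode toggle are omitted.

```rust
/// Decodes a memory BAR's size from the value read back after writing all ones.
/// `lo` is the low dword read-back; `hi` is the upper dword for a 64-bit BAR.
pub fn bar_size(lo: u32, hi: Option<u32>) -> u64 {
    let base_bits = lo & !0xF; // bits 3:0 are type/prefetch flags, not address
    if base_bits == 0 && hi.map_or(true, |h| h == 0) {
        return 0; // unimplemented BAR
    }
    // A 32-bit BAR behaves as if its upper dword were all ones.
    let mask = ((hi.unwrap_or(0xFFFF_FFFF) as u64) << 32) | base_bits as u64;
    (!mask).wrapping_add(1)
}

/// Bump allocator over an MMIO window [next, end).
pub struct BumpWindow {
    next: u64,
    end: u64,
}

impl BumpWindow {
    pub fn new(base: u64, end: u64) -> Self {
        Self { next: base, end }
    }

    /// Returns a naturally aligned address for a BAR of `size` bytes, or None if it does not fit.
    pub fn alloc(&mut self, size: u64) -> Option<u64> {
        if !size.is_power_of_two() {
            return None; // BAR sizes are powers of two; this also rejects 0
        }
        let addr = self.next.checked_add(size - 1)? & !(size - 1);
        let end = addr.checked_add(size)?;
        if end > self.end {
            return None;
        }
        self.next = end;
        Some(addr)
    }
}

/// True if [addr, addr + size) fits inside a `va_bits`-bit identity-mapped space.
pub fn reachable(addr: u64, size: u64, va_bits: u32) -> bool {
    addr.checked_add(size).map_or(false, |end| end <= 1u64 << va_bits)
}
```

For instance, a BAR the firmware placed at 0x80_0000_8000 fails reachable(..., 39) because 2^39 bytes is exactly 512 GiB, so it would be re-assigned with BumpWindow::alloc from the low 32-bit window instead.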
…-port

# Conflicts:
#	.github/workflows/release.yml
#	kernel/src/arch/aarch64/apic.rs
#	kernel/src/arch/aarch64/boot_limine.rs
#	kernel/src/syscall/poll.rs
#	scripts/build-iso-aarch64.sh
squirelboy360 merged commit df27d33 into develop on Jun 14, 2026
3 checks passed