feat: native engine port - both arches boot+render, USB HID, boot screen, pkg pipeline #14
Merged
- PS/2 IntelliMouse 4-byte scroll wiring (kernel) -> EV_SCROLL -> embedder
  FlutterPointerEvent (signal_kind=kScroll); direction+speed configurable.
- Settings screen in the shell: natural-scroll toggle + speed slider, live via
  config:* messages over the oscortex/shell channel.
- Kernel boot spinner drawn on the framebuffer during JIT warm-up
  (drivers/fb.rs::draw_boot_splash, driven by compositor::tick); clears when Flutter
  presents its first frame. Plus a 'Launching <app>' overlay in the shell.
- Adaptive embedder frame pump: gentle during warm-up, 60fps after input, ~8fps idle
  (was a 1ms/1000-per-sec flood) -> req:frame ratio 115:1 -> ~7:1.
- Revert MAX_FRAMES to 1<<20 (RAM above 4GiB is not yet safely mappable; the cap is
  load-bearing) with an explanatory comment.
- docs/arch.txt: document the real JIT execution model + runtime status.
Build the Flutter engine from source as a first-class OSCortex AOT target instead of shimming a Linux engine. Phases, port surface, build infra, and the hack-removal checklist in docs/native-engine-port.md; direction recorded in docs/arch.txt.
Draw the destination ahead of the work: three clean layers (kernel native ABI / shared AOT engine / self-contained AOT app bundles with own PIDs), the shared-framework property, and the hack-removal it enables. No Linux costume.
…ace map Baseline libflutter_engine.so (377MB, x64, embedder API) builds from source in the container — toolchain proven. Mapped the exact port surface: ~1,200 lines (Dart VM os_oscortex/os_thread_oscortex ~1000, fml message_loop_oscortex+paths, reuse posix) + GN glue. The fml message loop is the critical file — its emulated equivalent is what livelocks rendering today, so the native port also fixes sync.
Capture the validated Phase 0 setup as one-command scripts so the repetitive, heavy
engine-build infra is reproducible, not tribal knowledge:
- setup-engine-build.sh: idempotent container + depot_tools + pinned gclient sync
  (encodes the name='.' fix and the flags that work).
- build-engine.sh: gn configure + ninja for baseline | oscortex targets.
- README.md: the contributor flow, the edit->build incremental loop, and the pitfalls
  already solved (gclient layout, emulation, disk, prebuilt-dart).
Engine checkout stays OUTSIDE the repo (22GB, never committed).
Add 'oscortex' as a --target-os. Since OSCortex has no sysroot/libc yet (runs
linux-ABI via emulation), it links against linux but sets a new is_oscortex GN flag to
select the OSCortex backend sources later; a true OSCortex toolchain is a later
sub-phase. gn configures cleanly (1714 targets; args.gn = target_os linux +
is_oscortex true).
- engine-port/patches/: the two tracked diffs (gn tool, BUILDCONFIG).
- engine-port/apply-port.sh: applies patches + (Phase 2) backend sources into a fresh
  checkout, idempotent.
Temp patch files cleaned from the workspace (no remnants).
This is a public-facing repo; debugging slop and dead-path artifacts don't belong.
Removed (all dead, none load-bearing for the working JIT build):
- scratch/ (192 files): the entire AOT-deser debugging tree, i.e.
  analyze_*/disasm_*/assemble_*/check_* scripts, .bin/.log artifacts, and vendored
  gen_snapshot/shader binaries. Referenced only by build-iso.sh's dead shell-AOT step.
- tools/flutter-engine/libflutter_engine.so.bak: 91MB orphaned 'pristine engine'
  backup, zero references.
- gen_help.txt, test_size, test_size.c, .DS_Store: stray one-offs.
- build-iso.sh: excised the dead [0.3/5] shell-AOT block (gen_snapshot -> libapp.so ->
  patch_libapp), since the shell runs JIT off kernel_blob.bin, not an AOT libapp.so;
  also dropped libapp.so from REQUIRED_FILES. Native AOT is done properly via the
  engine port (docs/native-engine-port.md).
- .gitignore: ignore /scratch/ and /qemu-pipe.* so they can't return.
Deliberately KEPT (still load-bearing for the current JIT build; removed in the port's
Phase 4 once the native engine works): engine_patch.py, patch_libapp.py (still used by
tools/build-flutter-osx.sh for the apps), the Linux-emulation shim.
Verified: build-iso.sh and build-kernel-iso-fast.sh both pass bash -n; the working
fast build references neither scratch/ nor libapp.so.
Add the OSCortex fml platform backend and wire is_oscortex selection. Verified:
fml.message_loop_oscortex.o + fml.paths_oscortex.o compile into libfml, the linux
message loop is excluded, and message_loop_impl.cc selects MessageLoopOscortex.
- engine-port/src/flutter/fml/platform/oscortex/: message_loop_oscortex.{cc,h}
(epoll+timerfd loop — the sync-bug-fixer, starts as a clean clone to diverge to
native primitives) + paths_oscortex.cc.
- patches/0003: build_config.h defines FML_OS_OSCORTEX (additive, gated on the
FLUTTER_OSCORTEX define), message_loop_impl.cc selects MessageLoopOscortex first,
fml/BUILD.gn swaps in the oscortex sources (excludes linux) + sets the define.
- apply-port.sh applies 0003 + copies src/.
Note: the backend uses the Linux-ABI calls (epoll/timerfd) that OSCortex natively
implements — OSCortex is a native kernel + own libc that is Linux-ABI-compatible
(like Fuchsia/Starnix), not Linux. These calls are the divergence seam if we ever
move to a fully custom ABI.
…t VM clone - Verified a release/AOT oscortex build configures and gen_snapshot is buildable from our tree -> engine+snapshot version-matched, dissolving the old AOT dead-end (the '1247 base objects' mismatch). AOT is now a build step, not a research problem. - Deferred the Dart VM os_oscortex clone: on path A os_linux.cc already works; cloning it byte-identically is busywork. Add a Dart VM backend only when a primitive must actually diverge.
libflutter_engine.so (377MB) links for the oscortex target and bakes in OUR backend: 17 MessageLoopOscortex refs, 0 MessageLoopLinux. First Flutter engine built from source FOR OSCortex. Refined the gn target (patch 0001): oscortex now configures as a HOST linux-x64 build + is_oscortex flag (identical to the proven baseline, our platform backend selected), instead of an explicit target_os that tripped ANGLE/wayland scope and embedder-constructor mismatches. Runtime still renders software via kSoftware.
…ld) flow
Make the from-source build a one-time maintainer task, not something every dev
repeats, which is exactly how Flutter distributes its own engine.
- artifact.config: pins ARTIFACT_VERSION + Flutter rev + the R2 base URL.
- fetch-engine.sh: CONSUMER flow. Downloads the pinned prebuilt engine
  (libflutter_engine.so + gen_snapshot + icudtl.dat) from Cloudflare R2, verifies
  sha256, stages it where the OS build expects. Seconds, no checkout.
- publish-engine.sh: MAINTAINER flow. Packages the built artifacts and uploads to R2
  (rclone or aws/S3 to the R2 endpoint). Run after build-engine.sh when the
  port/version changes; bump ARTIFACT_VERSION.
- README: two-tier model (consumers fetch, maintainers build+publish), R2 layout,
  multi-arch = one tarball per ISA.
Scaffolding is ready now; first publish happens once the release/AOT engine lands
(Phase 3). Set ARTIFACT_BASE_URL to your R2 URL to activate fetch.
- setup-r2.sh: create the R2 bucket + enable public r2.dev URL + auto-write
  ARTIFACT_BASE_URL into artifact.config. Idempotent. Needs Cloudflare auth first
  (npx wrangler login, or CLOUDFLARE_API_TOKEN).
- publish-engine.sh: upload via wrangler (npx, no install) as the primary R2 backend,
  rclone/aws as fallbacks.
- README: one-time R2 host setup steps.
Bucket creation requires the account owner's Cloudflare auth, so this is the single
manual step; everything else is automated.
R2 needed dashboard activation; switched the artifact host to Google Cloud Storage
(gs://dotcorr-oscortex-engine, public read, billing active). Verified end-to-end:
gcloud upload -> anonymous https://storage.googleapis.com/... fetch.
- artifact.config: ARTIFACT_BASE_URL -> GCS public URL + GCS_BUCKET.
- publish-engine.sh: gcloud storage cp as the primary backend (wrangler/rclone/aws
  remain as fallbacks). fetch-engine.sh is unchanged (curl, host-agnostic).
The repo is public, so GitHub Release assets download anonymously: free, no egress
fees (vs GCS ~$0.12/GB), no separate cloud account, and gh is already authed.
Verified end-to-end: gh release upload -> anonymous
https://github.com/DotCorr/oscortex/releases/download/... fetch.
- artifact.config: ARTIFACT_BASE_URL -> GitHub releases download URL; GITHUB_REPO.
- publish-engine.sh: gh release create/upload as the PRIMARY backend (GCS/R2/S3 remain
  as documented fallbacks). fetch-engine.sh unchanged (host-agnostic curl).
- README: GitHub Releases is the host; no bucket setup needed.
- Removed the unused GCS bucket (no remnants).
…erified The release oscortex engine (33MB, vs 377MB debug — no JIT) and a version-matched gen_snapshot (6.5MB) built from one tree. Published the first artifact to GitHub Releases (oscortex-engine-1) and verified the full round trip: publish-engine.sh -> gh release -> fetch-engine.sh downloads+checksums+stages. Engine + gen_snapshot from the same tree are version-matched, which dissolves the old AOT dead-end (the '1247 base objects' mismatch). Fix: publish-engine.sh derives the workspace from the container's /work mount (robust to the workspace dir name) instead of assuming a fixed name.
shell app -> frontend_server -> AOT dill (24MB) -> our version-matched
gen_snapshot -> libapp.so (4.4MB native ELF). Verified a real AOT snapshot: all 4
_kDart*Snapshot{Data,Instructions} present with REAL native instructions (T/text,
not zeroed stubs), ELF64 x86-64. The multi-session 'Snapshot expects N base
objects, provided 0' blocker is dissolved because engine + gen_snapshot are built
from one tree (version-matched). compile-app-aot.sh makes it reproducible.
The native, from-source, AOT-compiled Flutter engine RENDERS the shell on OSCortex: FlutterEngineRunsAOTCompiledDartCode -> libapp.so AOT path -> present_callback 393 frames, ZERO JIT warmup (no kernel_blob, no codegen). The UI comes up immediately, no 60-90s compile. This is the whole point of the port. Fix to get here: the release/AOT engine rejects the JIT-era GC/heap dart-flags (switches.cc:478 disallowed) — pass only the 5 AOT-safe engine args (argc=5) when is_aot, dropping the old --old_gen_heap_size etc. Known follow-up (separate, a regression from this session's IntelliMouse 4-byte scroll wiring): mouse clicks/Y are mangled + pointer events not reaching the engine; to fix next.
The 4-byte scroll-packet mode added this session corrupted pointer data: dy mis-parsed (cursor pinned to the top, y=0) and the flags byte mis-parsed (buttons stuck at 0), so clicks never registered and pointer events stopped reaching the engine. Revert to the proven 3-byte packet mode (working clicks > broken wheel). Verified: cursor Y tracks normally again (y=32 start, not pinned at 0). Scroll to be re-added later with correct 4-byte parsing + resync that doesn't regress clicks.
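For reference, a minimal sketch of decoding the standard 3-byte PS/2 packet that this commit reverts to. The struct and function names are hypothetical and not taken from the kernel source; the bit layout is the conventional PS/2 mouse format (buttons in bits 0-2, an always-set bit 3, sign bits in bits 4-5, overflow bits in 6-7).

```rust
/// Decoded standard 3-byte PS/2 mouse packet (hypothetical type, for illustration).
#[derive(Debug, PartialEq, Eq)]
pub struct MousePacket {
    pub left: bool,
    pub right: bool,
    pub middle: bool,
    pub dx: i16,
    /// PS/2 reports positive dy as upward motion; screen Y usually grows downward.
    pub dy: i16,
}

/// Combine a data byte with its sign bit (from the flags byte) into a
/// 9-bit two's-complement delta.
fn sign_extend9(byte: u8, negative: bool) -> i16 {
    let value = byte as i16;
    if negative { value - 256 } else { value }
}

/// Decode one packet. Returns None when the packet looks out of sync
/// (bit 3 of the flags byte is always 1) or an overflow bit is set.
pub fn decode_ps2_packet(p: [u8; 3]) -> Option<MousePacket> {
    let flags = p[0];
    if flags & 0x08 == 0 {
        return None; // misaligned stream: this byte cannot be a flags byte
    }
    if flags & 0xC0 != 0 {
        return None; // X or Y overflow: the deltas are not trustworthy
    }
    Some(MousePacket {
        left: flags & 0x01 != 0,
        right: flags & 0x02 != 0,
        middle: flags & 0x04 != 0,
        dx: sign_extend9(p[1], flags & 0x10 != 0),
        dy: sign_extend9(p[2], flags & 0x20 != 0),
    })
}
```

Rejecting packets whose bit 3 is clear is one cheap resync check; a 4-byte mode would need the same guard applied at the right byte boundary so a misparse cannot pin Y or zero the buttons.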
Input is confirmed working under AOT (122 pointer events reached Flutter, hover + clicks register, button presses detected, dropped_total=0 — the earlier '0' was the broken IntelliMouse 4-byte mode + minimal interaction, fixed by the 3-byte revert). Remove the temporary DIAG logging from the embedder + ipc_display. engine-port/compile-app-aot.sh: per-app AOT compile (frontend_server -> dill -> version-matched gen_snapshot -> libapp.so), no patch — used to give each app its own AOT bundle so the AOT engine can run them.
Previously every launched app reused the shell's /system/flutter/libapp.so (aot_va=0 JIT-era skip), so tapping a tile ran the shell snapshot in the app host and stalled. Resolve the per-app snapshot by registry lookup: build_app_libapp_path(app_id) -> /Applications/<name>.app/libapp.so, dlopen it, and point the AOT snapshot loader at that path. Shell keeps its own path. Each app is AOT-compiled to its own libapp.so and staged under its .app bundle, so a launched host loads its own native snapshot, not the shell's.
pthread_cond_signal/broadcast with no waiter parked is a no-op by contract: the predicate is mutex-protected, so a thread that hasn't yet entered cond_wait observes it under the lock and never waits. The kernel was instead recording such signals in COND_PENDING_SIGNALS and letting the next cond_wait consume one as an immediate (spurious) return. A hot mutator (pid 2) that signals its own monitor with woke=0 then re-waits would consume its own fake pending, return 0, find its predicate still false, and re-wait forever — the cond-pending-consume livelock that froze render at a fixed frame count. Remove the mechanism end to end (consume in cond_wait, posts in cond_signal/cond_broadcast, the state map, the import). cond_wait now relies solely on the seq protocol, which delivers every real signal race-free under cooperative single-core scheduling (the value-check and park cannot interleave with a signaler). No remnants.
sys::dlopen forwards path.len() to the kernel as the path length, so a NUL-terminated byte string makes the kernel read the trailing \0 as part of the filename and the open fails. The per-app and shell AOT paths were passing "...libapp.so\0"; strip the NUL at the dlopen call. The AOT snapshot itself loads via aot_snapshot_load (which maps it executable), so this only silenced a spurious failure path, but it is a real bug and removes the warning.
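A minimal sketch of the fix at the call site, assuming the path is handled as a byte slice. The helper name is hypothetical; the real change may simply slice the literal inline.

```rust
/// Drop a single trailing NUL so the length passed to the kernel covers
/// only the filename bytes. Paths without a trailing NUL are returned as-is.
pub fn dlopen_path_bytes(path: &[u8]) -> &[u8] {
    path.strip_suffix(b"\0").unwrap_or(path)
}
```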
The Debug-level hot per-syscall traces (epoll_ctl, mprotect, cond-signal, every keypress) each block on a synchronous COM1 UART write AND re-render the framebuffer text console, with interrupts disabled. Measured: ~14000 log lines during a single boot+app-launch. Dropping to Error cut serial volume ~12x (14000 -> 1185 lines), removing tens of seconds of emulated-bare-metal warmup. Raise the one line in logger::init back to Debug for deep tracing.
Root cause of the sporadic render crashes. The user GPR snapshot lives in a PER-CPU
scratch (gs:[..]) written by the syscall entry stub and shared by every thread on the
core. save_full_user_gprs read it LAZILY at yield time, but the wait loops do sti;hlt
and the timer ISR can switch threads mid-handler, so another thread's syscall entry
overwrites the per-CPU snapshot before our save runs. We then stored the OTHER
thread's callee-saved regs (rbx/rbp/r12-r15) into our context; on resume rbx was
garbage. Pinpointed via addr2line/objdump: fml::MessageLoopOscortex::Run() resuming
from epoll_wait wrote running_ through this=0x1400 (=1280*4, a Skia row stride leaked
from another thread) -> SIGSEGV pid=3. This is the whole
0/16/140/489-frames-across-identical-boots sporadicity.
Fix: capture_user_gprs_at_entry() snapshots the GPRs into the thread's own PTABLE slot
at dispatch_fast entry, while the per-CPU snapshot is still fresh for this thread
(before any handler/yield/interrupt window). A per-CPU 'captured' flag makes later
yield-time save_full_user_gprs calls no-ops, so a clobbered shared snapshot can't leak
in.
Validation: 4/4 headless boots render cleanly (present 98-105) with ZERO engine
SIGSEGV, vs the prior sporadic crashes. Render is now reliable.
The frame pump only called FlutterEngineScheduleFrame in the no-event branch, so a hover/click/scroll waited for the event queue to drain plus a wm_event_wait timeout (up to ~16ms) before the repaint was even requested. Request the frame in the same iteration input is received; the engine coalesces redundant requests and Flutter flushes the batched pointer packet at BeginFrame, so the scheduled frame reflects the event. Removes the scheduling-side input latency (most visible on real hardware; under cross-arch QEMU TCG the emulation dominates).
Each submitted frame did up to 5 full 1M-pixel passes: a strided re-pack into a freshly allocated 4MB Vec, a byte-indexed B<->R swap, a full-screen fill_rect clear, blit_rgba32, and swap_buffers. Measured ~65ms median (wildly variable 10-92ms) -> a ~15fps ceiling from the blit alone, which read as a low refresh rate / laggy feedback. - Fuse the strided re-pack into the swap: submit_bgra_impl reads the source stride directly and packs in ONE u32-wise pass (read BGRA as u32, swap bytes 0<->2), eliminating the intermediate buffer + the 4MB/frame allocation. - Skip the full-screen fill_rect clear when a presented surface already covers the screen (the common case: one full-screen Flutter surface). Measured after: blit median 5.4ms, steady 5.0-7.6ms (~12x faster, variance gone). Render verified correct + crash-free; colors unchanged.
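The fused pass can be sketched as below. This is an illustration of the technique, not the actual submit_bgra_impl; the function name, the argument layout, and a tightly packed u32 destination are assumptions.

```rust
/// Copy a strided BGRA source into a tightly packed destination while
/// swapping bytes 0 and 2 of every pixel, in one pass and with no
/// intermediate buffer. `src_stride` is in bytes and may exceed `width * 4`.
pub fn pack_swap_rb(src: &[u8], src_stride: usize, width: usize, height: usize, dst: &mut [u32]) {
    assert!(src_stride >= width * 4);
    assert!(dst.len() >= width * height);
    for y in 0..height {
        let start = y * src_stride;
        let row = &src[start..start + width * 4];
        let out = &mut dst[y * width..(y + 1) * width];
        for (px, o) in row.chunks_exact(4).zip(out.iter_mut()) {
            // Read the pixel as one little-endian u32: B | G<<8 | R<<16 | A<<24.
            let v = u32::from_le_bytes([px[0], px[1], px[2], px[3]]);
            // Keep G and A in place, exchange the B and R bytes.
            *o = (v & 0xFF00_FF00) | ((v & 0x0000_00FF) << 16) | ((v >> 16) & 0x0000_00FF);
        }
    }
}
```

Writing straight into the destination removes both the per-frame allocation and one full read/write pass over the frame.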
next_runnable_pid_locked computed the foreground-exclusive group AFTER the input-target and embedder-baton shortcuts, so a due shell (pid 1) baton could schedule the shell engine even while an app is foreground — two heavy Flutter VMs on one cooperative core, the documented cause of the launched app's crash. Compute fg/exclusive FIRST and gate both shortcuts: suppress the pid-1 baton when an app is exclusive, and only honour the input shortcut for a target in the foreground group. No behavioural change when the shell is foreground (the common case): exclusive=false short-circuits both gates. Closes the baton concurrency hole; full app-launch validation still pending an interactive tile-tap (HMP mouse_button injection does not produce reliable clicks headless).
Replace the generic reply-null catch-all in platform_message_callback with a real
channel dispatcher (match on channel name). Every channel the framework reaches for is
now routed to a concrete handler that does the platform work or returns a
codec-correct typed ack; the final catch-all logs the channel name
([embedder/chan] unbound: <name>) so nothing stays an invisible stub.
Bound channels:
- flutter/textinput (JSONMethodCodec): setClient/setEditingState/show/hide/clearClient.
  Editing state (text + selection) is maintained in the embedder. PS/2 set-1 scancodes
  are now mapped to characters (shift/caps, backspace, enter, tab, space, arrows,
  home/end, delete); on a key press with an active text client the stored editing
  state is mutated and TextInputClient.updateEditingState is pushed back over
  flutter/textinput. Adds a small inline JSON reader/writer (no_std, no crates) for
  the editing-state maps.
- flutter/mousecursor: parse activateSystemCursor kind and ack (no kernel
  set-cursor-shape syscall exists yet; logged + acked).
- flutter/platform: Clipboard.setData/getData/hasStrings backed by an in-embedder
  buffer; SystemNavigator.pop; SystemSound/HapticFeedback/SystemChrome acked.
- flutter/navigation, system, accessibility, spellcheck, processtext, menu,
  contextmenu, scribe, restoration, keyevent, platform_views, isolate, lifecycle:
  explicit JSON typed-null ack.
Replace the ack-only stubs from the platform-channel contract with actual OS-provided
capabilities. OSCortex is the platform under the stock engine, so where Flutter needs
a platform service the kernel now implements and binds it.
Mouse cursor shape (flutter/mousecursor.activateSystemCursor):
- compositor: ACTIVE_CURSOR_SHAPE atomic + vector cursor sprites (arrow, I-beam,
  hand/link, forbidden, grab, horizontal/vertical resize, hidden).
  draw_software_cursor dispatches on the active shape; set_cursor_shape() repaints
  immediately.
- new syscall SYS_CURSOR_SHAPE_SET (0x4B2). Embedder maps the Flutter cursor kind
  string to a CURSOR_SHAPE_* and calls it, so hovering a link shows a hand, a text
  field shows an I-beam, etc.
Semantics / accessibility:
- embedder wires update_semantics_callback2 (FlutterProjectArgs off 280) and calls
  FlutterEngineUpdateSemanticsEnabled(engine, true) after run. The callback receives
  the FlutterSemanticsUpdate2 tree and stores each node (id, label, rect, flags,
  actions) in a live embedder structure for a11y / automation consumers.
  flutter/accessibility now replies with the correct StandardMessageCodec null (0x00),
  not JSON.
System clipboard (flutter/platform Clipboard.*):
- kernel-global clipboard buffer (embedder::clipboard) shared across every app/host,
  with SYS_CLIPBOARD_SET (0x4B3) / SYS_CLIPBOARD_GET (0x4B4). The embedder routes
  setData/getData/hasStrings to the kernel, so clipboard survives across apps.
SystemNavigator.pop (flutter/platform):
- SYS_APP_CLOSE_FOREGROUND (0x4B5): refocuses the shell (pid 1) and wakes it. The app
  embedder, when it is a launched host, flushes its reply, calls the syscall and exits
  so focus returns to the shell.
SystemSound.play (flutter/platform):
- PC-speaker beep driver (drivers::beep) via PIT channel 2 + port 0x61, exposed as
  SYS_BEEP (0x4B6). click vs alert play distinct short tones.
Deliberate no-ops (no such hardware), acked with the correct codec:
- HapticFeedback.* (no vibration motor), SystemChrome.* (single full-screen compositor
  surface, no system UI overlays / orientation).
Brings in the embedder channel dispatcher, text input (flutter/textinput), and real OS-backed capabilities: cursor-shape sprites + SYS_CURSOR_SHAPE_SET, live semantics, kernel-global clipboard, SystemNavigator.pop return-to-shell, and a PC-speaker beep driver. Builds green (embedder + kernel).
…% resume crash eret_to_el0/eret_to_el0_fp reset SP_EL1 then read the img/fp resume arrays that still live on the abandoned cooperative-yield frame. A wait loop that yields with IRQs unmasked leaves them unmasked here; a timer IRQ pushes a TrapFrame over img/fp → corrupted resume image (intermittent data abort in eret_to_el0_fp at the SPSR load with x18=0). Fix: msr daifset, #0xf before the SP reset; the eret restores SPSR_EL0T so EL0 runs with interrupts enabled. 12/12 -kernel boots clean (was ~2/8).
…UEFI ISO in CI Kernel: finish_cond_timedout_return advanced the resume PC by a hardcoded 2 bytes — correct for x86's 2-byte `syscall`, but on aarch64 `svc #0` is 4 bytes, so the parked cond_timedwait waiter resumed at svc+2 (a misaligned PC) → EC=0x22 PC-alignment fault, deterministically ~280 frames into a run (every cond_timedwait timeout was a live grenade). Use the cfg'd SYSCALL_INSN_LEN (4 on aarch64) and set aarch64_ret_in_x0 so ETIMEDOUT actually reaches x0. CI: release.yml now builds the aarch64 UEFI/Limine ISO via build-iso-aarch64.sh (LIMINE_DIR=$HOME/limine) and publishes oscortex-aarch64-<tag>.iso + .sha256 alongside the raw -kernel ELF — a real UTM/VM/bare-metal-bootable ARM image. build-iso-aarch64.sh: LIMINE_DIR is now overridable for CI.
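The arithmetic behind the alignment fault can be sketched as follows, assuming the saved PC points at the syscall instruction itself. SYSCALL_INSN_LEN is named in the commit; the two helper functions are hypothetical.

```rust
/// Length of the instruction that enters the kernel on the build target.
#[cfg(target_arch = "aarch64")]
pub const SYSCALL_INSN_LEN: u64 = 4; // `svc #0`: every A64 instruction is 4 bytes
#[cfg(not(target_arch = "aarch64"))]
pub const SYSCALL_INSN_LEN: u64 = 2; // x86-64 `syscall` encodes as 0F 05

/// PC at which a parked waiter resumes after the syscall instruction.
pub fn resume_pc_after_syscall(syscall_pc: u64, insn_len: u64) -> u64 {
    syscall_pc + insn_len
}

/// A64 requires the PC to be 4-byte aligned; a misaligned PC raises a
/// PC-alignment exception (EC=0x22).
pub fn is_valid_a64_pc(pc: u64) -> bool {
    pc % 4 == 0
}
```

With a hardcoded 2, an `svc` at 0x1000 resumes at 0x1002, which fails the alignment check; with 4 it resumes at 0x1004.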
…e (HVF/bare-metal) SP-alignment fault The x86 SysV convention seeds entry SP at stack_top-8 (RSP%16==8, return addr on the stack). AArch64 is the opposite: SP must be 16-byte aligned at ALL times (SCTLR_EL1.SA, hardware-enforced) and the return address lives in x30/LR (already seeded via p.user_lr=thread_return_trampoline_va). TCG doesn't enforce SP alignment so it silently worked; real silicon (HVF / UTM / Raspberry Pi) faults EC=0x26 the instant a spawned engine thread touches its stack — which is why it rendered headless but never on real hardware. cfg-split spawn_thread and spawn_with_bootstrap to align down to 16 on aarch64; x86 keeps the -8. Verified under HVF (cpu=host, real Apple Silicon): engine now spawns all worker threads with no SP fault and reaches the render event loop.
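A sketch of the per-architecture entry SP rule described above. The enum and function are hypothetical (the kernel uses cfg splits in spawn_thread and spawn_with_bootstrap); aligning stack_top down before applying the x86 offset is an added assumption so that the result is well defined for any input.

```rust
#[derive(Clone, Copy)]
pub enum Arch {
    X86_64,
    Aarch64,
}

/// Initial user stack pointer for a freshly spawned thread.
pub fn initial_user_sp(stack_top: u64, arch: Arch) -> u64 {
    let aligned = stack_top & !0xF; // round down to a 16-byte boundary
    match arch {
        // SysV x86-64: at function entry RSP % 16 == 8, as if a return
        // address had just been pushed.
        Arch::X86_64 => aligned - 8,
        // AArch64: SP must stay 16-byte aligned; the return address lives in x30.
        Arch::Aarch64 => aligned,
    }
}
```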
…PI27) — renders on real hardware The kernel ticked the EL1 PHYSICAL timer (CNTP_CTL_EL0, PPI 30). On QEMU -M virt TCG that interrupt is delivered, so it worked headless. But OSCortex runs at EL1 and under a hypervisor — Apple's HVF (what UTM uses), KVM, Xen — EL2 owns the physical timer: EL1 CNTP accesses are trapped and PPI 30 is NOT delivered to the guest. So under HVF the tick never fired → no vsync baton → the engine reached its event loop, scheduled frames, and stalled with present=0 (nothing ever rendered on real silicon, only in TCG emulation). Switch to the architected EL1-guest VIRTUAL timer (CNTV_CTL_EL0 / CNTV_TVAL_EL0 / CNTVCT_EL0, PPI 27). It is delivered under HVF, on bare metal (CNTVOFF=0 → virtual time == physical time), AND on TCG virt — strictly more portable. vsync_due already measured cadence against CNTVCT, so this is now consistent. Verified under HVF (cpu=host, real Apple Silicon): present_callback reaches 2063 frames, crash=0 — the full Flutter shell renders on real hardware.
…ld + unblocks the package pipeline The on-demand package syscalls (SYS_PKG_RESOLVE/CATALOG/SET_SERVER/EVICT) were assigned 0x390-0x393 in embedder/abi.rs — the SAME numbers as the Phase 53-55 sched_yield/get_cpu_time/fork/kill_signal syscalls. In dispatch.rs the pkg match arms precede the literal 0x390-0x393 arms, so 0x390 dispatched to pkg_resolve and sched_yield was DEAD CODE (4 unreachable-pattern warnings). Every embedder yield silently called pkg_resolve(garbage); rendering survived only because the yield's kernel round-trip happens regardless of handler. Move the pkg syscalls to a free block (0x4C0-0x4C3). Now 0x390 cleanly dispatches to sched_yield and the pkg numbers are distinct + reachable — the prerequisite for wiring the (already-built: HTTP+SHA256+LRU) package pipeline through to the shell. x86 kernel compiles clean, 0 unreachable-pattern warnings (was 4).
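One way to keep such a collision from recurring is a compile-time uniqueness check over the syscall numbers, sketched below. Only 0x390 = sched_yield and the 0x4C0-0x4C3 block are stated in the commit; the constant names, the order inside each block, and the check itself are assumptions, not the kernel's code.

```rust
// Assumed mapping: numbers follow the order in which the commit lists the calls.
pub const SYS_SCHED_YIELD: u64 = 0x390;
pub const SYS_GET_CPU_TIME: u64 = 0x391;
pub const SYS_FORK: u64 = 0x392;
pub const SYS_KILL_SIGNAL: u64 = 0x393;
pub const SYS_PKG_RESOLVE: u64 = 0x4C0;
pub const SYS_PKG_CATALOG: u64 = 0x4C1;
pub const SYS_PKG_SET_SERVER: u64 = 0x4C2;
pub const SYS_PKG_EVICT: u64 = 0x4C3;

/// True when no two entries are equal. `const fn` so it can run at compile time.
pub const fn all_distinct(nums: &[u64]) -> bool {
    let mut i = 0;
    while i < nums.len() {
        let mut j = i + 1;
        while j < nums.len() {
            if nums[i] == nums[j] {
                return false;
            }
            j += 1;
        }
        i += 1;
    }
    true
}

// Fails the build if two syscalls ever share a number again.
const _: () = assert!(all_distinct(&[
    SYS_SCHED_YIELD, SYS_GET_CPU_TIME, SYS_FORK, SYS_KILL_SIGNAL,
    SYS_PKG_RESOLVE, SYS_PKG_CATALOG, SYS_PKG_SET_SERVER, SYS_PKG_EVICT,
]));

/// Toy dispatcher: with distinct numbers, arm order no longer matters.
pub fn dispatch_name(nr: u64) -> &'static str {
    match nr {
        SYS_PKG_RESOLVE => "pkg_resolve",
        SYS_PKG_CATALOG => "pkg_catalog",
        SYS_PKG_SET_SERVER => "pkg_set_server",
        SYS_PKG_EVICT => "pkg_evict",
        SYS_SCHED_YIELD => "sched_yield",
        SYS_GET_CPU_TIME => "get_cpu_time",
        SYS_FORK => "fork",
        SYS_KILL_SIGNAL => "kill_signal",
        _ => "unknown",
    }
}
```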
The kernel pipeline (HTTP/1.1 over smoltcp → SHA-256 verify → LRU cache → install)
and the Dart shell calls (pkg_catalog / pkg_resolve:$name / pkg_set_server:$ip:$port
in shell_service.dart) both existed, but the embedder never routed them: the messages
fell through unhandled, so the whole on-demand-package feature was dead at the seam.
- sys.rs: SYS_PKG_RESOLVE/CATALOG/SET_SERVER consts (0x4C0-0x4C2, the post-collision
  numbers from b4dcdb5) + syscall wrappers.
- main.rs: three oscortex/shell handlers next to install:/uninstall:
  - pkg_catalog → {"packages":[{name,version,size}…]} (parses the kernel's 128-byte
    PkgManifest records: name[0..64], version[64..80], size_bytes u32 @112)
  - pkg_resolve:name → {"app_id":N} (kernel runs fetch→verify→cache→install)
  - pkg_set_server:ip:port → {"ok":…} (dotted-IP parser → packed BE u32)
Verified: x86 ISO renders (present=206, crash=0), so no regression on the
render-critical host binary.
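The record layout and the IP packing can be sketched as below. The field offsets come from the commit message; everything else is an assumption: the type and function names are hypothetical, the text fields are treated as NUL-padded, size_bytes is read as little-endian, and the sketch uses std String where the real embedder is no_std.

```rust
/// One catalog entry decoded from a 128-byte PkgManifest record.
pub struct PkgEntry {
    pub name: String,
    pub version: String,
    pub size_bytes: u32,
}

/// Read a fixed-width text field up to its first NUL (assumed NUL padding).
fn cstr_field(field: &[u8]) -> String {
    let end = field.iter().position(|&b| b == 0).unwrap_or(field.len());
    String::from_utf8_lossy(&field[..end]).into_owned()
}

/// Layout from the commit: name[0..64], version[64..80], size_bytes u32 @112.
pub fn parse_manifest(rec: &[u8; 128]) -> PkgEntry {
    PkgEntry {
        name: cstr_field(&rec[0..64]),
        version: cstr_field(&rec[64..80]),
        // Little-endian is an assumption; the commit does not state the byte order.
        size_bytes: u32::from_le_bytes([rec[112], rec[113], rec[114], rec[115]]),
    }
}

/// Parse "a.b.c.d" into a big-endian packed u32 (first octet in the top byte).
/// Returns None unless there are exactly four octets, each in 0..=255.
pub fn parse_dotted_ip(s: &str) -> Option<u32> {
    let mut parts = s.split('.');
    let mut octets = [0u8; 4];
    for o in octets.iter_mut() {
        *o = parts.next()?.parse::<u8>().ok()?;
    }
    if parts.next().is_some() {
        return None; // more than four components
    }
    Some(u32::from_be_bytes(octets))
}
```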
…e first time pkg/http.rs waited for an outbound connection by zero-byte tcp_read probing, but tcp_read returns EAGAIN whenever can_recv() is false — which is exactly the state of an ESTABLISHED socket whose peer hasn't sent anything (an HTTP server says nothing until it receives the request). The probe could never succeed, so EVERY kernel HTTP fetch failed with Connect since the code was written — the on-demand package pipeline silently booted into "offline mode" on every run ([pkg] catalog fetch failed: Connect, invisible at the Error log level). Fix: net/tcp.rs gains tcp_is_established(fd) — may_send() is the correct handshake-complete signal, with is_open()==false reported as ECONNREFUSED for refused/reset connects. http.rs polls that instead (240×50k spins ≈ wider window for TCG). Verified live: boot with virtio-net + pkg-server on host :8080 → serial "[pkg] parsed catalog: 2 packages / catalog refreshed" and the server access log shows the guest's GET /catalog.bin → 200 (260 bytes). Combined with the embedder seam (463e689) and the syscall renumber (b4dcdb5), the on-demand package pipeline is now reachable end-to-end: boot → HTTP catalog fetch → shell sees remote packages → pkg_resolve streams + SHA-256-verifies + caches.
…ch goes from broken to working
Streaming a 5.4 MB .osx bundle through the kernel's HTTP client failed or crawled at
a constant ~28 KB/s regardless of buffer sizes. Four stacked causes, each found by
test iteration:
1. tcp.rs: smoltcp's now() was the COMPOSITOR FRAME COUNTER ×17 ms. Whenever frames
   aren't presenting (early boot, headless), smoltcp time crawled ~15× slower than
   reality, so ACK-delay/retransmit/window timers stretched 15× and paced every bulk
   transfer at ~28 KB/s (a 5.4 MB fetch took a constant 193 s and starved the reader
   into timeouts). Use the rdtsc-based monotonic_ns() (the same source timerfds use).
   syscall::poll made pub(crate) for access.
2. tcp.rs: tcp_connect derived the local port from the SLOT index, so two
   back-to-back connections to the same server reused the identical 4-tuple and the
   peer's TIME_WAIT swallowed the second SYN (catalog fetch worked, the
   immediately-following bundle fetch always died). Monotonic ephemeral-port counter
   (49152 + n % 16384).
3. http.rs: the read loop stopped after 300 consecutive EAGAINs (a "no more data"
   heuristic), silently truncating long transfers; parse_response then min()'d the
   short body and the SHA-256 check failed confusingly downstream.
   Content-Length-aware reading (don't give up while bytes are outstanding) + a short
   body now errors Malformed instead of returning clipped bytes.
4. tcp.rs/virtio_net.rs: 4 KiB socket buffers (= 4 KiB TCP window) and only 8 posted
   RX descriptors (~12 KB in flight) forced stop-and-wait behaviour and ring
   overflows. 64 KiB buffers, 64 RX slots.
Verified end-to-end: boot → catalog fetch → resolve streams the 5.4 MB Demo bundle
over virtio-net (65 s under TCG, was 193 s + failing), SHA-256 verifies, LRU-caches,
installs (app_id assigned). The on-demand package pipeline (fetch → verify → cache →
install) now works from a cold boot.
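Causes 2 and 3 reduce to two small rules, sketched here with hypothetical names. The port formula is the one quoted in the commit; the error enum and both body helpers are illustrative assumptions rather than the http.rs code.

```rust
/// Ephemeral local port for the n-th outbound connection: 49152 + n % 16384,
/// which cycles through the dynamic range 49152..=65535.
pub fn ephemeral_port(n: u32) -> u16 {
    (49152 + n % 16384) as u16
}

#[derive(Debug, PartialEq, Eq)]
pub enum HttpError {
    Malformed,
}

/// Keep polling while the declared body is incomplete, even across EAGAINs.
pub fn should_keep_reading(content_length: Option<usize>, received: usize) -> bool {
    match content_length {
        Some(expected) => received < expected,
        None => false, // no declared length: fall back to another policy
    }
}

/// A body shorter than Content-Length is an error, never silently clipped.
pub fn check_body_len(content_length: Option<usize>, received: usize) -> Result<(), HttpError> {
    match content_length {
        Some(expected) if received < expected => Err(HttpError::Malformed),
        _ => Ok(()),
    }
}
```

Because consecutive connections get consecutive ports, two back-to-back fetches to the same server no longer share a 4-tuple.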
… dead foreground group no longer wedges the scheduler
Makes the "one crashes, the kernel recovers it in milliseconds" promise real.
Before: a launched app that faulted was reaped to a zombie and focus returned to the
shell, but the app was never relaunched. A subtler bug: a dead foreground group could
WEDGE the scheduler (foreground-exclusive filtering kept skipping every other process,
including the shell, because the dead group was still 'focused'). Five parts:
1. process::next_runnable_pid_locked: if the foreground group's leader is a
   zombie/dead, exclusivity lapses so the shell (and recovery) get the CPU again
   instead of the whole system spinning on a dead group.
2. process::exit → app_registry::note_thread_exit(leader, pid, code), resolved under
   the PTABLE lock then dispatched lock-free (try-lock, ISR-safe).
3. process::kill_group(leader): tears down a crashed app's surviving threads before
   relaunch (two engine instances of one app corrupt each other under the cooperative
   scheduler).
4. app_registry RUNNING/PENDING/HISTORY tables + drain_relaunches(): on an abnormal
   (nonzero/signal) group death, tear down survivors, refocus the shell, and relaunch,
   capped at 3 relaunches / 30 s, after which the app is left closed (degrade to shell
   rather than crash-loop).
5. drain hook in sys_wm_event_wait (the shell's pid-1 event pump): a
   continuously-polled normal syscall context, the safe place to spawn.
Verified: launch app → kill its leader from a sibling thread → group torn down (6
survivors reaped) → shell refocused → relaunched (attempt 1/3) → fresh instance boots,
panics=0, shell keeps rendering throughout.
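The relaunch cap in part 4 could be modelled as a sliding window, sketched below. This is one possible reading of "3 relaunches / 30 s": the commit does not say whether the window slides or whether the app stays closed permanently once the cap is hit, and the type and method names are hypothetical.

```rust
/// Per-app relaunch history, timestamps in milliseconds.
pub struct RelaunchHistory {
    stamps_ms: Vec<u64>,
}

impl RelaunchHistory {
    pub const MAX_RELAUNCHES: usize = 3;
    pub const WINDOW_MS: u64 = 30_000;

    pub fn new() -> Self {
        RelaunchHistory { stamps_ms: Vec::new() }
    }

    /// Record and allow a relaunch at `now_ms` unless the cap has already
    /// been reached within the trailing window.
    pub fn try_relaunch(&mut self, now_ms: u64) -> bool {
        // Forget relaunches that have aged out of the window.
        self.stamps_ms
            .retain(|&t| now_ms.saturating_sub(t) < Self::WINDOW_MS);
        if self.stamps_ms.len() >= Self::MAX_RELAUNCHES {
            return false; // degrade to the shell instead of crash-looping
        }
        self.stamps_ms.push(now_ms);
        true
    }
}
```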
…d add UTM bring-up guide
A temporary signing-test hook (force-resolve "Demo" every boot + log::error spam) leaked into pkg::init() when the signing work was committed (0cf5aa8). Restore the normal best-effort catalog refresh with quiet log::info.
…tored caps, not PID
The capability model (security::Capabilities) existed but was never stored or
enforced — privileged access was gated by hardcoded PID checks ("only pid 1"),
so "capability-secured" was aspirational.
Now each PCB carries a caps set:
- spawn_with_bootstrap grants HOST_MODE_SHELL the full set and launched apps
(HOST_MODE_APP) none; spawn_thread and fork inherit their creator's caps.
- caps_of(pid) / current_has_caps(req) read the per-PCB set.
- The syscall dispatcher gates each privileged call via required_cap(): a
caller missing the capability gets EPERM (the kernel idle task, pid 0, is
exempt). Enforcement is deliberately conservative — only raw networking
(SYS_NET_* + tcp/dhcp) is gated for now; render/WM/VFS-read/POSIX (everything
an app uses) is unprivileged and never gated, so this is safe to switch on.
- The PID-0 Cortex admin API now requires CAP_CORTEX (was pid==1).
Verified: x86 + aarch64 compile; boot renders (present=372, crash=0) with zero
spurious denials; the shell (full caps) still does its signed-package
networking; serial confirms pid 1 spawns with the full capability set.
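A minimal sketch of the gating logic described above, assuming a bitmask representation. CAP_CORTEX, required_cap and the pid-0 exemption come from the commit message; the Caps type, the NET bit, the toy Syscall enum and the errno value are illustrative assumptions, not the kernel's security::Capabilities.

```rust
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct Caps(pub u32);

impl Caps {
    pub const NONE: Caps = Caps(0);
    pub const NET: Caps = Caps(1 << 0); // hypothetical bit for raw networking
    pub const CORTEX: Caps = Caps(1 << 1); // stands in for CAP_CORTEX
    pub const ALL: Caps = Caps(u32::MAX);

    /// True when every bit requested in `req` is present.
    pub fn contains(self, req: Caps) -> bool {
        self.0 & req.0 == req.0
    }
}

pub const EPERM: i64 = 1; // Linux errno value, assumed here

pub enum Syscall {
    NetSend,
    CortexAdmin,
    WmEventWait,
}

/// Capability a call needs, or None for unprivileged calls
/// (render, WM, VFS reads, POSIX).
pub fn required_cap(sc: &Syscall) -> Option<Caps> {
    match sc {
        Syscall::NetSend => Some(Caps::NET),
        Syscall::CortexAdmin => Some(Caps::CORTEX),
        Syscall::WmEventWait => None,
    }
}

/// Gate applied by the dispatcher before running the handler.
pub fn check(pid: u32, caps: Caps, sc: &Syscall) -> Result<(), i64> {
    if pid == 0 {
        return Ok(()); // the kernel idle task is exempt
    }
    match required_cap(sc) {
        Some(req) if !caps.contains(req) => Err(EPERM),
        _ => Ok(()),
    }
}
```

In this model the shell would be spawned with Caps::ALL and launched apps with Caps::NONE, so an app can still call WmEventWait but gets EPERM on NetSend.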
# Conflicts:
# Cargo.lock
# README.md
# kernel/src/app_registry/mod.rs
# kernel/src/arch/aarch64/apic.rs
# kernel/src/arch/aarch64/boot.rs
# kernel/src/arch/aarch64/cpu.rs
# kernel/src/arch/aarch64/enter_user.rs
# kernel/src/arch/aarch64/mmu.rs
# kernel/src/arch/aarch64/mod.rs
# kernel/src/arch/aarch64/syscall.rs
# kernel/src/arch/aarch64/timer.rs
# kernel/src/arch/aarch64/vectors.rs
# kernel/src/arch/mod.rs
# kernel/src/cortex/pid0.rs
# kernel/src/embedder/abi.rs
# kernel/src/main.rs
# kernel/src/mm/mod.rs
# kernel/src/pkg/http.rs
# kernel/src/pkg/resolver.rs
# kernel/src/process/dl.rs
# kernel/src/process/mod.rs
# kernel/src/syscall/dispatch.rs
# kernel/src/syscall/handlers/engine.rs
# kernel/src/syscall/handlers/fd.rs
# kernel/src/syscall/handlers/futex.rs
# kernel/src/syscall/mod.rs
# kernel/src/syscall/poll.rs
# kernel/src/syscall/posix.rs
# kernel/src/wm/mod.rs
# scripts/build-aarch64-shell.sh
# scripts/build-iso.sh
# scripts/run-aarch64.sh
# tools/build-flutter-osx.sh
# tools/flutter-embedder/src/main.rs
# tools/flutter-embedder/src/sys.rs
# tools/pkg-server/Cargo.toml
# tools/pkg-server/src/main.rs
… dir) + gitignore
These regenerable build outputs / debug traces were tracked and triggered GitHub's large-file warning. Untrack from HEAD and ignore going forward. (History blobs remain; removing them would need a separate coordinated rewrite, if ever needed.)
dc37ba4 The merge conflict resolution duplicated try_claim_cpu_for; the dedup was made in the worktree's working tree (so build verification passed there) but never staged before the merge commit, so dc37ba4 as committed did not compile. Remove the duplicate. Verified: x86 + aarch64 kernel both build 0 errors.
Replace the warm-up loading spinner (dark-navy bg + rotating teal dots) with the OSCortex/DotCorr logo mark: white, centered, scaled to 1/5 of the screen's shorter side (the same proportion as the Apple boot mark), resolution-independent on any display. Black background (power-friendly; the mark is already white).
- fb.rs: draw_boot_splash() now nearest-neighbour blits a 256² 8-bit alpha mask (kernel/assets/logo_mask.bin, rasterised from the white logo SVG and cropped tight to the glyph) as solid white where coverage >= ~43%.
- compositor: warm-up fill is now black instead of 0x0c1c26.
Shown by the compositor during the Flutter engine JIT warm-up, before the shell presents its first frame.
Verified rendering live under HVF (-kernel): white mark centered on black, both arches build clean.
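The thresholded nearest-neighbour blit can be sketched like this. It is a minimal illustration, not draw_boot_splash() itself: the function name `blit_mask_nearest`, the flat `u32` framebuffer slice, and the threshold value passed by the caller are assumptions (110 out of 255 is about 43% coverage).

```rust
/// Blit a square 8-bit coverage mask, centered and scaled to 1/5 of the
/// framebuffer's shorter side, using nearest-neighbour sampling.
/// Pixels whose coverage is >= `threshold` become solid white; others are left alone.
/// Signature and buffer layout are illustrative assumptions.
pub fn blit_mask_nearest(
    fb: &mut [u32],
    fb_w: usize,
    fb_h: usize,
    mask: &[u8],
    mask_side: usize,
    threshold: u8,
) {
    assert!(fb.len() >= fb_w * fb_h);
    assert!(mask.len() >= mask_side * mask_side);

    // Destination square: 1/5 of the shorter side, centered.
    let side = fb_w.min(fb_h) / 5;
    if side == 0 || mask_side == 0 {
        return;
    }
    let x0 = (fb_w - side) / 2;
    let y0 = (fb_h - side) / 2;

    for dy in 0..side {
        // Map the destination row back to the nearest source row.
        let sy = dy * mask_side / side;
        for dx in 0..side {
            let sx = dx * mask_side / side;
            if mask[sy * mask_side + sx] >= threshold {
                fb[(y0 + dy) * fb_w + (x0 + dx)] = 0x00FF_FFFF;
            }
        }
    }
}
```

Deriving the destination size from the shorter side at draw time is what makes the mark resolution-independent: the same 256² mask serves every display, and the hard threshold avoids blending against a background that is already black.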
…M guidance
build-iso.sh: add SKIP_CORE_APPS guard. The core-app rebuild drives build-flutter-osx.sh, whose AOT step needs the (currently purged) oscx-engine Docker image. With SKIP_CORE_APPS=1 the build reuses the app assets already staged in initramfs/Applications. The apps run JIT off kernel_blob.bin (arch-independent), so their launcher tiles still render. This lets a full-render x86 shell ISO build on a plain macOS host (no Docker).
release.yml: fix the release notes. UTM's bundled UEFI firmware does not boot the Limine ISO, so the UTM-bootable artifact is the aarch64 .kernel via Linux → Boot from kernel image, with Display = ramfb and CPU Cores = 1. Documented the exact UTM steps; the ISO is for QEMU/other UEFI VMs + bare metal.
…edding
The x86_64 shell runs JIT off kernel_blob.bin. A leftover AOT snapshot in
system/flutter (notably an arm64 libapp.so staged by a prior aarch64 build)
makes the embedder pick AOT, and a cross-arch snapshot aborts the Dart VM
("snapshot requires arm64 but the VM has x64"). Strip libapp.so/app.aot before
the kernel embeds initramfs so the embedder always uses the arch-independent
JIT blob. Verified: x86 ISO boots, framebuffer + compositor + boot splash
render at 1280x800, engine initializes with no snapshot mismatch.
draw_software_cursor() was gated on the x86-only ps2::PS2_READY flag and read position/buttons/activity from the ps2 driver, so on aarch64 (virtio-input, no PS/2) the cursor never drew: hover/click worked but no pointer was visible.
Unify cursor state in wm (CURSOR_X/Y/BUTTONS/SEEN/LAST_ACT_NS), updated in push_pointer, which BOTH the x86 PS/2 driver and the aarch64 virtio-input driver already funnel through. draw_software_cursor now reads wm::cursor_* and uses the arch-neutral rdtsc_ns() clock for the 3s idle auto-hide.
Verified: the cursor renders at the pointer position on aarch64 under HVF (was invisible before); x86 path unchanged (ps2 still feeds push_pointer).
Make the vsync-baton + cooperative-scheduler path per-engine instead of hardcoded to the shell (pid 1): embedder_baton_due(pid), set_vsync_baton(pid, baton) delivers to + wakes the posting engine (sys_engine_vsync_baton_post uses the caller pid), prefer_embedder_if_baton_due tries the foreground app then the shell, and the apic/idt timer paths wake whichever engine's baton is due. This is the correct design for running a second Flutter engine (a launched app) alongside the shell.
No regression: shell-only still renders (present advances) and apps still launch.
NOTE: this does NOT by itself resolve the post-launch UI freeze. That is a separate single-core scheduling / embedder-loop issue (a backgrounded engine busy-spins in schedule_frame and starves the foreground app); tracked for a follow-up. Committed as the foundation that fix will build on.
…reen)
The framebuffer driver assumed 32bpp XRGB and ignored the firmware's channel masks. On real UEFI GOP framebuffers (e.g. many Intel Macs report RGB / red_shift=0) this rendered the desktop with red/blue swapped (a blue-cast "wrong colors" screen), and on non-32bpp it bailed entirely (FB_READY never set → black, no UI; the boot logs you still see are the independent serial mirror).
Now fb::init reads red/green/blue_mask_shift from Limine and logs the full GOP format (bpp/pitch/model/shifts) to serial for diagnosis. Pixels are repacked into the firmware's channel order at every real-framebuffer write (set_pixel, fill_rect, blit_rgba32, swap_buffers, glyph/clear), gated by an XRGB fast-path so the working case (ARM ramfb, x86 std-vga) is byte-for-byte unchanged: no regression (ARM present=143, crash=0). Non-32bpp is now logged loudly (still a TODO to byte-pack).
Verified: XRGB = identity, BGR = correct R/B swap.
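The repack with an XRGB fast-path can be sketched as below. This is an illustration of the technique, not the fb.rs code: the `PixelFormat` struct and `pack` method are hypothetical names, and the sketch assumes 8 bits per channel at 32bpp, with the shifts taken from the firmware-reported red/green/blue mask shifts.

```rust
/// Hypothetical description of the firmware's 32bpp channel layout.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct PixelFormat {
    pub red_shift: u8,
    pub green_shift: u8,
    pub blue_shift: u8,
}

impl PixelFormat {
    /// The layout the driver originally assumed: 0x00RRGGBB.
    pub const XRGB: PixelFormat = PixelFormat { red_shift: 16, green_shift: 8, blue_shift: 0 };

    /// True when no repacking is needed.
    pub fn is_xrgb(self) -> bool {
        self == PixelFormat::XRGB
    }

    /// Convert an internal 0x00RRGGBB pixel to the firmware's channel order.
    /// Assumes 8-bit channels; other depths are out of scope for this sketch.
    pub fn pack(self, xrgb: u32) -> u32 {
        if self.is_xrgb() {
            // Fast path: the common case is returned byte-for-byte unchanged.
            return xrgb;
        }
        let r = (xrgb >> 16) & 0xFF;
        let g = (xrgb >> 8) & 0xFF;
        let b = xrgb & 0xFF;
        (r << self.red_shift) | (g << self.green_shift) | (b << self.blue_shift)
    }
}
```

For a framebuffer reporting red_shift=0, green_shift=8, blue_shift=16, `pack(0x00112233)` yields `0x00332211`, which is the R/B swap the commit verifies.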
…"no UI")
x86 "boot logs but no UI" had two stacked causes:
1. fb.rs ignored the firmware framebuffer channel order (fixed in c710176).
2. The build staged the AOT-only ('product') x86 engine but the matching 'product' gen_snapshot is unavailable (purged), so the shell's Dart snapshot could never load ("snapshot requires release/product..." mismatch) → the engine ran with no Dart code → blank screen.
The x86 path runs the shell via the JIT engine off kernel_blob.bin (no snapshot needed). build-iso.sh now strips any AOT snapshot to force the JIT path, and the staged engine must be the JIT/debug engine (contains the Dart kernel compiler). build-flutter-osx.sh prefers the gen_snapshot that matches the shipped engine (for the AOT path, when that toolchain is restored).
Verified under OVMF/UEFI: the full shell desktop renders with correct colors (present climbs to 284, nonblack=18557, matching the ARM desktop).
NOTE: the JIT engine binary (gitignored, ~91MB) must be staged at tools/flutter-engine/libflutter_engine.so; CI needs it provisioned (task #6).
Bring up USB HID input on aarch64 so the pointer works out-of-the-box in UTM/QEMU, where USB is the default pointer device (virtio-input previously needed manual VM config). Full xHCI bring-up from a -kernel boot with no firmware to lean on:
- PCIe ECAM config access + on-demand 1 GiB device mapping (arch/aarch64/pci.rs, mmu::map_device_1gib).
- Self-assign PCIe BARs: a -kernel boot has no firmware to program them, so size them and bump-allocate out of the QEMU virt 32-bit MMIO window.
- Work around a QEMU+HVF vCPU hang: programming/enabling a BAR makes QEMU rebuild the guest memory map asynchronously, and a racing timer-ISR MMIO access wedges the core. Mask interrupts + settle across the BAR write and the memory-decode enable.
- Full enumeration: enable-slot -> address-device -> GET_DESCRIPTOR (detect mouse vs keyboard via bInterfaceProtocol) -> SET_CONFIGURATION -> configure interrupt-IN endpoint -> SET_PROTOCOL(boot)/SET_IDLE, with an EP0 control ring and link TRBs for ring wrap.
- Fix event dispatch: the TRB-type mask was 0x3FF, which spilled into the Transfer Event's Endpoint ID field, so EP0/HID transfer events (ep_id != 0) were misclassified and dropped. Mask the 6-bit type field. Also fix HCSPARAMS1/DBOFF/RTSOFF offsets (were read relative to CAPLENGTH instead of the capability base) and set EP0 max-packet-size by bus speed.
- Boot-mouse report -> absolute cursor (clamped to the framebuffer) -> wm::push_pointer + push_scroll; poll the runtime on the vsync tick, mirroring the x86 APIC ISR.
Verified end-to-end under TCG and HVF: probe -> enumerate (protocol=2 mouse) -> injected relative movements track the cursor and buttons register.
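The event-dispatch fix comes down to field extraction from the fourth dword of an event TRB. In the xHCI layout the TRB Type occupies bits 15:10 and, for a Transfer Event, the Endpoint ID occupies bits 20:16 and the Slot ID bits 31:24. The sketch below contrasts a 6-bit mask with a 10-bit one; the function names are hypothetical, and the assumption that the old code applied 0x3FF after shifting right by 10 is a guess at how the bug was written.

```rust
/// TRB Type value for a Transfer Event (xHCI).
pub const TRB_TYPE_TRANSFER_EVENT: u32 = 32;

/// Correct extraction: the type field is 6 bits wide, at bits 15:10.
pub fn trb_type(control: u32) -> u32 {
    (control >> 10) & 0x3F
}

/// Buggy variant for comparison (assumed form): a 10-bit mask also captures
/// bits 19:16, which belong to the Endpoint ID of a Transfer Event.
pub fn trb_type_buggy(control: u32) -> u32 {
    (control >> 10) & 0x3FF
}

/// Endpoint ID of a Transfer Event, bits 20:16.
pub fn endpoint_id(control: u32) -> u32 {
    (control >> 16) & 0x1F
}

/// Slot ID of a Transfer Event, bits 31:24.
pub fn slot_id(control: u32) -> u32 {
    (control >> 24) & 0xFF
}

/// Build the control dword of a Transfer Event for illustration (cycle bit set).
pub fn make_transfer_event(slot: u32, ep: u32) -> u32 {
    (slot << 24) | (ep << 16) | (TRB_TYPE_TRANSFER_EVENT << 10) | 1
}
```

For slot 1, endpoint ID 3, the correct extraction returns 32 while the wide mask returns 224, so a dispatcher comparing against 32 silently drops the event. Only endpoint ID 0 decodes correctly with the wide mask, which matches the symptom that events with ep_id != 0 were lost.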
Replace the kernel-log boot output with a product-grade boot screen (DotCorr "Doto" look): a dot-matrix OSCORTEX wordmark, an "<elapsed>s cold boot" caption with an accent dot, a blue progress bar that ramps as boot proceeds, and a green "cortex::<stage> · <elapsed>s" status line. Drawn by the kernel from the first frame on every boot path and animated by the compositor during the engine warm-up; snaps to boot_completed when the shell presents.
- New drivers/bootscreen.rs. fb.rs gains a dot-matrix text renderer, public left-aligned text helpers, and enable_fb_logging().
- Framebuffer log mirroring is silenced by default (the serial console keeps everything). F2 toggles an on-screen verbose overlay backed by a small in-memory log ring in logger.rs; serial COM1 stays the authoritative log.
- The panic handler re-enables framebuffer logging so a failure is never hidden behind the boot screen.
- Wired into all boot paths: x86 Limine, aarch64 -kernel (ramfb), aarch64 Limine.
Verified on aarch64 (TCG): the boot screen renders as designed and the shell still comes up cleanly afterwards (no regression).
The arm64 Limine ISO crashed/hung in UTM. Three independent fixes:
- usb: skip the xHCI probe on the aarch64 UEFI/Limine ISO build. Reading the
firmware-assigned xHCI BAR config under QEMU+HVF trips a QEMU host-side
assert(isv) (hvf.c) that kills the VM — a QEMU+HVF host bug on the edk2 boot
path, absent on real hardware, under TCG, and on the bare -kernel boot (the
supported UTM arm64 path, which self-assigns BARs and keeps USB HID). x86
keeps USB too.
- pci: re-assign firmware-placed 64-bit BARs that land in the high PCIe MMIO
window at 512 GiB (e.g. 0x80_0000_8000) into the identity-mapped low 32-bit
window. Such a BAR is outside our 39-bit address space and unreachable —
touching it faults on real hardware as well, not just QEMU.
- build: guard build-iso-aarch64.sh against cross-arch initramfs contamination.
The kernel embeds the shared initramfs/ (incl. /init = the aarch64 shell host);
running the x86 ISO build first overwrites it with x86 binaries, so the arm64
ISO would package an x86 /init and hang ("Failed to spawn /init: wrong ELF
machine"). The build now fails loudly instead of shipping a broken ISO.
Verified live in UTM (HVF): the arm64 ISO boots to the full shell. x86 ISO
boots (shell verified headless under TCG).
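The BAR re-assignment in the pci fix reduces to two checks: is the firmware-placed range inside the 39-bit address space, and if not, where does it fit in the low window. The sketch below is illustrative only: `bar_reachable`, `MmioBump`, and the window bounds used in the usage note are hypothetical, and natural (size-aligned) placement is assumed because PCI BARs are power-of-two sized.

```rust
/// Address-space width from the commit message (39-bit).
pub const VA_BITS: u32 = 39;

/// True if the whole BAR range [base, base + size) lies below 2^39 (512 GiB).
pub fn bar_reachable(base: u64, size: u64) -> bool {
    let limit = 1u64 << VA_BITS;
    match base.checked_add(size) {
        Some(end) => end <= limit,
        None => false,
    }
}

/// Hypothetical bump allocator over a low MMIO window [next, end).
pub struct MmioBump {
    next: u64,
    end: u64,
}

impl MmioBump {
    pub fn new(start: u64, end: u64) -> Self {
        Self { next: start, end }
    }

    /// Allocate a naturally aligned region of `size` bytes (power of two).
    /// Returns `None` for an invalid size or when the window is exhausted.
    pub fn alloc(&mut self, size: u64) -> Option<u64> {
        if size == 0 || !size.is_power_of_two() {
            return None;
        }
        // Round `next` up to a multiple of `size`.
        let base = self.next.checked_add(size - 1)? & !(size - 1);
        let end = base.checked_add(size)?;
        if end > self.end {
            return None;
        }
        self.next = end;
        Some(base)
    }
}
```

The example address from the commit, 0x80_0000_8000, fails `bar_reachable` because 2^39 is 0x80_0000_0000. A caller would then take a replacement base from something like `MmioBump::new(0x1000_0000, 0x2000_0000).alloc(size)` (bounds chosen for illustration) and write it back to the BAR.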
…-port
# Conflicts:
# .github/workflows/release.yml
# kernel/src/arch/aarch64/apic.rs
# kernel/src/arch/aarch64/boot_limine.rs
# kernel/src/syscall/poll.rs
# scripts/build-iso-aarch64.sh
Merges the native-engine-port line into develop. OSCortex builds from one source
tree and boots + renders the full Flutter shell on both architectures.
Highlights
- … .kernel + Limine ISO) boot and render the shell.
- … .kernel), and a dot-matrix boot screen (replaces log spam; F2 verbose).
This session
- … initramfs/init contamination (build guard added) + gated USB off the ISO build (QEMU+HVF assert(isv) host bug on the edk2 path) + re-assign firmware's 512 GiB BAR into mappable space.
Verified
- … .kernel: boot screen + USB mouse + shell (HVF).
Known issues (tracked)
- … assert(isv)); USB works on .kernel/real HW. USB is gated off the ISO.
- … initramfs/ (guard now fails loudly on contamination).