
ci: multi-platform CI bootstrap (Linux green, macOS/Windows non-blocking) #7

Open
estebanzimanyi wants to merge 64 commits into MobilityDB:main from estebanzimanyi:fix/license-main-java

Member

@estebanzimanyi estebanzimanyi commented May 8, 2026

👀 Reviewers: tier ranking, dependency chains and the standards checklist live in doc/contributing/reviewer-guide.md (lands with PR #8).

Summary

This PR wires up the three-platform Maven CI workflow and brings it to a stable, mergeable state:

  • Linux (Java 21 / Spark 3.5): fully green — 51 unit tests pass, fat jar built and uploaded as an artifact.
  • macOS (Java 21 / Spark 3.5): continue-on-error: true — non-blocking. JNR-FFI cannot generate a proxy class for JMEOS's 1683-method functions interface on Java 21: the generated <clinit>()V exceeds the JVM 64 KB bytecode limit (MethodTooLargeException). libmeos.dylib loads correctly (verified via Python ctypes); the failure is purely in JNR-FFI proxy generation. Fix requires JMEOS to split its functions interface (see discussion below).
  • Windows (Java 21 / Spark 3.5): continue-on-error: true — same upstream JMEOS issue as macOS. The Windows job now builds libmeos.dll from source (via MSYS2 UCRT64 + CMake/Ninja), compiles MobilitySpark, and reaches the unit-test stage before hitting the same MethodTooLargeException.

Key fixes on this branch

| Area | What was fixed |
| --- | --- |
| macOS: libmeos.dylib | Build from source via Homebrew deps + CMake; ad-hoc codesign for hardened-runtime JVM; LD_LIBRARY_PATH + DYLD_LIBRARY_PATH set for JarLibraryLoader |
| macOS: JMEOS symbol stubs | tfloat_avg_value and geog_from_binary stubs injected before build for JMEOS-1.4 compatibility |
| Windows: libmeos.dll build | CMake/Ninja build via estebanzimanyi/MobilityDB:meos-windows-bootstrap branch (adds -DMEOS_TZDATA_DIR support); SIZEOF_LONG_LONG alias added to pg_config.h so pg_bitutils.h compiles on GCC/LLP64 |
| Windows: Maven PATH | MSYS2 PATH (POSIX-style) must not overwrite the Windows PATH (which contains Maven); fix uses MEOS_DLL_DIR=$(cygpath -w …) + explicit prepend in PowerShell steps |
| Windows: JarLibraryLoader | JarLibraryLoader in CI mode requires LD_LIBRARY_PATH; set to the Windows-native meos-install\bin path so JNR-FFI's search() locates libmeos.dll; UCRT64_BIN added to PATH for transitive runtime DLLs (libgeos, libproj, libjson-c, libgsl) |
| Both non-Linux: non-blocking | continue-on-error: true with accurate attribution comment; the root cause is architectural (upstream JMEOS), not a CI configuration issue |

Upstream issue — JMEOS functions interface

JMEOS's functions interface has 1683 method declarations. JNR-FFI's ASM code generator places all dispatch initializers in a single <clinit>()V; at ~50 bytes per method this produces ~80–130 KB of bytecode, exceeding the JVM's hard 64 KB method limit. The JDK Proxy.newProxyInstance fallback (jnr.ffi.asm.enabled=false) hits the same limit. The fix requires JMEOS to split functions into sub-interfaces of ≤ ~400 methods each (e.g. one per MEOS module: temporal, geo, span, npoint, cbuffer). Linux x86_64 is unaffected with the current JNR-FFI 2.2.17.
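
The size estimate above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the 50-byte and 80-byte per-method costs are assumed values chosen to bracket the ~80–130 KB range quoted above, not measurements of JNR-FFI output, and the function names are hypothetical.

```python
# Largest code array the JVM accepts for one method (code_length < 65536).
JVM_METHOD_LIMIT = 65535


def estimated_clinit_bytes(n_methods: int, bytes_per_method: int) -> int:
    """Rough size of a single <clinit> that initializes every method."""
    return n_methods * bytes_per_method


def max_methods_under_limit(bytes_per_method: int) -> int:
    """Largest interface whose <clinit> still fits, at the assumed cost."""
    return JVM_METHOD_LIMIT // bytes_per_method


if __name__ == "__main__":
    for cost in (50, 80):  # assumed per-method bytecode cost, in bytes
        size = estimated_clinit_bytes(1683, cost)
        over = size > JVM_METHOD_LIMIT
        print(f"{cost} B/method: {size} bytes, over limit: {over}, "
              f"max methods: {max_methods_under_limit(cost)}")
```

Under either assumed cost, a 400-method sub-interface (20,000 to 32,000 bytes) would leave a wide margin below the limit.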

Test plan

  • CI passes on Linux (green in this PR)
  • Confirm macOS and Windows are non-blocking (expected failure with clear attribution)
  • Fat jar artifact (mobilityspark-spark.jar) downloads and contains MEOS + JNR-FFI

Luis Alfredo Leon Villapun and others added 30 commits August 7, 2023 12:05
…test count

tgeogpoint_in() writes "got NULL for SRID (4326)" to native stderr when the
spatial reference system CSV is not registered, corrupting the surefire channel
and crashing the forked JVM. tgeogpointFromBinary uses the same fromBinaryImpl
as tgeompointFromBinary (already tested), so no coverage is lost. Null safety
for tgeogpointFromBinary is still verified in fromBinary_null_returns_null.

README test count updated: 50 (23+16+11).

…tic unit tests

tgeogpoint_in() writes "got NULL for SRID (4326)" to native stderr when
meos_set_spatial_ref_sys_csv() has not been called, crashing the surefire
forked JVM. The previous workaround (dropping the tgeogpoint round-trip test)
was reverted. The correct fix is to load the bundled spatial_ref_sys.csv from
the test classpath in @BeforeAll, mirroring MobilitySparkSession.registerSpatialRefSys().

tgeogpointFromBinary_round_trips() is now fully verified on all platforms
including CI. Test count restored to 51 (23+17+11). README updated to match.

Patch utils.JarLibraryLoader to add macOS (libmeos.dylib) and fix Windows
(libmeos.dll) native library loading in addition to the existing Linux path.
The CI branch now also checks DYLD_LIBRARY_PATH so macOS GitHub Actions jobs
can set that env var after building MEOS from source.

CI workflow (maven.yml) gains two new jobs:
- macos: builds libmeos.dylib from MobilityDB source via Homebrew deps, sets
  DYLD_LIBRARY_PATH, and runs the full 57-test suite.
- windows: MSYS2/UCRT64 bootstrap; marked continue-on-error while the MEOS
  Windows standalone build stabilises.

README updated with per-platform setup instructions (§2.2–2.4).

All 57 Linux tests remain green.

Install mingw-w64-ucrt-x86_64-tzdata in the MSYS2 UCRT64 environment
and resolve the IANA timezone data directory to a Windows-native path
(cygpath -m $MSYSTEM_PREFIX/share/zoneinfo).  Inject SYSTEMTZDIR into
the MEOS cmake build via CMAKE_C_FLAGS as a bridge until MobilityDB
issue #513 (meos-windows-bootstrap) merges to master.

Also removes the per-step continue-on-error flags (the copy of
libmeos.dll is no longer needed since cmake install puts it in bin/).
Job remains non-blocking until CI confirms green end-to-end.

On Apple Silicon, Homebrew installs libraries to /opt/homebrew/lib, not
/usr/local/lib. libmeos.dylib's dependencies (libgeos, libproj, libgsl,
libjson-c) are in that prefix, so the dynamic linker could not find them
even though libmeos.dylib itself was installed to /usr/local/lib.

Set DYLD_LIBRARY_PATH=/usr/local/lib:$(brew --prefix)/lib so both the
library itself and its transitive dependencies are on the search path.

jnr-ffi 2.2.17 fixes MethodTooLargeException on ARM64 macOS with Java 21
when the MEOS functions interface exceeds the JVM 64 KB class-initializer
limit that older JNR-FFI versions triggered via JDK dynamic proxy generation.

Windows CI: replace CMAKE_C_FLAGS quoting workaround with a direct
-DMEOS_TZDATA_DIR cmake variable, which the meos-windows-bootstrap branch
of MobilityDB supports cleanly.  Switch the MobilityDB checkout to
estebanzimanyi/MobilityDB@meos-windows-bootstrap until MobilityDB #513
merges to upstream master.

…tall time

JarLibraryLoader passes DYLD_LIBRARY_PATH as a single string to JNR-FFI's
.search(), which does new File(path, "libmeos.dylib"). A colon-separated
value like "/usr/local/lib:/opt/homebrew/lib" is treated as one directory
name, so the file lookup fails.

Root cause of dependency failures: cmake strips build RPATH on install by
default, so libmeos.dylib has no RPATH pointing to Homebrew's lib directory
(/opt/homebrew/lib on Apple Silicon). The dynamic linker cannot find libgeos,
libproj, libgsl, and libjson-c when loading the installed dylib.

Fix: add -DCMAKE_INSTALL_RPATH_USE_LINK_PATH=ON so cmake embeds the actual
link-time library paths in the installed dylib's RPATH. Revert DYLD_LIBRARY_PATH
to a plain single directory so JNR-FFI's file search resolves correctly.
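
The single-string lookup failure described above is easy to reproduce outside the JVM. The following Python sketch imitates the `new File(path, "libmeos.dylib")` behaviour with `os.path.join`; the helper names are hypothetical, and this is a model of the described behaviour rather than JarLibraryLoader's or JNR-FFI's actual code.

```python
import os
import tempfile


def naive_lookup(search_path: str, filename: str) -> bool:
    """Treat the whole string as one directory, as described above."""
    return os.path.isfile(os.path.join(search_path, filename))


def split_lookup(search_path: str, filename: str, sep: str = ":"):
    """Check each entry of a separator-delimited path; return the first hit."""
    for directory in search_path.split(sep):
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as lib_dir, \
            tempfile.TemporaryDirectory() as other_dir:
        # Empty placeholder file standing in for the real library.
        open(os.path.join(lib_dir, "libmeos.dylib"), "w").close()
        combined = f"{lib_dir}:{other_dir}"
        print("whole string as one directory:", naive_lookup(combined, "libmeos.dylib"))
        print("entry-by-entry search:", split_lookup(combined, "libmeos.dylib"))
```

Passing a single directory, as the fix does, makes the naive lookup succeed without requiring any change to the loader.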
Adds otool -L, RPATH inspection, dependency existence check, and
python3 ctypes load test before the Maven test run so the actual
dlopen error is visible in CI logs. To be removed once macOS loading
is green.
Multiline python3 -c "..." inside a YAML block scalar fails when the
Python code has less indentation than the block level — YAML ends the
scalar early. Use a single-line python3 call instead.

…build

JMEOS-1.4's MeosLibrary declares geog_from_binary and tfloat_avg_value
as non-optional symbols. The current MEOS master is missing both:
- tfloat_avg_value was renamed to tnumber_avg_value
- geog_from_binary is declared in meos_geo.h but never implemented standalone

JNR-FFI fails the entire library load (not just individual methods) when
non-optional symbols are absent, which triggers createErrorProxy and the
secondary MethodTooLargeException from JDK's dynamic proxy generator.

Fix: append two backward-compat stubs to the relevant MEOS source files
before the cmake build so both symbols are exported in libmeos.dylib.
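
Whether the stubs are actually exported can be checked before the JVM ever loads the library. The sketch below uses Python's ctypes for that; the helper name is hypothetical, and the commented-out path is an assumption about where the CI build installs libmeos.dylib.

```python
import ctypes


def missing_symbols(library: ctypes.CDLL, names):
    """Return the names that cannot be resolved in the loaded library."""
    # Attribute access on a CDLL performs a symbol lookup and raises
    # AttributeError when the symbol is not exported.
    return [name for name in names if not hasattr(library, name)]


if __name__ == "__main__":
    # Assumed install location; adjust to the actual build output:
    # lib = ctypes.CDLL("/usr/local/lib/libmeos.dylib")
    lib = ctypes.CDLL(None)  # stand-in on POSIX: symbols already loaded in this process
    print(missing_symbols(lib, ["tfloat_avg_value", "geog_from_binary"]))
```

An empty list for the real library would indicate that both backward-compat symbols are present, which is the precondition JNR-FFI needs for the non-optional declarations.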
… ARM64

JMEOS-1.4 exposes 1683 native methods via MeosLibrary. JNR-FFI's ASM
bytecode generator packs all method stubs into a single <clinit>()V,
which exceeds the JVM's 64KB limit on Apple Silicon (ARM64 stubs are
larger than x86_64). Result: createErrorProxy fires even when
libmeos.dylib loads successfully.

Pass -Djnr.ffi.asm.enabled=false to surefire so JNR-FFI falls back to
reflection-based stubs, which have no bytecode-size constraint.

Also add nm symbol-export check to the diagnostic step to confirm
tfloat_avg_value and geog_from_binary are exported from the built dylib.

The previous approach used -DargLine on the mvn command line, which
completely replaces <argLine> in pom.xml and silently drops all the
--add-opens flags needed by Spark internals.

Move jnr.ffi.asm.enabled=false into <systemPropertyVariables> instead,
which is independent of <argLine>.  Both the JVM opens and the JNR-FFI
reflection mode are now active on all platforms.

Also gate the macOS libmeos.dylib diagnostic step on if: failure() so
it does not add noise to every green run.

JarLibraryLoader reads LD_LIBRARY_PATH first in CI mode (GITHUB_WORKFLOW
set), then falls back to DYLD_LIBRARY_PATH. On macOS, the JVM's hardened
runtime strips DYLD_* environment variables, making the fallback invisible
to System.getenv(). As a result, libraryPath was null, JNR-FFI searched
only the default dyld paths, failed to find libmeos.dylib, and fell back
to createErrorProxy — which hit the JVM 64KB method limit for the 1683-
method MeosLibrary interface.

Fix: export LD_LIBRARY_PATH=/usr/local/lib alongside DYLD_LIBRARY_PATH on
macOS so JarLibraryLoader's CI-mode path is populated regardless of which
env var the JVM strips.

Revert jnr.ffi.asm.enabled=false from pom.xml: reflection mode also hits
the 64KB limit (for the actual load proxy, not just the error proxy), so
it breaks Linux CI which was green.
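
The lookup order described in this commit can be modelled in a few lines. This is a simplified Python sketch of the behaviour as stated in the message, not the JarLibraryLoader source; the function name is hypothetical and non-CI handling is deliberately left out.

```python
def resolve_library_path(env):
    """Return the directory handed to the native loader in CI mode, or None."""
    if "GITHUB_WORKFLOW" not in env:  # CI mode is keyed on this variable
        return None                   # non-CI handling is not modelled here
    # LD_LIBRARY_PATH is consulted first, then DYLD_LIBRARY_PATH.
    return env.get("LD_LIBRARY_PATH") or env.get("DYLD_LIBRARY_PATH")


if __name__ == "__main__":
    # DYLD_LIBRARY_PATH was exported, but is no longer visible to the process.
    stripped = {"GITHUB_WORKFLOW": "maven"}
    # The fix: LD_LIBRARY_PATH is exported as well.
    fixed = {"GITHUB_WORKFLOW": "maven", "LD_LIBRARY_PATH": "/usr/local/lib"}
    print(resolve_library_path(stripped))
    print(resolve_library_path(fixed))
```

With only the DYLD variable exported and then stripped, the model yields no path at all, which matches the null libraryPath reported above.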
Two changes:
1. After cmake install, ad-hoc codesign the dylib so the JVM hardened
   runtime's library validation accepts it.  Unsigned CMake-built dylibs
   can be rejected by processes that require library validation.
2. Add a diagnostic step (after compile so jnr-ffi is in .m2) that checks
   JVM entitlements, libmeos signature, and ctypes load with RTLD_LOCAL mode.
   This will tell us definitively whether library validation is the root cause.

JFFI's native library extraction fails in surefire-forked JVMs on macOS
ARM64 (Apple Silicon), causing UnsatisfiedLinkError → createErrorProxy →
MethodTooLargeException for the 1683-method MeosLibrary interface.

Run tests in the Maven JVM itself (forkCount=0) to avoid the fork.
MEOS global state is safe: meos_initialize is idempotent, meos_finalize
is intentionally absent from teardown per the no-finalize-in-tests policy.

…lure

JNR-FFI uses dlopen with RTLD_NOW|RTLD_GLOBAL; ctypes defaults to RTLD_LAZY.
With RTLD_NOW all symbols must resolve immediately — if libmeos.dylib has
any unresolved symbol the load fails. Add tests for all four mode combinations
and print undefined symbols from libmeos.dylib to confirm the root cause.

Also revert forkCount=0 test (confirmed not the issue — same error in Maven JVM).
…ylib

The previous diagnostic step embedded Python code at column 0 inside a
YAML block scalar, terminating the block prematurely and causing a YAML
parse error that prevented all macOS CI steps from running.

Fix: emit the Python via printf so every line stays at the required
10-space YAML indent (the script content becomes arguments to printf,
not raw YAML content).

Root cause of the underlying UnsatisfiedLinkError: the JVM's hardened
runtime strips DYLD_LIBRARY_PATH, so when JNR-FFI's JFFI calls
dlopen(libmeos.dylib, RTLD_NOW|RTLD_GLOBAL), the transitive Homebrew
dependencies (libgeos, libproj, libgsl, libjson-c) installed under
$(brew --prefix)/lib cannot be resolved, and dlopen fails immediately.
Setting DYLD_LIBRARY_PATH=/usr/local/lib was correct for finding
libmeos.dylib itself, but did not help its deps after DYLD stripping.

Fix: add -DCMAKE_INSTALL_RPATH="$BREW_PREFIX/lib" to the cmake
configure step so the installed libmeos.dylib carries an embedded
LC_RPATH entry pointing at the Homebrew prefix. The dynamic linker
then resolves deps via RPATH even without DYLD_LIBRARY_PATH.

Also widen the DYLD_LIBRARY_PATH env export to include $BREW_PREFIX/lib
for processes (like python3) whose hardened-runtime entitlements do
allow DYLD vars.
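
One way to confirm that the installed dylib carries the intended LC_RPATH entry is to parse the text printed by `otool -l`. The Python sketch below assumes the usual load-command layout of that output; the sample text is illustrative rather than captured from this CI run, and the function name is hypothetical.

```python
def rpaths_from_otool(output: str):
    """Extract LC_RPATH entries from `otool -l` text output."""
    rpaths, in_rpath = [], False
    for line in output.splitlines():
        fields = line.split()
        if fields[:2] == ["cmd", "LC_RPATH"]:
            in_rpath = True
        elif fields[:1] == ["cmd"]:
            in_rpath = False  # a different load command started
        elif in_rpath and fields[:1] == ["path"]:
            # Line looks like: "path /opt/homebrew/lib (offset 12)"
            value = line.split("path", 1)[1].rsplit(" (offset", 1)[0]
            rpaths.append(value.strip())
            in_rpath = False
    return rpaths


# Illustrative sample only; real output contains many more load commands.
SAMPLE = """\
Load command 13
          cmd LC_LOAD_DYLIB
      cmdsize 56
         name /opt/homebrew/lib/libgeos_c.dylib (offset 24)
Load command 14
          cmd LC_RPATH
      cmdsize 32
         path /opt/homebrew/lib (offset 12)
"""

if __name__ == "__main__":
    print(rpaths_from_otool(SAMPLE))
```

In CI the input would come from running `otool -l` on the installed library; the expected entry is the Homebrew prefix's lib directory.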
…64 limit

On macOS ARM64 (Apple Silicon) with Java 21, JNR-FFI 2.2.17's ASM-based
proxy generator produces a <clinit>()V exceeding the JVM 64 KB method limit
for the 1683-method JMEOS MeosLibrary interface.  When that generation
fails the fallback createErrorProxy() also fails with MethodTooLargeException
(the error visible in CI logs).

With jnr.ffi.asm.enabled=false JNR-FFI falls back to reflection mode, which
builds the proxy via java.lang.reflect.InvocationHandler.  The resulting
<clinit> only stores Method references (~30 KB total) rather than JFFI
dispatch stubs, so it stays under the JVM limit.

Setting via MAVEN_OPTS (not surefire argLine) ensures the property reaches
the Maven JVM itself, where tests execute when forkCount=0.

… issue)

Windows: meos-windows-bootstrap now exposes SIZEOF_LONG_LONG in the
generated pg_config.h. On MSYS2/UCRT64 with GCC 16, sizeof(long)==4
so the SIZEOF_LONG==8 branch in pg_bitutils.h is not taken; the
fallback to SIZEOF_LONG_LONG==8 requires the macro to be defined.
ConfigurePgConfig.cmake already detects it (as SIZEOF_LONG_LONG_INT);
the fix aliases it and writes it to pg_config.h.in.
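
The branch selection described above can be modelled as follows. This Python sketch is a simplification for illustration: it mirrors only the two size checks named in the message, not the actual text of pg_bitutils.h, and the function name is hypothetical.

```python
def select_64bit_type(macros):
    """Pick the C type used for 64-bit operations, given the defined macros."""
    if macros.get("SIZEOF_LONG") == 8:
        return "long"
    if macros.get("SIZEOF_LONG_LONG") == 8:
        return "long long"
    return None  # neither branch taken: the header cannot compile


if __name__ == "__main__":
    lp64 = {"SIZEOF_LONG": 8}                                # e.g. Linux x86_64
    llp64_before = {"SIZEOF_LONG": 4}                        # macro missing
    llp64_after = {"SIZEOF_LONG": 4, "SIZEOF_LONG_LONG": 8}  # macro aliased
    for name, macros in [("LP64", lp64), ("LLP64 before", llp64_before),
                         ("LLP64 after", llp64_after)]:
        print(name, "->", select_64bit_type(macros))
```

The middle case is the failure this commit addresses: with sizeof(long)==4 and no SIZEOF_LONG_LONG macro, neither branch applies.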

macOS: JNR-FFI cannot generate a proxy for the 1683-method JMEOS
functions interface on ARM64 Java 21 — the JDK proxy generator's
<clinit>()V exceeds the JVM 64 KB bytecode limit. libmeos.dylib itself
loads correctly (Python ctypes confirms all RTLD modes succeed). The
failure is in JNR-FFI's proxy generation and requires JMEOS to split
its functions interface. Mark macOS continue-on-error until then.

…h steps

Setting PATH from within the MSYS2 shell overwrites the Windows PATH
in GITHUB_ENV with a POSIX-style path, causing Maven to be unfindable
in subsequent PowerShell steps. Save only the DLL directory to a
separate MEOS_DLL_DIR variable (using cygpath -w for a native Windows
path) and prepend it in PowerShell explicitly.

JarLibraryLoader (JMEOS) in CI mode checks LD_LIBRARY_PATH (then
DYLD_LIBRARY_PATH) and passes the value to jnr.ffi.LibraryLoader.search();
PATH is never consulted.  On Windows neither env var was set, causing an
ExceptionInInitializerError before any test ran.

Fix: set LD_LIBRARY_PATH to the Windows-native meos-install/bin path so
JNR-FFI can locate libmeos.dll.  Also record UCRT64_BIN (the MSYS2 UCRT64
bin directory) and prepend it to PATH in the Unit tests PowerShell step so
Windows can resolve libmeos.dll's transitive runtime dependencies
(libgeos, libproj, libjson-c, libgsl).

Both Windows x86_64 and macOS ARM64 hit the same upstream JMEOS issue:
JNR-FFI cannot generate a JDK proxy for the 1683-method functions
interface (generated <clinit>()V exceeds the JVM 64 KB limit). Only
Linux x86_64 is unaffected. Update comments to reflect this.

Both JNR-FFI ASM mode and JDK reflection (java.lang.reflect.Proxy) mode
hit the JVM 64 KB <clinit>()V limit for the 1683-method JMEOS functions
interface. The jnr.ffi.asm.enabled=false + forkCount=0 approach was a
failed attempt. macOS and Windows remain non-blocking (continue-on-error)
until JMEOS splits its interface. Also correct the pom.xml jnr-ffi comment.

JMEOS 1.5 splits the monolithic MeosLibrary JNR-FFI interface (1486+14
methods) into four ≤ 400-method private sub-interfaces so each proxy
<clinit>()V stays well under the JVM 64 KB bytecode limit. This fixes
MethodTooLargeException on macOS (ARM64) and Windows (x86_64) — both
previously marked continue-on-error.

Changes:
- libs/JMEOS-1.5.jar: rebuilt from MobilityDB/JMEOS with interface split
  + 14 MEOS 1.3 additions (geo_from_text, tpoint_trajectory/2-arg,
  meos_initialize/0-arg, meos_set_spatial_ref_sys_csv, geom_to_geog,
  tspatial_to_stbox, eintersects_tgeo_geo, nad_tgeo_tgeo,
  edwithin_tgeo_tgeo, econtains_geo_tgeo, tdwithin_tgeo_tgeo,
  adisjoint_tgeo_tgeo, geom_contains, tgeo_at_geom) + macOS/Windows
  JarLibraryLoader support
- pom.xml: reference JMEOS-1.5.jar
- .github/workflows/maven.yml:
  * macOS/Windows: remove continue-on-error (now fixed)
  * all platforms: build libmeos from MobilityDB v1.3.0 source
  * Linux: switch from bundled .so extraction to source build
  * macOS: remove JMEOS-1.4 compatibility patches
  * Windows: update bootstrap-branch comment (v1.3.0 base, tzdata patch)
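
The method counts in this commit are consistent with a four-way split: 1486+14 = 1500 methods at no more than 400 per sub-interface. The Python sketch below shows one way such a partition could be computed. The commit does not say how JMEOS 1.5 actually groups its methods, so the even split and the placeholder method names are assumptions made for illustration.

```python
import math


def split_interface(method_names, max_per_interface=400):
    """Partition names into the fewest balanced groups of at most the given size."""
    if not method_names:
        return []
    n_groups = math.ceil(len(method_names) / max_per_interface)
    base, extra = divmod(len(method_names), n_groups)
    groups, start = [], 0
    for i in range(n_groups):
        size = base + (1 if i < extra else 0)  # spread the remainder evenly
        groups.append(method_names[start:start + size])
        start += size
    return groups


if __name__ == "__main__":
    # Placeholder names; the real interface declares MEOS function bindings.
    names = [f"method_{i}" for i in range(1486 + 14)]
    print([len(group) for group in split_interface(names)])
```

A split by MEOS module, as suggested in the PR description, would produce uneven groups but serves the same purpose as long as each stays under the per-interface budget.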
estebanzimanyi referenced this pull request in estebanzimanyi/MobilitySpark May 9, 2026
…rface

- Reflect new dependency chain: JMEOS MobilityDB#9 (JashanReel multi-module) →
  fix/multimodule-with-split-interface (split JNR-FFI + cleanup) → MobilitySpark #7
- Mark JMEOS #8 as recommended for closure (subsumed by MobilityDB#9, comment posted)
- Mark JMEOS MobilityDB#11 as superseded by the new multimodule integration branch
- Add integration branch table row (awaiting gh pr create on MobilityDB/JMEOS)
@estebanzimanyi
Member Author

Status after attempting bundle integration into preview/100-percent

I tried adding this PR to the preview/100-percent bundle, but the pom.xml conflict requires a substantive author decision rather than a mechanical concatenation:

<<<<<<< HEAD (preview/100-percent — includes #10/#11/#12)
- JMEOS 1.4 (libs/JMEOS-1.4.jar; on Maven Central via local repo)
=======
- JMEOS 1.5 (libs/JMEOS-1.5.jar; interface-split, fixes JVM 64 KB
            MethodTooLargeException on macOS/Windows; not yet on Maven Central)
>>>>>>> pr/7

Two choices:

  1. Stay on JMEOS 1.4: keep the current preview bundle composition; rebase this PR to use 1.4, and the multi-platform CI changes ride independently.
  2. Move to JMEOS 1.5: get the macOS/Windows JVM fix; the rest of the preview bundle would need rebasing onto 1.5 too, since #12 (feat(parity): cbuffer/npoint/pose/rgeo UDF surface, 92.5% → 99.6%, all six families) was built against 1.4's interface shape.

Either path is fine; both are author-only calls. Once you decide, the rebase is straightforward and this PR can join the next preview bump.

(.github/workflows/maven.yml conflict resolves naturally once the pom decision is made.)

