Skip to content

Add mesh primitive with PIMPL cache via gen-ctl-io lifecycle hooks#91

Merged
stevengj merged 6 commits into
NanoComp:masterfrom
Luochenghuang:lifecycle-hooks-upstream
Jun 5, 2026
Merged

Add mesh primitive with PIMPL cache via gen-ctl-io lifecycle hooks#91
stevengj merged 6 commits into
NanoComp:masterfrom
Luochenghuang:lifecycle-hooks-upstream

Conversation

@Luochenghuang
Copy link
Copy Markdown
Contributor

@Luochenghuang Luochenghuang commented Jun 5, 2026

Closes #86, closes #82, closes #88 and removes the the original mesh PR's hand-edited generated files introduced.

  1. PIMPL refactor for the mesh struct, expressed via a small gen-ctl-io DSL extension. define-class now accepts optional (after-copy <fn>) and (after-destroy <fn>) clauses that emit hook calls at the tail of the generated copy / destroy bodies. The public mesh struct in ctlgeom-types.h now contains only Scheme-visible fields plus void *internal; the BVH and other C-only state move into a private mesh_internal inlined at the top of geom.c (no companion header and keeps cp-builds self-contained). mesh_copy eagerly rebuilds the cache via mesh_after_copy; mesh_destroy frees it via mesh_after_destroy. ctlgeom-types.h and geom-ctl-io.c are pure gen-ctl-io output again. Closes compilation failre with new mesh feature #86.

  2. Eliminate slist_unique VLA in intersect_line_with_prism. Existing dead-store bug: the dedup loop wrote into a stack VLA and then did slist = slist_unique;, a no-op since slist is pass-by-value. Symptom: a thin spurious "inside" sliver and a dropped real crossing on edge-grazing rays. OMP=1 missed it (deterministic rand sequence); OMP=8 reshuffles libc rand and trips it routinely. Fix: dedup in place. Closes intersect_line_with_prism: slist dedup never writes back #82.

  3. Stop tracking utils/geom-ctl-io.c and utils/ctlgeom-types.h. Both are pure gen-ctl-io output and have only been in git since PR Mesh geometry import #74. With the DSL above there is no hand-edit content to preserve. Move to nodist_libctlgeom_la_SOURCES and EXTRA_DIST so make dist still bundles them into release tarballs. Restores the pre-Mesh geometry import #74 contract: WITHOUT_GUILE supported via tarball, not git clone.

@Luochenghuang Luochenghuang mentioned this pull request Jun 5, 2026
Comment thread utils/Makefile.am Outdated
Comment thread utils/Makefile.am Outdated
Comment thread base/class.scm Outdated
@Luochenghuang Luochenghuang force-pushed the lifecycle-hooks-upstream branch from b6332dc to 5288b47 Compare June 5, 2026 19:30
Comment thread utils/geom.c
@Luochenghuang Luochenghuang force-pushed the lifecycle-hooks-upstream branch from dd870d1 to 46ba2cf Compare June 5, 2026 19:38
…-driven public struct

PR NanoComp#74 (mesh geometry import) hand-extended utils/ctlgeom-types.h with C-only internal fields (mesh_bvh_node, face_normals, face_areas, bvh, bvh_face_ids, is_closed, centroid, lengthscale, num_bvh_nodes, and a flat int* face_indices) because none of those had natural Scheme representations in geom.scm. That worked when Guile wasn't detected at configure time, but any from-git build with Guile installed (including the new GitHub Actions CI) regenerated ctlgeom-types.h from geom.scm, clobbered the hand-extensions, and failed to compile with 'unknown type name mesh_bvh_node'.

This commit applies the PIMPL (pointer-to-implementation) idiom to clean up that situation properly. The public mesh struct in ctlgeom-types.h now contains only Scheme-visible fields — vertices, face_indices (as a vector3_list, each vector3 packing one triangle's three int indices), and a void* internal pointer. All C-only state moves into mesh_internal in a new private header utils/mesh-internal.h, accessed via a static-inline mesh_priv(m) cast helper. ctlgeom-types.h goes back to being a regular auto-generated artifact derived from geom.scm; no Makefile.am workaround needed.

Lifecycle:
  - make_mesh / make_mesh_with_center: populate public fields, then call init_mesh which calloc()s mesh_internal, unpacks face_indices into a flat int[], computes face normals/areas, checks watertight closure, builds the BVH, and stores the pointer in m->internal.
  - reinit_mesh: fast-path returns when m->internal != NULL; otherwise calls init_mesh. Same single-threaded-init contract as before.
  - mesh_destroy (auto-generated): frees vertices.items + face_indices.items, then calls mesh_internal_free (defined in geom.c) to free the cache.
  - mesh_copy (auto-generated): deep-copies the public fields, sets internal=NULL, then calls mesh_init_internal (defined in geom.c) to eagerly rebuild the BVH on the copy. Eliminates the previous shallow-pointer-share footgun where two mesh copies pointed at the same BVH allocation.

geom.c gets ~105 mechanical m->X → mesh_priv(m)->X rewrites for the 10 internal fields (the cast inlines to nothing at runtime). test-mesh.c gets the same rewrite for 6 sites where tests poke directly at m->is_closed.

Both auto-generated files (ctlgeom-types.h and geom-ctl-io.c) now match what gen-ctl-io would emit from the simplified geom.scm, with one hand-tweak in geom-ctl-io.c: mesh_destroy and mesh_copy call into the externally-defined mesh_internal_free / mesh_init_internal so they can manage the opaque cache. This is the minimum manual surface required for any PIMPL design.

Verified: configure + make + make check all clean. test-prism passes (0/10000 point failures, 0/1000 segment failures, 0/65 prism helper failures). test-mesh passes 0 failures across all 19 cases including test_copy_destroy (which exercises mesh_copy + point queries on the copy + destroy of both original and copy).
The duplicate-removal step previously wrote deduped values into a stack VLA slist_unique[num_vertices+2] and then did 'slist = slist_unique;' before returning the new count. That assignment is a dead store: slist is a pass-by-value pointer parameter, so reassigning it only rebinds the local variable. The caller's buffer is never touched and slist_unique is destroyed when the function returns, so the caller reads the first num_unique_elements entries of the sorted-but-undeduped buffer instead of the actual deduped values.

In intersect_line_segment_with_prism this corrupts the inside/outside parity walk: a near-duplicate that the dedup loop intended to drop stays in the buffer (producing a phantom thin 'inside' sliver of width ~1e-3*|s|), and a real far-end crossing is silently dropped because the caller stops at the smaller returned count. The bug triggers when two s-values fall within 1e-3*|s| of each other — edge-grazing rays, rays through a shared vertex, near-tangent rays. test-prism's random-ray distribution didn't statistically hit the window often enough at OMP=1 to flag it, but parallel runs (e.g. OMP=8) shuffle the libc rand() sequence and hit the window often enough that the strict 1e-6 segment-length check fails.

The fix eliminates the VLA entirely. The dedup is done in place with a write head 'iv' and read head 'nv': because writes go to slist[iv++] with iv <= nv, position nv-1 is either untouched or was previously written with its own value as a no-op, so reading slist[nv-1] still yields the original sorted value and the comparison semantics are bit-identical to the original intent. Net effect: removes the dead store, removes a potentially huge stack allocation, and the dedup now actually applies to the caller's buffer.

Verified with --enable-openmp: test-prism passes at both OMP_NUM_THREADS=1 (0/10000 points failed, 0/1000 segments failed, 0/65 prism helper failures) and OMP_NUM_THREADS=8 across multiple trials. test-mesh remains clean at OMP=8 as well.
The previous PIMPL commit placed the private mesh_bvh_node / mesh_internal definitions and the mesh_priv() accessor in a separate header utils/mesh-internal.h, included by geom.c. That breaks MPB, meep, and the libctl examples/ directory itself: all of those build by 'cp -f utils/geom.c their_tree/geom.c' and compile the copy in their own build context, without the companion header.

Until those downstreams are taught to link against libctlgeom directly (instead of copying geom.c), geom.c must remain self-contained. This commit inlines the private definitions directly at the top of geom.c (right after the existing includes) and removes the #include 'mesh-internal.h' line. The mesh-internal.h file is deleted.

To preserve test-mesh's ability to verify the closure check without exposing mesh_priv outside geom.c, the is_closed field is promoted to the public mesh struct via geom.scm. It is conceptually a derived public property anyway (users may want to query 'did init_mesh detect this mesh as watertight?'), and the auto-generated mesh_copy / mesh_equal handle it as a plain boolean value with no special code. ctlgeom-types.h and geom-ctl-io.c are updated to match. test-mesh.c drops the #include 'mesh-internal.h' line and reverts mesh_priv(m)->is_closed back to m->is_closed.

Build clean. test-prism and test-mesh both pass at OMP_NUM_THREADS=1 and 8: test-mesh 0 failures (including 0 mismatches in test_cube_vs_block and test_cube_segments_vs_block); test-prism 0/10000 point failures, 0/1000 segment failures, 0/65 prism helper failures.
Extend define-class so a class can declare two optional class-level
hooks, each naming a C function that gen-ctl-io invokes from the
generated copy/destroy bodies:

    (define-class mesh geometric-object
      (define-property vertices ...)
      ...
      (define-property internal '() 'SCM)
      (after-copy    mesh_after_copy)
      (after-destroy mesh_after_destroy))

The hooks have signature void fn(<type> *o). after-copy is appended
to the end of <type>_copy after the standard field copies, so it sees
a fully-populated destination object. after-destroy is appended to
the end of <type>_destroy and is called by value-then-address
(fn(&o);), so it can still read opaque SCM/void* fields that
gen-ctl-io's destroy is a no-op for (e.g. an opaque cache pointer).

This is the upstream fix for the problem refactor-mesh-pimpl worked
around by committing a hand-edited geom-ctl-io.c with extern decls
and call sites manually spliced into mesh_copy / mesh_destroy. With
this DSL, the lifecycle contract lives in geom.scm and the generated
file stays regenerable on every build.

Changes:
- base/class.scm: make-class takes properties as a list arg and
  carries an opt alist; new accessors class-after-copy /
  class-after-destroy. define-class macro recognises trailing
  (after-copy <fn>) / (after-destroy <fn>) forms in the body and
  threads them through to make-class. Existing specs unaffected.
- utils/ctl-io.scm: class-copy-function / class-destroy-function emit
  the hook calls at the body tail. output-class-copy-functions-header
  / output-class-destruction-functions-header emit corresponding
  extern void <fn>(<type> *); prototypes.
- utils/lifecycle-hooks-spec.scm + utils/test-lifecycle-hooks.sh:
  fixture spec exercising both hooks, plus a shell test that runs
  gen-ctl-io against it and greps for the expected externs and call
  sites. Wired into TESTS under WITH_GUILE in utils/Makefile.am.

Verified:
- WITHOUT_GUILE: make + make check clean (test-prism, test-mesh).
- Backwards compat: regenerating utils/geom.scm produces zero hook
  artifacts (no class declares them yet).
- Test passes: gen-ctl-io emits the expected externs and calls.
- WITH_GUILE full build blocked by an unrelated pre-existing master
  bug (geom.c references mesh_bvh_node / m->bvh / m->centroid that
  master never declared); standalone gen-ctl-io invocation against
  the fixture spec works correctly.
Replace the hand-edited geom-ctl-io.c workaround on the mesh class with
the (after-copy ...) / (after-destroy ...) DSL keywords added in the
previous commit. The lifecycle contract now lives in geom.scm; the
generated file is regenerable on every build.

geom.scm: declare the mesh class's lifecycle hooks alongside the
internal property.

geom.c: add two adapter functions (mesh_after_copy, mesh_after_destroy)
that gen-ctl-io invokes from the auto-generated copy / destroy.
mesh_after_copy nulls the shallow-copied internal pointer and calls
mesh_init_internal to build a fresh BVH for the copy. mesh_after_destroy
calls mesh_internal_free on the opaque pointer. The original
mesh_init_internal / mesh_internal_free are now static since they are
only called from within geom.c.

Makefile.am: undo the workaround. geom-ctl-io.c goes back to
BUILT_SOURCES with its original sed-from-ctl-io.c regen rule (under
WITH_GUILE), back into clean-local, and out of EXTRA_DIST. The long
"hand-maintained" comment block is deleted. The committed
geom-ctl-io.c stays committed (mirrors the existing ctlgeom-types.h
arrangement) so WITHOUT_GUILE git-clone builds can compile; WITH_GUILE
builds regenerate over it and the content matches.

geom-ctl-io.c (regenerated, committed): mesh_copy now does the
standard shallow copy of internal followed by mesh_after_copy(o);
mesh_destroy does the standard frees followed by mesh_after_destroy(&o).
Hand-edited extern declarations and inline init/free calls removed.

ctlgeom-types.h (regenerated, committed): two extra extern void
mesh_after_*(mesh *); declarations.

Verified:
- WITH_GUILE: make + make check clean. test-prism, test-mesh,
  test-lifecycle-hooks.sh all PASS.
- The regenerated geom-ctl-io.c differs from the committed copy only
  in whitespace (no indent(1) on this host); the semantic diff is
  exactly the expected hook-call substitution. Pre-existing concern,
  not introduced here.
Both files are mechanical output of gen-ctl-io / sed from geom.scm.
PR NanoComp#74 began tracking them so WITHOUT_GUILE git-clone builds could
compile, but that "courtesy" introduced the schema-vs-committed drift
class of bugs (the same one 50e5498 had to fix and that motivated the
refactor-mesh-pimpl workaround). With the (after-copy ...) /
(after-destroy ...) DSL absorbing the hand-edits, the committed copies
match the regenerated ones by construction — yet keeping them in git
still requires every geom.scm edit to come with a manually regenerated
commit, which is brittle and easy to forget.

Restore the pre-NanoComp#74 contract: WITHOUT_GUILE is supported via tarball
install only. Building WITHOUT_GUILE from a git clone now fails fast
(no rule to make geom-ctl-io.c) instead of silently using a stale
checked-in copy.

Changes:
- git rm utils/geom-ctl-io.c utils/ctlgeom-types.h
- utils/Makefile.am: move both files from libctlgeom_la_SOURCES to
  nodist_libctlgeom_la_SOURCES (they're built, not distributed-as-
  source). Add them to EXTRA_DIST so `make dist` bundles the
  regenerated copies into the tarball, restoring the tarball-install
  WITHOUT_GUILE path. ctlgeom-types.h was already nodist_include_
  HEADERS; that stays.

Verified:
- WITH_GUILE from clean git clone: make + make check clean. 3/3
  PASS (test-prism, test-mesh, test-lifecycle-hooks.sh).
- `make dist` on a Guile machine: tarball contains both regenerated
  files at libctl-4.6.0/utils/.
- Tarball → extract → configure --without-guile → make + make
  check: 2/2 PASS (test-prism, test-mesh; lifecycle hooks skipped
  per existing WITH_GUILE gate).
- WITHOUT_GUILE from git clone: fails as expected with "No rule to
  make target 'geom-ctl-io.c'".

Net effect: zero hand-maintained generated files. Schema is the
single source of truth. CI (which has Guile) is unaffected.
@Luochenghuang Luochenghuang force-pushed the lifecycle-hooks-upstream branch from 46ba2cf to d4bcf61 Compare June 5, 2026 19:41
@stevengj stevengj merged commit 118b15c into NanoComp:master Jun 5, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

compilation failre with new mesh feature intersect_line_with_prism: slist dedup never writes back

2 participants