Skip to content

Implement Typed IR#504

Draft
Derppening wants to merge 51 commits into
hkust-taco:hkmc2from
Derppening:enhance/typed-ir
Draft

Implement Typed IR#504
Derppening wants to merge 51 commits into
hkust-taco:hkmc2from
Derppening:enhance/typed-ir

Conversation

@Derppening

Copy link
Copy Markdown
Contributor

No description provided.

@LPTK LPTK left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some possible next immediate steps:

  • Update printer to show the erased types at variable and member declaration/definition sites.
  • Update Lowering so it generates erased types from parameter type annotations, to be used to annotate the corresponding VarSymbol.
  • Make sure we are never overriding an existing erased type in a given symbol by using softAssert, as a sanity check.

A subtlety we should get right: the erasure of annotated class parameter types should successfully propagate to their defining fields. Param has a ``fldSym` which can be used for this.

Comment thread hkmc2/shared/src/main/scala/hkmc2/codegen/js/JSBuilder.scala Outdated
new Rewriter(instId).applyBlock(ogBody),
mkReturnCall(restFunSym, restFunArgs))
val refreshedFvSymbols = dtorBranchFnFvs(branchId._1).map(s => s -> new VarSymbol(Tree.Ident(s"fv_${s.nme}")))
val refreshedFvSymbols = dtorBranchFnFvs(branchId._1).map(s => s -> new VarSymbol(Tree.Ident(s"fv_${s.nme}"), erasedType = N))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems many cases like this one should carry over the previous erasedType somehow.

Comment thread hkmc2/shared/src/main/scala/hkmc2/codegen/Block.scala Outdated
@Derppening

Derppening commented Jun 1, 2026

Copy link
Copy Markdown
Contributor Author

The current task list (work items created by me, organized by AI):

  • Phase A — erasedType on Result (keystone infra) — ✅ committed

    • Enum redesign — PrimitiveType.Array + totalized sym; three-case ErasedType; ErasedType.sym; erasedType_!; Normalization → ObjectRef
    • Infra — Result extends HasErasedType; lazy val erasedType; abstract def on trait; Value's override val removed; this match over Tuple/Record/Instantiate/Value
    • Materialization sweep (Cat 1 + 2 + 3) — all applied, compiles clean, committed
    • Core sites (subTerm, ReflectionInstrumenter.assign, ValDefn.mk); join-point traps left N
  • Phase B — printer baseline (was C1) — captures post-A erasedType so later tightening shows as a diff

    • showErasedType toggle in Printer.scala (mirror showPurity, default OFF); render at variable + member declaration/definition sites
    • Commit one curated baseline test file (tuple/instantiate/lit/record/call/select/val-def, ≥1 case through Lifter) with post-A types as-is
  • Phase C — Annotation-driven erasedType + invariant guard (was Phase B; lands right after the B baseline)

    • Param-annotation → VarSymbol (Lowering erases param type annotation onto the VarSymbol)
    • Class-param erasure → defining field via Param.fldSym
    • softAssert no-clobber invariant (never override an existing symbol erasedType)
  • Phase D — WatBuilder consumes ErasedType (was C2; correctness harness)

    • Drive anyref cast targets from operand erasedType (additive + N-graceful); WASM goldens shift here
    • Optional explicit asserts when a known erasedType contradicts the required use-site type
  • Phase E — FuncRef + AnyFuncRef (was Phase D)

    • Add AnyFuncRef coarse constant, then FuncRef(params, result)
    • Fill Lambda → FuncRef
  • Phase F — refine residual inference (was Phase E)

    • Call return types (easy win: builtin-op result-type table, survey §6)
    • Select/DynSelect field/member types
    • rest params; Lifter capture symbols; function results
    • revisit resSym/l sites left N in Phase A

@LPTK

LPTK commented Jun 1, 2026

Copy link
Copy Markdown
Contributor
  • join-point traps left N

What does that mean?

  • C1 — showErasedType toggle in Printer.scala

This should be moved to phase B. In fact, it's the first thing yoiu should do, just so you can actually see what you're doing!

@Derppening

Copy link
Copy Markdown
Contributor Author
  • join-point traps left N

What does that mean?

erasedType = N, will be left for Phase D.

What does that mean?

  • C1 — showErasedType toggle in Printer.scala

This should be moved to phase B. In fact, it's the first thijng yoiu should do, just so you can actually see what you're doing!

Good point, I have updated to task list.

Comment thread hkmc2/shared/src/main/scala/hkmc2/codegen/Block.scala Outdated
Comment thread hkmc2/shared/src/main/scala/hkmc2/codegen/Block.scala
Comment thread hkmc2/shared/src/main/scala/hkmc2/semantics/Symbol.scala Outdated
Comment thread hkmc2/shared/src/test/mlscript/codegen/ErasedType.mls Outdated
Comment thread hkmc2/shared/src/main/scala/hkmc2/codegen/Lowering.scala Outdated
Comment thread hkmc2/shared/src/main/scala/hkmc2/semantics/Elaborator.scala
@Derppening

Copy link
Copy Markdown
Contributor Author

A summary of the current state of this branch (since I will work on NAC3 next week):

  • The basic system for typed IR is implemented
  • A number of locations either have their types inferred or propagated in the IR

Follow-ups:

  • Field selection on known qualifiers (e.g. let foo: Foo; foo.x)
  • Observation order-dependent inference - See ErasedType.mls:233

There is also a bigger issue that needs to be fixed: When narrowing the type of parameters/results in the WAT, Binaryen rejects the code with the error: invalid type: distinct rec groups would be identical after binary writing. This error does not manifest when using anyref because all unused types are erased by Binaryen during binary export. Claude states the following as the root cause of the issue:

Binaryen's isorecursive rec-group canonicalization producing two distinct rec groups that would serialize identically

This issue needs to be fixed separately, so I suspect by the time I pick this back up I will work on this issue first.

@LPTK

LPTK commented Jun 5, 2026

Copy link
Copy Markdown
Contributor
  • Field selection on known qualifiers (e.g. let foo: Foo; foo.x)

For now, all we need is for selections like foo.Foo#x to work. They already have the symbol, so this should be easy.

  • Observation order-dependent inference - See ErasedType.mls:233

IMO nontrivial inference (ie other than just propagating information inside out) isn't an objective of this PR. When the new frontend is in place, the resolution process will already do this sort of type inference for us, so there should be nothing to do at the IR level.

I don't really understand the binaryen problem. You need to investigate it more and really understand the cause. Claude hallucinates about these things half of the time (if not more).

Before moving on, can you please open separate micro-PRs to merge what can be merged separately (eg the NoSymbol object) so this PR doesn't get out of sync too badly?

@Derppening

Copy link
Copy Markdown
Contributor Author

Before moving on, can you please open separate micro-PRs to merge what can be merged separately (eg the NoSymbol object) so this PR doesn't get out of sync too badly?

Done, opened as #516. There doesn't seem to be other things that can be merged separately for now.

@CAG2Mark

CAG2Mark commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

Could you pls notify me when this is ready? I feel like we should try to make the lifter type-preserving. I can do it in a separate PR.

BTW, tail recursion optimization for mutually recursive functions will break the typedness of the IR. When we have mutually recursive tailrec functions, we merge them into one giant function with a switch whose parameter list is a union of the other parameter lists, rather than a concatenation of them, so each parameter could have a different type depending on what function we are entering.

@Derppening

Derppening commented Jun 18, 2026

Copy link
Copy Markdown
Contributor Author

I don't really understand the binaryen problem. You need to investigate it more and really understand the cause. Claude hallucinates about these things half of the time (if not more).

After some really deep sessions and several MREs, here is the root cause of the issue:

As expected, Binaryen's parseText API doesn't enable any features while parsing the WAT. As such, when Binaryen encounters any user-defined composite type, it gets erased into its abstract top type (e.g. any, func, eq etc., not preserving nullability) - And if this erasure process causes the erased type to match a non-eraseable type, it collides.

Example:

        (type $Object (sub (struct)))
        (type $O (sub $Object (struct (field $i i32))))
        (type $O_ctor (func (result (ref $O))))
        (type $entry (func (result (ref null any))))

$O_ctor and $entry will collide, because $O_ctor contains (ref $Foo), which is erased into (ref null any). This process causes the signature of $O_ctor and $entry to match, thus emitting the distinct rec groups would be identical after binary writing error.

One solution is to use rec groups to group type definitions together (e.g. based on the owning class), so that they no longer match:

        (type $Object (sub (struct)))
        (rec
            (type $O (sub $Object (struct (field $i i32))))
            (type $O_ctor (func (result (ref $O)))))
        (type $entry (func (result (ref null any))))  ;; (rec (type $entry ...))

However, this would introduce a more subtle issue - If two classes have exactly the same type on everything, except one field, method parameter, or result type just so happens to be uninferrable in the IR and is coerced down into anyref, then this would still rear its ugly head.

I can only think of two ways moving forward.

  1. I create a pull request in Binaryen and Binaryen.js to add an additional parameter to parseText that indicates the features that should be enabled when parsing the module. This should be simpler, but the timeline is relatively unclear (need to wait for maintainers to merge the changes on both sides).
  2. Implement a Wasm binary backend, thus bypassing the whole Binaryen charades. This is much more effort, but should result in a better foundation to work on.

Let me know what you think about this.

Edit: The MREs can be found here

@LPTK

LPTK commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Thanks for looking into this!

  1. I create a pull request in Binaryen and Binaryen.js to add an additional parameter to parseText that indicates the features that should be enabled when parsing the module. This should be simpler, but the timeline is relatively unclear (need to wait for maintainers to merge the changes on both sides).

Yes, can we just fork Binaryen, fix this, and open a PR? In the meantime, we just use our own fork. Can be done either with a Git submodule or by publishing our own npm package for the fork (or some other solution, whichever you prefer).

@Derppening

Derppening commented Jun 29, 2026

Copy link
Copy Markdown
Contributor Author

Thanks for looking into this!

  1. I create a pull request in Binaryen and Binaryen.js to add an additional parameter to parseText that indicates the features that should be enabled when parsing the module. This should be simpler, but the timeline is relatively unclear (need to wait for maintainers to merge the changes on both sides).

Yes, can we just fork Binaryen, fix this, and open a PR? In the meantime, we just use our own fork. Can be done either with a Git submodule or by publishing our own npm package for the fork (or some other solution, whichever you prefer).

Done in #536, although not as a submodule - I instead forked the Binaryen repo, patched it to add parseTextWithFeatures, and pointed packages.json to my fork of Binaryen instead.

@LPTK

LPTK commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

Please fix the conflicts of this branch, and let's try to merge it soon, isntead of letting it diverge again over weeks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants