
Automatically apply tail call optimizations#14933

Merged
JaroslavTulach merged 13 commits into develop from wip/jtulach/TailCall10956
Apr 21, 2026

Conversation


@JaroslavTulach JaroslavTulach commented Apr 1, 2026

Pull Request Description

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

  • The documentation has been updated, if necessary.
  • All code follows the Scala and Java style guides, where applicable.
  • Unit tests have been written where possible.
  • engine benchmarks are OK
  • standard library benchmarks are OK

@JaroslavTulach JaroslavTulach self-assigned this Apr 1, 2026
@JaroslavTulach JaroslavTulach marked this pull request as draft April 1, 2026 17:22
return switch (getTailStatus()) {
case TAIL_DIRECT -> directCall.executeCall(frame, function, callerInfo, state, arguments);
case TAIL_LOOP -> throw new TailCallException(function, callerInfo, arguments);
case TAIL_DIRECT, // -> directCall.executeCall(frame, function, callerInfo, state, arguments);
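For context, the `TailCallException` pattern in the snippet above is a form of trampoline: instead of performing a nested call for a tail call, the dispatch node throws, and an outer loop catches the exception and re-dispatches, keeping the stack flat. A minimal, self-contained sketch of the idea (the `TailCallException`, `Trampoline`, and `sum` here are illustrative stand-ins, not the engine's actual classes):

```java
import java.util.function.Supplier;

// Illustrative only: a control-flow exception carrying the next call to make.
// Disabling stack trace capture keeps the throw cheap.
final class TailCallException extends RuntimeException {
  final Supplier<Object> next;

  TailCallException(Supplier<Object> next) {
    super(null, null, false, false); // no message, no stack trace
    this.next = next;
  }
}

final class Trampoline {
  // The outer loop: an ordinary return ends it; a TailCallException
  // replaces the pending call instead of growing the stack.
  static Object run(Supplier<Object> call) {
    while (true) {
      try {
        return call.get();
      } catch (TailCallException tce) {
        call = tce.next; // tail call: iterate rather than recurse
      }
    }
  }

  // sum of 1..n, "tail recursive" via the exception: each step throws
  // immediately, so the Java stack stays at a constant depth.
  static Object sum(long n, long acc) {
    if (n <= 0) return acc;
    throw new TailCallException(() -> sum(n - 1, acc + n));
  }
}
```

With this shape, `Trampoline.run(() -> Trampoline.sum(1_000_000L, 0L))` completes without a `StackOverflowError`, which is what the exception-based tail dispatch buys at the cost of the throw/catch overhead discussed in the benchmarks below.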

@JaroslavTulach JaroslavTulach Apr 2, 2026


Let's start from basics again:

sbt:runtime-benchmarks> benchOnly CallableBenchmarks
Benchmark                                    Mode  Cnt    Score    Error  Units
benchSumTCOfromCall                          avgt    5  749,231 ± 71,350  ms/op
benchSumTCOmethodCall                        avgt    5  410,419 ± 22,373  ms/op
benchSumTCOmethodCallWithDefaultedArguments  avgt    5  395,680 ± 20,531  ms/op
benchSumTCOmethodCallWithNamedArguments      avgt    5  400,371 ± 19,671  ms/op

This is the basic result of the TCO-related benchmarks on this branch. It is indeed slow; the develop branch runs at:

Benchmark                                    Mode  Cnt   Score   Error  Units
benchSumTCOfromCall                          avgt    5  75,386 ± 1,417  ms/op
benchSumTCOmethodCall                        avgt    5  75,965 ± 3,086  ms/op
benchSumTCOmethodCallWithDefaultedArguments  avgt    5  85,326 ± 6,681  ms/op
benchSumTCOmethodCallWithNamedArguments      avgt    5  75,170 ± 2,196  ms/op


  • With 0d6509c I have the following results locally:
Benchmark                                                Score    Error  Units
benchSumTCOfromCall                                      71,632 ±  7,940  ms/op
benchSumTCOfromCallWithTailCall                          74,268 ± 10,342  ms/op
benchSumTCOmethodCall                                    77,010 ± 15,595  ms/op
benchSumTCOmethodCallWithDefaultedArguments              72,061 ±  3,377  ms/op
benchSumTCOmethodCallWithDefaultedArgumentsWithTailCall  68,628 ±  0,750  ms/op
benchSumTCOmethodCallWithNamedArguments                  69,425 ±  2,434  ms/op
benchSumTCOmethodCallWithNamedArgumentsWithTailCall      69,752 ±  0,438  ms/op
benchSumTCOmethodCallWithTailCall                        73,485 ±  4,974  ms/op

@JaroslavTulach JaroslavTulach added the CI: Clean build required CI runners will be cleaned before and after this PR is built. label Apr 13, 2026
@JaroslavTulach JaroslavTulach changed the title from "Always use TailCallException when possible" to "Automatically detect functions suitable for tail call optimizations" Apr 14, 2026
@JaroslavTulach JaroslavTulach marked this pull request as ready for review April 14, 2026 08:17
@JaroslavTulach JaroslavTulach linked an issue Apr 14, 2026 that may be closed by this pull request
deep_error n:Integer = if n <= 0 then Error.throw "Here is an error" else
    deep_error n-1

x = deep_error n-1
x+1
@JaroslavTulach JaroslavTulach Apr 20, 2026


The need to capture the x return value and then return some other value is itself a demonstration of the success of automatic tail call optimization! Without these additional operations the deep_xyz call gets optimized and the stack depth is 1, not 21:

Prior to #14480 the depth of Error.throw was ignored altogether
Expecting depth of 21, but was 1

If x is an error, then the execution returns the error, as x+1 still yields an error. In the case of deep_throw the panic is propagated and the x+1 code is skipped.
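A compact Java analogue (with hypothetical names, not the engine code) of the distinction the test above relies on: a recursive call in tail position can be turned into a loop with constant stack depth, while a call whose result the caller still consumes (the x+1 above) must keep its frame alive:

```java
final class TailPosition {
  // Tail position: the recursive call's result IS the caller's result,
  // so the frame can be reused; here the recursion is written directly
  // as the loop it is equivalent to.
  static long countDown(long n) {
    while (n > 0) {
      n = n - 1;
    }
    return n;
  }

  // NOT tail position: the caller still has work to do (the + 1) after
  // the recursive call returns, so every frame must stay on the stack.
  // Counting frames this way mirrors the "depth of 21" in the test above.
  static long depth(long n) {
    if (n <= 0) return 1;    // base case contributes one frame
    long x = depth(n - 1);   // must wait for the result...
    return x + 1;            // ...then add 1: the frame cannot be dropped
  }
}
```

Here `depth(20)` yields 21 (twenty recursive frames plus the base case), matching the expected stack depth in the test, while `countDown` runs in constant stack space for any n.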

@JaroslavTulach JaroslavTulach changed the title from "Automatically detect functions suitable for tail call optimizations" to "Automatically apply tail call optimizations" Apr 20, 2026
@JaroslavTulach

  • Engine benchmarks are perfectly fine with this PR!
  • The same code, with or without @Tail_Call, runs at the same speed in all TCO benchmarks!

[TCO benchmark chart]

@JaroslavTulach

JaroslavTulach commented Apr 21, 2026

Some standard library benchmarks may even improve. For example, Cross_Tab seems to indicate some improvement:

[Cross_Tab benchmark chart]

Maybe Table_Aggregate:

[Table_Aggregate benchmark chart]

but clearly there are no regressions - which was the biggest reason why @Tail_Call had to be introduced in the first place. In other words, all is working well.

@JaroslavTulach JaroslavTulach merged commit d5803aa into develop Apr 21, 2026
82 of 84 checks passed
@JaroslavTulach JaroslavTulach deleted the wip/jtulach/TailCall10956 branch April 21, 2026 11:23

Labels

-compiler · CI: Clean build required (CI runners will be cleaned before and after this PR is built)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Automatically detect suitable @Tail_Call optimization

2 participants