perf(rv64): split base_alu into add_sub and xor_or_and chips#2883

Open
GunaDD wants to merge 2 commits into develop-v2.1.0-rv64 from perf/split-base-alu-u16

GunaDD commented Jun 12, 2026

Re-do of PR #2777 (base_alu part only), now on top of the u16 memory-bus limbs change. Summary of the changes:

  • Split the base_alu chip into add_sub and xor_or_and chips.
  • The new xor_or_and chip is the old base_alu minus ADD/SUB.
  • The new add_sub chip handles the ADD and SUB opcodes and stores 2 bytes per field element in its columns.
  • This lets us remove the interactions that the previous base_alu chip needed to range check that each individual field element is a byte.
  • Core width drops from the base_alu chip's 29 columns (3×8 limbs + 5 flags) to 14 columns (3×4 cells + 2 flags) in the add_sub chip.
  • add_sub/tests.rs was rewritten for the new layout: pranks are expressed as 4 u16 cells; the out-of-range negative tests are rebuilt around the 2^16 boundary (to overflow a 16-bit cell); the immediate-limb-shuffle negative test is removed because the multi-limb immediate decomposition it attacked no longer exists; and a new test is added to verify the memory-bus binding that justifies skipping range checks on b/c. Sanity tests convert byte vectors through rv64_bytes_to_u16_block.
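
For readers unfamiliar with the u16 layout, here is a minimal sketch of how a register's 8 bytes map onto 4 u16 cells and how the column counts quoted above are derived. It is illustrative only: `pack_bytes_to_u16_cells`, `core_width`, and `fits_in_u16_cell` are hypothetical names, little-endian cell order is an assumption, and none of this is the PR's actual `rv64_bytes_to_u16_block`.

```rust
/// Number of bytes in an RV64 register.
const RV64_REGISTER_BYTES: usize = 8;
/// Number of 16-bit cells per operand in the sketched layout (hypothetical name).
const U16_CELLS: usize = RV64_REGISTER_BYTES / 2;

/// Packs 8 little-endian bytes into 4 little-endian u16 cells.
/// Hypothetical stand-in, assuming little-endian order; it is not the
/// PR's rv64_bytes_to_u16_block.
fn pack_bytes_to_u16_cells(bytes: [u8; RV64_REGISTER_BYTES]) -> [u16; U16_CELLS] {
    core::array::from_fn(|i| u16::from_le_bytes([bytes[2 * i], bytes[2 * i + 1]]))
}

/// Core width as counted in the description: three operands (a, b, c)
/// of `cells_per_operand` columns each, plus the flag columns.
const fn core_width(cells_per_operand: usize, flags: usize) -> usize {
    3 * cells_per_operand + flags
}

/// True when a raw cell value fits in 16 bits; 2^16 itself is the first
/// value that overflows a cell.
fn fits_in_u16_cell(value: u32) -> bool {
    value < (1 << 16)
}
```

Under these assumptions, `core_width(8, 5)` gives the 29 columns of the byte layout and `core_width(4, 2)` gives the 14 columns of the u16 layout.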

Closes INT-8102

Re-do of PR #2777 (base_alu part only) on top of the u16 memory-bus limbs
change. The 64-bit BaseAlu chip is split into:

- add_sub: ADD/SUB with carry constraints, send_xor(a,a,0) result range
  checks, and the paired send_range(b,c) read-byte bounds required now
  that the memory bus only checks packed u16 values
- xor_or_and: XOR/OR/AND via the bitwise lookup, which already bounds the
  read bytes
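
The interplay between the carry constraints and the result range checks listed above can be sketched with an integer model. This is an assumption-laden illustration, not the chip's AIR: the real constraints live in a finite field, and `to_limbs`, `add_carries_are_valid`, and `limbs_in_range` are hypothetical names. The sketch shows why an out-of-range result limb at the 2^16 boundary can satisfy the carry relation on its own and must be rejected by a separate range check.

```rust
const LIMB_BITS: u32 = 16;
const NUM_LIMBS: usize = 4;
const LIMB_BASE: i64 = 1 << 16;

/// Splits a 64-bit value into 4 little-endian 16-bit limbs, widened to u32
/// so that a deliberately out-of-range ("pranked") limb can be represented.
fn to_limbs(x: u64) -> [u32; NUM_LIMBS] {
    core::array::from_fn(|i| ((x >> (LIMB_BITS * i as u32)) & 0xffff) as u32)
}

/// Integer model of the ADD carry relation for a = b + c (mod 2^64):
/// carry_i = (b_i + c_i + carry_{i-1} - a_i) / 2^16 must be exactly 0 or 1.
/// The carry out of the top limb is discarded, matching wrap-around.
fn add_carries_are_valid(
    a: [u32; NUM_LIMBS],
    b: [u32; NUM_LIMBS],
    c: [u32; NUM_LIMBS],
) -> bool {
    let mut carry: i64 = 0;
    for i in 0..NUM_LIMBS {
        let diff = b[i] as i64 + c[i] as i64 + carry - a[i] as i64;
        if diff % LIMB_BASE != 0 {
            return false;
        }
        carry = diff / LIMB_BASE;
        if carry != 0 && carry != 1 {
            return false;
        }
    }
    true
}

/// Range check on the result limbs: every limb must be below 2^16.
fn limbs_in_range(a: [u32; NUM_LIMBS]) -> bool {
    a.iter().all(|&limb| (limb as i64) < LIMB_BASE)
}
```

For example, with b = 0xffff and c = 1, the honest result limbs `[0, 1, 0, 0]` pass the carry check, but so does the pranked `[0x10000, 0, 0, 0]`; only `limbs_in_range` rejects the latter.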

The BaseAlu core (cols/AIR/executor/filler) is kept since the bigint
INT256 extension still uses it; its rv64-specific execution/cuda/tests
move to the new chips. base_alu_w is left untouched for now.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@github-actions

| group | app.proof_time_ms | app.cycles | leaf.proof_time_ms |
| --- | ---: | ---: | ---: |
| fibonacci | 1,609 | 4,000,051 | 527 |
| keccak | 16,550 | 14,365,133 | 3,009 |
| sha2_bench | 8,896 | 11,167,961 | 1,148 |
| regex | 1,603 | 4,090,656 | 428 |
| ecrecover | 476 | 112,210 | 282 |
| pairing | 620 | 592,827 | 297 |
| kitchen_sink | 4,015 | 1,979,971 | 867 |

Note: cells_used metrics omitted because CUDA tracegen does not expose unpadded trace heights.

Commit: 068dbdc

Benchmark Workflow

@github-actions


Code review

No issues found. Checked for bugs and CLAUDE.md compliance.
