Add Mixtral architecture adapter tests #1329

Merged
jlarson4 merged 2 commits into TransformerLensOrg:dev from RecreationalMath:mixtral-adapter-test on May 26, 2026
@RecreationalMath
Contributor

Description

Adds a unit test suite for MixtralArchitectureAdapter under tests/unit/model_bridge/supported_architectures/, following the existing adapter-test pattern (modelled on the qwen3_moe and qwen2 suites). It needs no model downloads or real checkpoints: it uses tiny programmatic TransformerBridgeConfig objects, plus small synthetic tensors and a fake attention module for the behavioural tests, so it runs on CPU in seconds.
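
For readers unfamiliar with the download-free style, here is a minimal sketch of the idea, not the code in this PR. `TinyConfig` and `FakeAttention` are hypothetical names invented for illustration (the real suite uses TransformerBridgeConfig and PyTorch modules), and NumPy is used here only to keep the example dependency-light. It shows a tiny config feeding a fake attention module whose Q projection exposes `n_heads` while K/V expose `n_key_value_heads`.

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass
class TinyConfig:
    """Hypothetical stand-in for a small programmatic config (not TransformerBridgeConfig)."""

    d_model: int = 16
    n_heads: int = 4
    d_head: int = 4
    n_key_value_heads: Optional[int] = 2  # fewer K/V heads than Q heads => GQA


class FakeAttention:
    """Hypothetical fake attention: random Q/K/V projections only, no softmax or output."""

    def __init__(self, cfg: TinyConfig, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Fall back to n_heads when no separate K/V head count is configured.
        self.n_kv = cfg.n_heads if cfg.n_key_value_heads is None else cfg.n_key_value_heads
        self.cfg = cfg
        self.w_q = rng.standard_normal((cfg.d_model, cfg.n_heads * cfg.d_head))
        self.w_k = rng.standard_normal((cfg.d_model, self.n_kv * cfg.d_head))
        self.w_v = rng.standard_normal((cfg.d_model, self.n_kv * cfg.d_head))

    def project(self, x: np.ndarray) -> dict:
        """Return per-head Q/K/V activations shaped [batch, pos, heads, d_head]."""
        batch, pos, _ = x.shape
        d_head = self.cfg.d_head
        return {
            "q": (x @ self.w_q).reshape(batch, pos, self.cfg.n_heads, d_head),
            "k": (x @ self.w_k).reshape(batch, pos, self.n_kv, d_head),
            "v": (x @ self.w_v).reshape(batch, pos, self.n_kv, d_head),
        }


cfg = TinyConfig()
x = np.ones((1, 3, cfg.d_model))  # small synthetic input: batch=1, pos=3
shapes = {name: t.shape for name, t in FakeAttention(cfg).project(x).items()}
print(shapes)
```

With the defaults above, Q has 4 heads and K/V have 2, which is the kind of shape assertion a GQA hook test can make without loading any checkpoint.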

The suite (49 tests) covers:

  • Adapter config defaults (RMSNorm, rotary, gated MoE MLP, final_rms=False).
  • Weight conversions: QKVO weights plus Q/K/V biases, with GQA-aware head counts and the no-n_key_value_heads fallback.
  • Numerical round-trips: the rearrange conversions are actually run on synthetic HF-shaped weight and bias tensors, asserting the split-head output shapes and lossless reversion.
  • Component-mapping structure, bridge types, and HF module paths, including the block_sparse_moe MoE bridge with its gate router and the absence of Q/K-norm.
  • Factory registration and dispatch via select_architecture_adapter.
  • GQA forward hook shapes: a fake attention module wired into the bridge confirms Q surfaces n_heads while K/V surface n_key_value_heads.
  • setup_component_testing rotary-embedding wiring, eager-attention forcing, and robustness on a minimal HF model.
  • Architecture guards against drift.
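
The weight-conversion and round-trip bullets above can be illustrated with a small sketch. It is an assumption-laden illustration, not the adapter's code: `split_heads`, `merge_heads`, and `kv_head_count` are hypothetical helpers, the split-head layout `[n_heads, d_model, d_head]` is assumed to correspond to an einops pattern along the lines of `"(n h) m -> n m h"`, and NumPy reshapes stand in for the rearrange calls the suite exercises.

```python
from typing import Optional

import numpy as np


def split_heads(w: np.ndarray, n: int) -> np.ndarray:
    """[(n * h), m] -> [n, m, h]: split the fused head axis, then move d_head last."""
    nh, m = w.shape
    h = nh // n
    return w.reshape(n, h, m).transpose(0, 2, 1)


def merge_heads(w: np.ndarray) -> np.ndarray:
    """[n, m, h] -> [(n * h), m]: exact inverse of split_heads."""
    n, m, h = w.shape
    return w.transpose(0, 2, 1).reshape(n * h, m)


def kv_head_count(n_heads: int, n_key_value_heads: Optional[int] = None) -> int:
    """GQA-aware head count: use n_heads when no K/V head count is given."""
    return n_heads if n_key_value_heads is None else n_key_value_heads


d_model, d_head, n_heads = 16, 4, 4
n_kv = kv_head_count(n_heads, 2)

# Synthetic HF-shaped weights: [heads * d_head, d_model].
q_hf = np.arange(n_heads * d_head * d_model, dtype=np.float32).reshape(n_heads * d_head, d_model)
k_hf = np.arange(n_kv * d_head * d_model, dtype=np.float32).reshape(n_kv * d_head, d_model)

q_split = split_heads(q_hf, n_heads)  # shape (4, 16, 4)
k_split = split_heads(k_hf, n_kv)     # shape (2, 16, 4)

# Lossless reversion: merging the split tensor recovers the original exactly.
assert np.array_equal(merge_heads(q_split), q_hf)
assert np.array_equal(merge_heads(k_split), k_hf)
```

Using `np.arange` rather than random values makes every element distinct, so a wrong axis order would fail the equality check instead of passing by accident.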

Contributes to #1302 (Mixtral checkbox).

Type of change

  • New feature (non-breaking change which adds functionality)

Checklist:

  • I have commented my code, particularly in hard-to-understand areas
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have not rewritten tests relating to key interfaces which would affect backward compatibility

Download-free unit suite for MixtralArchitectureAdapter (49 tests, no
real checkpoints), following the existing adapter-test pattern. Covers config
defaults, QKVO and Q/K/V bias conversions with GQA head counts, component
mapping (including the MoE bridge), factory dispatch, numerical conversion
round-trips, GQA forward hook shapes, setup_component_testing, and
architecture guards.
@jlarson4
Collaborator

Excellent work on this @RecreationalMath! Merging

@jlarson4 jlarson4 merged commit 344f344 into TransformerLensOrg:dev May 26, 2026
24 checks passed
sunny1401 pushed a commit to sunny1401/TransformerLens that referenced this pull request May 26, 2026
* Add Mixtral architecture adapter tests

* Minor docstring change