Commit c04420f
fix: correct residual RMS norm for Qwen3.5 — fixes garbage output
The root cause of garbage output (on both CPU and Metal) was incorrect
handling of residual RMS norm weights. Qwen3.5 uses (1 + weight) * rms_norm(x)
for all norm layers, meaning the stored weights are zero-centered deltas
that need +1.0 added at load time.
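As a minimal sketch (not the repository's code; function name and plain-slice signature are illustrative), the residual form applies the normalization first and then scales by (1 + weight):

```rust
// Hedged sketch: RMS norm where the stored weight is a zero-centered
// delta, so the effective per-channel scale is (1.0 + weight).
fn rms_norm_residual(x: &[f32], weight: &[f32], eps: f32) -> Vec<f32> {
    // Mean of squares over the feature dimension.
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    let inv_rms = 1.0 / (mean_sq + eps).sqrt();
    x.iter()
        .zip(weight)
        .map(|(v, w)| v * inv_rms * (1.0 + w))
        .collect()
}
```

With weight = 0 everywhere this reduces to a plain RMS norm, which is why an all-zero checkpoint weight still produces an identity-like scale of 1.0.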
Two bugs:
1. load_rms_norm_weight had an auto-detection heuristic (threshold 0.5)
that incorrectly skipped the +1.0 for weights that had drifted above
0.5 during training (e.g., linear_attn.norm, later layer input norms).
Removed the heuristic — always add 1.0 when residual_rms_norm is set.
2. RmsNormGated in linear_attention.rs loaded its weight directly without
adding +1.0, unlike all other norms that go through load_rms_norm_weight.
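The load-time fix for bug 1 can be sketched as follows (a hedged illustration, not the repository's actual loader; the signature and flag name `residual_rms_norm` are assumptions based on the message above). The point is that the flag alone decides whether to fold in the +1.0; no magnitude heuristic inspects the weights:

```rust
// Hedged sketch of the fixed loader: when the model config marks norms
// as residual, always add 1.0 to the stored deltas at load time.
// No threshold check -- weights that drifted above 0.5 during training
// are handled identically.
fn load_rms_norm_weight(raw: Vec<f32>, residual_rms_norm: bool) -> Vec<f32> {
    if residual_rms_norm {
        raw.into_iter().map(|w| w + 1.0).collect()
    } else {
        raw
    }
}
```

Routing every norm (including the gated one in linear_attention.rs) through a single loader like this is what closes bug 2: there is one place where the +1.0 convention lives.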
Also reverted unnecessary .contiguous() calls added during debugging.
Verified against the HuggingFace reference implementation (transformers 5.3.0),
whose forward pass confirms: output * (1.0 + self.weight.float()).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Parent: cc511ce
2 files changed: 6 additions, 16 deletions