FIX IA3 _ConvNd.unmerge bias uses torch.mul instead of torch.div#3149
FIX IA3 _ConvNd.unmerge bias uses torch.mul instead of torch.div#3149Chessing234 wants to merge 2 commits into
Conversation
\`_ConvNd.unmerge\` in \`src/peft/tuners/ia3/layer.py\` is supposed to
undo the corresponding \`_ConvNd.merge\` by inverting the (IA)^3
scaling applied to the base layer's weight **and** bias. The weight
path correctly divides by the IA3 scaling (line 278):
base_layer.weight.data = torch.div(
base_layer.weight.data, ia3_scaling + 1e-8
).to(orig_dtype)
but the bias path right below it accidentally uses \`torch.mul\`
again, the same op \`merge\` uses:
# merge (line 256)
base_layer.bias.data = torch.mul(base_layer.bias.data, scaling.data).to(orig_dtype)
...
# unmerge (line 283) <-- BUG: should be torch.div
base_layer.bias.data = torch.mul(base_layer.bias.data, scaling.data).to(orig_dtype)
So a merge followed by an unmerge leaves the bias scaled by
\`ia3_l ** 2\` instead of being restored to its original value. For
any non-feedforward IA3 Conv layer (Conv2d / Conv3d) whose base
layer has \`bias is not None\`, \`merge() -> unmerge()\` permanently
corrupts the bias — and subsequent \`disable_adapters\`-triggered
unmerges compound the error.
The sibling \`Linear.unmerge\` in the same file (line 155) already
uses the correct pattern:
base_layer.bias.data = torch.div(
base_layer.bias.data, scaling.data + 1e-8
).to(orig_dtype)
Mirror that pattern in \`_ConvNd.unmerge\`: divide by
\`scaling.data + 1e-8\` so the bias unmerge is the inverse of the
bias merge, and the \`+ 1e-8\` tolerance guards against division by
any IA3 scaling entry that happens to be zero.
There was a problem hiding this comment.
Hey, nice find.
Can you also reduce the tolerance in test_custom_models.py for test_merge? (to verify that the fix works)
--- a/tests/testing_common.py
+++ b/tests/testing_common.py
@@ -608,7 +608,7 @@ class PeftCommonTester:
atol, rtol = 1e-3, 1e-3
if (config.peft_type in {"IA3", "LORA", "OFT"}) and (model_id in conv_ids):
# for some reason, the Conv introduces a larger error
- atol, rtol = 0.3, 0.01
+ atol, rtol = 1e-2, 0.01
if (config.peft_type == "OFT") and not is_decoder:
atol, rtol = 0.3, 0.01
if model_id == "trl-internal-testing/tiny-Llama4ForCausalLM":It would also be nice if you'd found out why test_safe_merge doesn't detect this issue.
|
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. |
|
gentle ping @Chessing234 - do you still plan on working on this? |
Drop atol from 0.3 to 1e-2 (rtol unchanged at 0.01) on the IA3/LORA/OFT + Conv branch in _test_merge_layers. The previous 0.3 tolerance was loose enough to hide the IA3 _ConvNd.unmerge bias bug this PR fixes; with the fix, all 126 Conv2d/Conv3d merge cases pass under the tighter bound, and reverting the fix makes the affected IA3-Conv2d cases fail.
|
Hey @githubnemo, sorry for the silence — yes, still on it. Just pushed the tolerance change you suggested ( On Happy to also add an explicit merge/unmerge round-trip assertion to |
Bug
_ConvNd.unmergeinsrc/peft/tuners/ia3/layer.pydoes not undo the bias scaling that_ConvNd.mergeapplied. After amerge() -> unmerge()round-trip on an IA3 non-feedforward Conv2d/Conv3d layer whose base layer hasbias is not None, the base layer's bias ends up multiplied byia3_l ** 2instead of being restored to its pre-merge value, and each subsequentdisable_adapters-triggered unmerge compounds the error.Root cause
In
_ConvNd.merge(line 254–256) the bias is scaled withtorch.mul:and in
_ConvNd.unmerge(line 280–283) the bias is "un-scaled" by also usingtorch.mul— exactly the same op:So
bias_merged = bias * ia3_land thenbias_unmerged = bias_merged * ia3_l = bias * ia3_l ** 2, rather thanbias.The corresponding weight path in the same function (line 278) uses
torch.divcorrectly:and the sibling
Linear.unmergebias path at line 152–155 also does it correctly:so the intended shape of the fix is unambiguous —
_ConvNd.unmergewas copy-pasted frommergefor the bias branch and never flipped to a div.Worked example
Start with
bias = [1.0, 2.0]and IA3scaling = [3.0, 0.5].[3.0, 1.0][3.0, 1.0][9.0, 0.5][1.0, 2.0]Fix
Mirror the
Linear.unmergepattern: divide byscaling.data + 1e-8(the+ 1e-8guards against any IA3 scaling entry that happens to be zero — same tolerance used on the weight line just above and inLinear.unmerge).if not self.is_feedforward and (base_layer.bias is not None): scaling = self.ia3_l[active_adapter].reshape(base_layer.bias.shape) orig_dtype = base_layer.bias.data.dtype - base_layer.bias.data = torch.mul(base_layer.bias.data, scaling.data).to(orig_dtype) + base_layer.bias.data = torch.div(base_layer.bias.data, scaling.data + 1e-8).to(orig_dtype)One-line change, symmetric with the weight path right above and with
Linear.unmergein the same file. No effect on the feedforward orbias is Nonebranches.