Skip to content

FIX IA3 _ConvNd.unmerge bias uses torch.mul instead of torch.div#3149

Open
Chessing234 wants to merge 2 commits into
huggingface:mainfrom
Chessing234:fix/ia3-convnd-unmerge-bias-div
Open

FIX IA3 _ConvNd.unmerge bias uses torch.mul instead of torch.div#3149
Chessing234 wants to merge 2 commits into
huggingface:mainfrom
Chessing234:fix/ia3-convnd-unmerge-bias-div

Conversation

@Chessing234
Copy link
Copy Markdown
Contributor

@Chessing234 Chessing234 commented Apr 11, 2026

Bug

_ConvNd.unmerge in src/peft/tuners/ia3/layer.py does not undo the bias scaling that _ConvNd.merge applied. After a merge() -> unmerge() round-trip on an IA3 non-feedforward Conv2d/Conv3d layer whose base layer has bias is not None, the base layer's bias ends up multiplied by ia3_l ** 2 instead of being restored to its pre-merge value, and each subsequent disable_adapters-triggered unmerge compounds the error.

Root cause

In _ConvNd.merge (line 254–256) the bias is scaled with torch.mul:

if not self.is_feedforward and (base_layer.bias is not None):
    scaling = self.ia3_l[active_adapter].reshape(base_layer.bias.shape)
    base_layer.bias.data = torch.mul(base_layer.bias.data, scaling.data).to(orig_dtype)

and in _ConvNd.unmerge (line 280–283) the bias is "un-scaled" by also using torch.mul — exactly the same op:

if not self.is_feedforward and (base_layer.bias is not None):
    scaling = self.ia3_l[active_adapter].reshape(base_layer.bias.shape)
    orig_dtype = base_layer.bias.data.dtype
    base_layer.bias.data = torch.mul(base_layer.bias.data, scaling.data).to(orig_dtype)

So bias_merged = bias * ia3_l and then bias_unmerged = bias_merged * ia3_l = bias * ia3_l ** 2, rather than bias.

The corresponding weight path in the same function (line 278) uses torch.div correctly:

base_layer.weight.data = torch.div(base_layer.weight.data, ia3_scaling + 1e-8).to(orig_dtype)

and the sibling Linear.unmerge bias path at line 152–155 also does it correctly:

if not self.is_feedforward and (base_layer.bias is not None):
    scaling = self.ia3_l[active_adapter].reshape(base_layer.bias.shape)
    orig_dtype = base_layer.bias.data.dtype
    base_layer.bias.data = torch.div(base_layer.bias.data, scaling.data + 1e-8).to(orig_dtype)

so the intended shape of the fix is unambiguous — _ConvNd.unmerge was copy-pasted from merge for the bias branch and never flipped to a div.

Worked example

Start with bias = [1.0, 2.0] and IA3 scaling = [3.0, 0.5].

step current code expected
after merge [3.0, 1.0] [3.0, 1.0]
after unmerge [9.0, 0.5] [1.0, 2.0]

Fix

Mirror the Linear.unmerge pattern: divide by scaling.data + 1e-8 (the + 1e-8 guards against any IA3 scaling entry that happens to be zero — same tolerance used on the weight line just above and in Linear.unmerge).

                 if not self.is_feedforward and (base_layer.bias is not None):
                     scaling = self.ia3_l[active_adapter].reshape(base_layer.bias.shape)
                     orig_dtype = base_layer.bias.data.dtype
-                    base_layer.bias.data = torch.mul(base_layer.bias.data, scaling.data).to(orig_dtype)
+                    base_layer.bias.data = torch.div(base_layer.bias.data, scaling.data + 1e-8).to(orig_dtype)

One-line change, symmetric with the weight path right above and with Linear.unmerge in the same file. No effect on the feedforward or bias is None branches.

\`_ConvNd.unmerge\` in \`src/peft/tuners/ia3/layer.py\` is supposed to
undo the corresponding \`_ConvNd.merge\` by inverting the (IA)^3
scaling applied to the base layer's weight **and** bias. The weight
path correctly divides by the IA3 scaling (line 278):

    base_layer.weight.data = torch.div(
        base_layer.weight.data, ia3_scaling + 1e-8
    ).to(orig_dtype)

but the bias path right below it accidentally uses \`torch.mul\`
again, the same op \`merge\` uses:

    # merge (line 256)
    base_layer.bias.data = torch.mul(base_layer.bias.data, scaling.data).to(orig_dtype)
    ...
    # unmerge (line 283)  <-- BUG: should be torch.div
    base_layer.bias.data = torch.mul(base_layer.bias.data, scaling.data).to(orig_dtype)

So a merge followed by an unmerge leaves the bias scaled by
\`ia3_l ** 2\` instead of being restored to its original value. For
any non-feedforward IA3 Conv layer (Conv2d / Conv3d) whose base
layer has \`bias is not None\`, \`merge() -> unmerge()\` permanently
corrupts the bias — and subsequent \`disable_adapters\`-triggered
unmerges compound the error.

The sibling \`Linear.unmerge\` in the same file (line 155) already
uses the correct pattern:

    base_layer.bias.data = torch.div(
        base_layer.bias.data, scaling.data + 1e-8
    ).to(orig_dtype)

Mirror that pattern in \`_ConvNd.unmerge\`: divide by
\`scaling.data + 1e-8\` so the bias unmerge is the inverse of the
bias merge, and the \`+ 1e-8\` tolerance guards against division by
any IA3 scaling entry that happens to be zero.
Copy link
Copy Markdown
Collaborator

@githubnemo githubnemo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, nice find.

Can you also reduce the tolerance in test_custom_models.py for test_merge? (to verify that the fix works)

--- a/tests/testing_common.py
+++ b/tests/testing_common.py
@@ -608,7 +608,7 @@ class PeftCommonTester:
                 atol, rtol = 1e-3, 1e-3
             if (config.peft_type in {"IA3", "LORA", "OFT"}) and (model_id in conv_ids):
                 # for some reason, the Conv introduces a larger error
-                atol, rtol = 0.3, 0.01
+                atol, rtol = 1e-2, 0.01
             if (config.peft_type == "OFT") and not is_decoder:
                 atol, rtol = 0.3, 0.01
             if model_id == "trl-internal-testing/tiny-Llama4ForCausalLM":

It would also be nice if you'd found out why test_safe_merge doesn't detect this issue.

@github-actions
Copy link
Copy Markdown

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

@github-actions github-actions Bot closed this May 19, 2026
@githubnemo githubnemo reopened this May 19, 2026
@githubnemo
Copy link
Copy Markdown
Collaborator

gentle ping @Chessing234 - do you still plan on working on this?

Drop atol from 0.3 to 1e-2 (rtol unchanged at 0.01) on the
IA3/LORA/OFT + Conv branch in _test_merge_layers. The previous 0.3
tolerance was loose enough to hide the IA3 _ConvNd.unmerge bias bug
this PR fixes; with the fix, all 126 Conv2d/Conv3d merge cases pass
under the tighter bound, and reverting the fix makes the affected
IA3-Conv2d cases fail.
@Chessing234
Copy link
Copy Markdown
Contributor Author

Hey @githubnemo, sorry for the silence — yes, still on it.

Just pushed the tolerance change you suggested (atol=1e-2, rtol=0.01) on the IA3/LORA/OFT Conv branch in tests/testing_common.py. Verified locally: with the fix in place, all 126 Conv2d/Conv3d test_merge_layers* cases pass under the tighter bound; reverting the IA3 unmerge bias line back to torch.mul makes 3 of the IA3-Conv2d cases fail — so the tighter tolerance does catch this specific bug now.

On test_safe_merge not detecting it: _test_safe_merge only calls model.merge_and_unload(safe_merge=True) once and compares the unloaded model's logits against the PEFT model's. It never calls unmerge_adapter(). Since the bug lives purely in _ConvNd.unmerge, and safe_merge=True only adds a NaN/Inf guard on the merge path, this test never exercises the buggy code. _test_merge_layers is the right place — it does the full merge → unmerge round-trip — it just wasn't tight enough before. Confirmed by running test_safe_merge[Conv2d * IA3] with the bug reverted: still passes 5/5.

Happy to also add an explicit merge/unmerge round-trip assertion to _test_safe_merge if you'd like that as a separate follow-up, but it felt out of scope here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants