Skip to content

Fix DeloraLinear.unmerge subtracting un-cast delta weight#3166

Open
Chessing234 wants to merge 1 commit into
huggingface:mainfrom
Chessing234:fix/delora-unmerge-cast-delta
Open

Fix DeloraLinear.unmerge subtracting un-cast delta weight#3166
Chessing234 wants to merge 1 commit into
huggingface:mainfrom
Chessing234:fix/delora-unmerge-cast-delta

Conversation

@Chessing234
Copy link
Copy Markdown
Contributor

Bug

`DeloraLinear.merge` converts the delta to the base layer's dtype / device before applying it (lines 191-195):

```python
delta_weight = (
self.get_delta_weight(active_adapter)
.detach()
.to(dtype=base_layer.weight.dtype, device=base_layer.weight.device)
)
...
base_layer.weight.data.add_(delta_weight)
```

`unmerge` omits the same conversion:

```python
self.get_base_layer().weight.data -= self.get_delta_weight(active_adapter)
```

Root cause

`get_delta_weight` returns the output of `_compute_delta`, whose dtype/device follow the adapter parameters (`delora_A`/`delora_B`/`delora_lambda`/`delora_w_norm`), not the base weight. When the base layer is in a different dtype (e.g. bf16/int8 after `prepare_model_for_kbit_training`) or on a different device than the adapter, the in-place `-=` either raises a dtype/device mismatch or silently round-trips the residual differently from how it was added — so `unmerge` no longer reverses `merge`.

Fix

Mirror the merge path: cast the delta to the base layer's dtype/device before subtracting. Behavior is unchanged when the adapter and base already share dtype/device.

DeloraLinear.merge converts the delta to the base layer's dtype/device
before applying it (lines 191-195):

    delta_weight = (
        self.get_delta_weight(active_adapter)
        .detach()
        .to(dtype=base_layer.weight.dtype, device=base_layer.weight.device)
    )
    base_layer.weight.data.add_(delta_weight)

unmerge omits the same conversion:

    self.get_base_layer().weight.data -= self.get_delta_weight(active_adapter)

`get_delta_weight` returns the result of `_compute_delta`, whose
dtype/device follow the adapter parameters (delora_A/B/lambda/w_norm),
not the base weight. When the base layer is e.g. bf16 / int8 / on a
different device than the adapter (a common case after
prepare_model_for_kbit_training, FSDP, or DoRA-style mixed setups), the
in-place `-=` either raises a dtype/device mismatch or silently
casts/round-trips the residual incorrectly, so unmerge no longer
reverses merge.

Mirror the merge path: cast the delta to the base layer's dtype/device
before subtracting.
@BenjaminBossan
Copy link
Copy Markdown
Member

Thanks for the PR.

or silently round-trips the residual differently from how it was added

What do you mean by that?

so unmerge no longer reverses merge.

IIUC, due to floating point operations, that can never be avoided, even with explicit type promotion.

@github-actions
Copy link
Copy Markdown

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants