feat(lora): Add FIM-guided adaptive LoRA rank allocation (FimConfig + initialize_lora_fim_ranks) #3204

Draft
ramkrishs wants to merge 1 commit into huggingface:main from ramkrishs:feat/fim-adaptive-lora-rank

@ramkrishs

Summary

Adds FimConfig and initialize_lora_fim_ranks() — a data-driven method that redistributes LoRA ranks across layers using the diagonal of the empirical Fisher Information Matrix (eFIM), concentrating rank budget on layers that are most sensitive to the loss.

Proposal issue: #3203


Motivation

LoRA uses a fixed rank r for all adapter matrices. Different layers have different sensitivity to fine-tuning data — early attention layers often require less capacity than later layers; q/v projections often differ from k projections. A fixed-rank allocation wastes capacity on insensitive layers.

EVA (already in PEFT) addresses this via SVD of input activations. This PR uses a complementary signal: the eFIM diagonal (mean squared gradient), which directly measures per-parameter loss sensitivity rather than activation variance. The two signals are complementary: EVA derives its initialization directions, and its own rank redistribution, from the explained variance of activations, whereas FIM allocates rank by sensitivity of the loss.


Algorithm

The eFIM diagonal for parameter θᵢ:

F_ii ≈ (1/T) Σ (∂ℓ_t / ∂θ_i)²
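
As a rough illustration of the formula above, the sketch below averages squared gradients over T gradient evaluations and then reduces each layer's diagonal to a single score. It is a toy in plain Python with hand-written gradient values. The names `efim_diagonal` and `layer_score` are hypothetical and are not the helpers in this PR, which would presumably take gradients from backward passes over the calibration batches.

```python
def efim_diagonal(grads):
    """Mean of squared gradients per parameter.

    grads: list of T gradient vectors (one per evaluation), each a list of
    floats with one entry per parameter.
    """
    t = len(grads)
    n_params = len(grads[0])
    return [sum(g[i] ** 2 for g in grads) / t for i in range(n_params)]


def layer_score(fim_diag):
    """Collapse a layer's eFIM diagonal to one importance value (its mean)."""
    return sum(fim_diag) / len(fim_diag)


# Toy data (assumed values): two layers, two parameters each, T = 3.
grads_a = [[0.1, -0.2], [0.0, 0.1], [-0.1, 0.3]]
grads_b = [[1.0, -2.0], [0.5, 1.0], [-1.0, 3.0]]

score_a = layer_score(efim_diagonal(grads_a))
score_b = layer_score(efim_diagonal(grads_b))

# Layer b has much larger gradients, so it scores as more sensitive.
print(score_a < score_b)
```

Because the score is a mean of squares, it is sign-independent and grows quadratically with gradient magnitude, so a few very sensitive layers can dominate the allocation unless ranks are clamped.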

Rank allocation:

score_i = mean(F_i)          # per-layer importance
rank_i  = budget × score_i / Σ_j score_j  # proportional share, budget = n_layers × r
rank_i  = clamp(rank_i, r_min, r_max)     # integer, largest-remainder rounding

Total rank budget is preserved: mean rank across layers equals the original r.
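
One way to combine proportional shares, clamping, and exact budget preservation is sketched below. This is an assumption about how such an allocator could work, not the PR's `_allocate_ranks`: the function name `allocate_ranks` and the greedy unit-by-unit strategy are illustrative. Without clamping, the greedy step reduces to largest-remainder rounding.

```python
def allocate_ranks(scores, r, r_min, r_max):
    """Split a budget of len(scores) * r rank units in proportion to scores.

    Every layer starts at r_min; the remaining units go one at a time to the
    layer furthest below its proportional share that is still under r_max.
    """
    if not (1 <= r_min <= r <= r_max):
        raise ValueError("expected 1 <= r_min <= r <= r_max")
    n = len(scores)
    budget = n * r
    total = sum(scores)
    if total <= 0:
        return [r] * n  # no signal: fall back to uniform ranks
    shares = [s / total * budget for s in scores]
    ranks = [r_min] * n
    for _ in range(budget - n * r_min):
        candidates = [i for i in range(n) if ranks[i] < r_max]
        # Largest deficit wins; ties go to the lower layer index.
        best = max(candidates, key=lambda i: (shares[i] - ranks[i], -i))
        ranks[best] += 1
    return ranks


print(allocate_ranks([1.0, 1.0, 2.0, 4.0], r=8, r_min=1, r_max=32))  # [4, 4, 8, 16]

# With a tighter cap, the units the top layer cannot take are redistributed.
capped = allocate_ranks([1.0, 1.0, 2.0, 4.0], r=8, r_min=1, r_max=12)
print(capped, sum(capped))
```

The check `r_min <= r <= r_max` is what guarantees the budget can always be met exactly: clamping each layer independently after rounding would not preserve the total in general.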


API

Follows the EVA pattern exactly: FimConfig dataclass + initialize_lora_fim_ranks() public function + init_lora_weights='fim' trigger in LoraConfig.

from peft import LoraConfig, get_peft_model, FimConfig, initialize_lora_fim_ranks

fim_cfg = FimConfig(
    fim_calibration_batches=8,   # batches for eFIM accumulation
    r_min=1,                     # minimum rank per layer
    r_max=32,                    # maximum rank per layer (default: 2 * r)
    adjust_scaling_factors=True, # preserve lora_alpha / r after reallocation
)

config = LoraConfig(
    r=8,
    init_lora_weights="fim",
    fim_config=fim_cfg,
    target_modules=["q_proj", "v_proj"],
)
model = get_peft_model(base_model, config)

initialize_lora_fim_ranks(model, dataloader=calibration_loader)
# or, with pre-computed scores:
initialize_lora_fim_ranks(model, fim_scores=my_fim_dict)
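
The `adjust_scaling_factors` comment above refers to the LoRA scaling factor `lora_alpha / r`. A minimal sketch of the arithmetic, assuming the option rescales each layer's alpha in proportion to its new rank (the helper `adjusted_alpha` is hypothetical and not part of this PR):

```python
def adjusted_alpha(lora_alpha, r, new_r):
    """Alpha that keeps alpha / rank unchanged when a layer's rank changes."""
    return lora_alpha * new_r / r


lora_alpha, r = 16, 8
for new_r in (4, 8, 12):
    alpha = adjusted_alpha(lora_alpha, r, new_r)
    # The effective scaling alpha / new_r stays equal to lora_alpha / r.
    print(new_r, alpha, alpha / new_r)
```

Without this adjustment, a layer whose rank grows would see its scaling factor shrink, and vice versa, which changes the effective magnitude of the adapter update.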

Files changed

| File | Change |
| --- | --- |
| src/peft/tuners/lora/fim.py | New: FimConfig, initialize_lora_fim_ranks, internal helpers |
| src/peft/tuners/lora/config.py | Add fim_config field, 'fim' to init_lora_weights Literal, validation |
| src/peft/tuners/lora/layer.py | Allow 'fim' in reset_lora_parameters (treated as standard init; rank redistribution happens post-construction) |
| src/peft/tuners/lora/__init__.py | Export FimConfig, initialize_lora_fim_ranks |
| src/peft/tuners/__init__.py | Propagate exports |
| src/peft/__init__.py | Top-level export |
| tests/test_lora_fim.py | 23 unit tests |

Tests

23 passed in 5.00s

Covers: FimConfig construction/validation, _compute_layer_importance, _allocate_ranks (budget preservation, clamping, monotonicity), _resize_lora_layer (increase/decrease/noop, scaling adjustment), initialize_lora_fim_ranks end-to-end (with dataloader and pre-computed scores), LoraConfig validation warnings, and top-level import.

No GPU required. All tests run on CPU.


Reference

  • LeCun et al., Optimal Brain Damage, NeurIPS 1990 — theoretical basis for eFIM diagonal importance
  • Zhang et al., AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning, ICLR 2023 — related adaptive rank via SVD
  • Related: pytorch/ao #4352 — same eFIM diagonal applied to weight pruning

This PR was developed with AI assistance. All code has been tested, manually reviewed, and verified against the PEFT codebase conventions.

Introduces FimConfig and initialize_lora_fim_ranks() — a calibration-based
method that redistributes LoRA ranks across layers using the diagonal of the
empirical Fisher Information Matrix (eFIM), subject to a global rank budget.

Layers with a large mean squared gradient (high eFIM score) receive higher rank;
layers with low sensitivity receive lower rank. This keeps the same total rank
budget as fixed-rank LoRA (and roughly the same parameter count when the adapted
matrices share a shape), while concentrating capacity where the loss curvature
is highest.

Algorithm:
  F_ii ≈ (1/T) Σ (∂ℓ_t/∂θ_i)²  (eFIM diagonal, mean squared gradient)
  rank_i ∝ mean(F_i) / Σ mean(F_j) × budget,  clamped to [r_min, r_max]
  budget = n_layers × r  (mean rank preserved)

Files changed:
  src/peft/tuners/lora/fim.py        — FimConfig + initialize_lora_fim_ranks
  src/peft/tuners/lora/config.py     — fim_config field + 'fim' init mode
  src/peft/tuners/lora/layer.py      — allow 'fim' in reset_lora_parameters
  src/peft/tuners/lora/__init__.py   — export FimConfig, initialize_lora_fim_ranks
  src/peft/tuners/__init__.py        — propagate exports
  src/peft/__init__.py               — top-level export
  tests/test_lora_fim.py             — 23 unit tests

Relates to: huggingface#3203

Reference: LeCun et al., Optimal Brain Damage, NeurIPS 1990.

Signed-off-by: Ramakrishnan Sathyavageeswaran <ramkrishs@outlook.com>
@BenjaminBossan
Member

Thanks for providing this PR @ramkrishs. Is there any paper that shows that this initialization works well with LoRA? Did you run any of your own experiments? Usually, we don't add new methods to PEFT only on the theoretical assumption that they could work.

@ramkrishs
Author

Thank you for the feedback Benjamin.

I'm currently running a structured comparison on GLUE (DeBERTaV3-base) and commonsense reasoning (LLaMA-3-8B) against LoRA, AdaLoRA, and EVA across rank budgets r ∈ {2, 4, 8, 16}. The experiment harness is set up and the first results should be ready within 2–3 weeks. This work is also being written up as a short paper — the closest prior work (AdaLoRA, ICLR 2023) shows that non-uniform rank allocation consistently outperforms fixed-rank LoRA, particularly at low budgets, and our hypothesis is that eFIM-based allocation is more directly tied to the fine-tuning objective than SVD-based signals.

I'll update this PR with the results table and a link to the arXiv preprint once the experiments are complete. Happy to keep this as a draft in the meantime — no action needed from your side until then.

@BenjaminBossan
Member

> I'll update this PR with the results table and a link to the arXiv preprint once the experiments are complete.

Thanks, then let's pick this PR up again at that point.

You could also check this init on the PEFT MetaMath benchmark, it'll probably just require a couple of lines of extra code.
