Replies: 2 comments
Hi there, you raise a good point. I think what we define as "fair" depends. I can see two possible scenarios: a) Regular fine-tuning versus LoRA fine-tuning, where LoRA targets the same layers as the regular fine-tuning (all, last, some). Then one can see whether LoRA, which updates fewer parameters per layer, helps or not. b) Using the method with the highest accuracy (all, last, or some layers) for each of the two (regular versus LoRA), and seeing which one is more efficient. I think another important factor that we are not considering in this discussion thread is the memory savings. In practice, a common bottleneck is that one cannot do regular training of an 8B model on many single GPUs (depending on the RAM), but LoRA is fine. All that being said, in the classification case (not in the supervised instruction fine-tuning case), updating only the last layer like you suggest is reasonable. But again, I think with larger models this becomes totally negligible either way.
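To put a rough number on the memory-savings point, here is a back-of-envelope sketch of trainable parameters for one linear layer under regular fine-tuning versus a rank-r LoRA adapter. The layer size and rank below are illustrative assumptions, not figures from this thread:

```python
def full_params(d_in: int, d_out: int) -> int:
    """Weights trained by regular fine-tuning of a d_in x d_out linear layer."""
    return d_in * d_out

def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Weights trained by a rank-r LoRA adapter (A: d_in x r, B: r x d_out)."""
    return r * (d_in + d_out)

d = 4096   # hidden size in the ballpark of a 7-8B-class model (assumption)
r = 16     # a commonly used LoRA rank (assumption)

print(full_params(d, d))      # -> 16777216
print(lora_params(d, d, r))   # -> 131072, roughly 128x fewer
```

With numbers like these it is easy to see why LoRA can fit on a single GPU where regular training of the same layers cannot: the optimizer states scale with the trainable-parameter count, not the frozen weights.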
Your comparison setup is actually sound, and the results make sense given how LoRA interacts with classification fine-tuning.

On the fairness of the comparison: Comparing last-layer fine-tuning against last-layer LoRA is the right call. Applying LoRA to all linear layers introduces extra trainable parameters across the full network, which is a different budget and scope than the baseline - not an apples-to-apples comparison for classification.

Why last-layer fine-tuning wins here: Classification fine-tuning on a pre-trained LLM is a low-rank adaptation problem by nature - you are essentially learning a linear mapping from the frozen representations to class labels. LoRA adds a low-rank decomposition on top of that, which introduces extra approximation error with no benefit when the base weights themselves are not being updated in the layers that matter. With zero hyperparameter tuning, direct fine-tuning of the final block and output head coming out ahead is consistent with this.

On the forgetting observation: You are right that LoRA updates to early layers can cause representational drift. For classification tasks where the pre-trained representations are already strong, touching the early layers adds noise rather than signal. Last-layer LoRA avoids this, which is why it recovers most of the accuracy gap vs. full LoRA.

Practical takeaway: For classification, last-layer LoRA is a reasonable middle ground if you need the LoRA weight-merging benefits (e.g., serving multiple adapters). But if you just want accuracy and speed, direct fine-tuning of the last transformer block onwards is hard to beat - your numbers confirm this well.
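For anyone reading along, the low-rank decomposition mentioned above can be sketched in a few lines. This assumes the standard LoRA parameterization W_eff = W + (alpha / r) * A @ B, with A initialized to small random values and B initialized to zeros so the adapter starts as a no-op; variable names are illustrative, not from the book's code:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 8, 4, 2, 4

W = rng.normal(size=(d_in, d_out))     # frozen pre-trained weight
A = rng.normal(size=(d_in, r)) * 0.01  # trainable down-projection
B = np.zeros((r, d_out))               # trainable up-projection, zero init

x = rng.normal(size=(1, d_in))

y_base = x @ W
y_lora = x @ W + (alpha / r) * (x @ A @ B)

# With B = 0, the adapter contributes nothing at initialization,
# so training starts exactly from the pre-trained behavior:
assert np.allclose(y_base, y_lora)
```

During training only A and B receive gradients while W stays frozen, which is where both the memory savings and the extra approximation error discussed above come from.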
Hey, I have just implemented LoRA for classification fine-tuning. In the end, I noticed the comment about how LoRA is slower due to the added inference cost, but that this cost can be offset on larger models.
My question is whether it makes sense to compare this model against last-layer LoRA fine-tuning. Because in classification fine-tuning, we only fine-tuned the layers from the last transformer block onwards and did not touch the weights of the earlier layers, I think using LoRA only on the last transformer block and the out_head would be fairer here.
Here are my results for different experiments (I have a slightly different dataset and training loop, so the numbers differ from the book):
I think we see here that the training-time cost is not that high when LoRA is applied to the same layers. Also, in my experiment with zero hyperparameter tuning, last-layer training performed better too (I assume this might be caused by LoRA updates to the initial layers causing some "forgetting").
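The matched-scope setup I used can be sketched as a simple filter over module names: attach LoRA only to the modules that regular classification fine-tuning also updates (the last transformer block and the classification head), rather than every linear layer. The layer names below are illustrative, not the book's actual module names:

```python
# Hypothetical flat list of module names for a 12-block transformer classifier.
layers = [f"blocks.{i}.attn" for i in range(12)] + \
         [f"blocks.{i}.ffn" for i in range(12)] + \
         ["final_norm", "out_head"]

def lora_targets(layers, last_block_idx=11):
    """Keep only the last transformer block and the classification head."""
    prefix = f"blocks.{last_block_idx}."
    return [name for name in layers if name.startswith(prefix) or name == "out_head"]

print(lora_targets(layers))  # -> ['blocks.11.attn', 'blocks.11.ffn', 'out_head']
```

Everything outside this list stays frozen in both runs, so the only difference between the two experiments is whether the selected modules are updated directly or through LoRA adapters.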