[Model] Implement LoRA support for Qwen3ASRForConditionalGeneration #37247
Conversation
Code Review
This pull request adds LoRA support for the Qwen3ASRForConditionalGeneration model. The changes introduce the SupportsLoRA interface and define the necessary attributes (packed_modules_mapping, embedding_modules, lora_skip_prefixes) to correctly apply LoRA adapters to the language model portion of the model, while skipping the audio tower. My review of these changes did not identify any issues.
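For illustration, a minimal sketch of what wiring up these attributes typically looks like in a vLLM model class; the mapping values and the skip prefix below are assumptions based on this summary, not the PR's exact diff:

```python
from torch import nn

from vllm.model_executor.models.interfaces import SupportsLoRA


class Qwen3ASRForConditionalGeneration(nn.Module, SupportsLoRA):
    # Map each fused projection to the per-weight LoRA targets
    # (typical values for Qwen-style language models; illustrative only).
    packed_modules_mapping = {
        "qkv_proj": ["q_proj", "k_proj", "v_proj"],
        "gate_up_proj": ["gate_proj", "up_proj"],
    }
    # Embedding modules that LoRA adapters may extend.
    embedding_modules = {
        "embed_tokens": "input_embeddings",
        "lm_head": "output_embeddings",
    }
    # Skip applying LoRA to the audio tower (prefix is an assumption).
    lora_skip_prefixes = ["audio_tower."]
```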
Documentation preview: https://vllm--37247.org.readthedocs.build/en/37247/
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI. You ask your reviewers to trigger select CI tests on top of fastcheck CI. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai. 🚀
Previously, the `hasattr(self.model, "get_num_mm_connector_tokens")` condition would evaluate to True for this case due to inheritance, despite the method not being overridden. Signed-off-by: Peter Nguyen <petern0408@gmail.com>
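To illustrate the pitfall with a schematic (these class names are illustrative, not the actual vLLM code):

```python
# hasattr() also sees inherited methods, so it cannot tell whether a
# subclass actually overrides one.
class Base:
    def get_num_mm_connector_tokens(self):
        raise NotImplementedError


class Child(Base):  # does not override the method
    pass


print(hasattr(Child(), "get_num_mm_connector_tokens"))  # True, misleadingly

# Sketch of the fix: consult the multimodal mapping instead of hasattr().
# "mm_token_mapping" is a hypothetical stand-in for the runner's mapping.
mm_token_mapping = {"encoder": 128}  # no "connector" entry for this model
if "connector" in mm_token_mapping:
    pass  # only taken when the model actually registers a connector
```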
Have you tested this PR with a real LoRA adapter?
@jeejeelee Yes, I have. I listed the exact public adapter I used in the PR description. I also just linked the data I used and the text output it generated, verifying that querying the adapter produces output that differs from the raw base model's.
This pull request has merge conflicts that must be resolved before it can be merged.
Please fix the pre-commit failure, thank you
@jeejeelee I think you just need to add the
Purpose
This PR adds LoRA support for the `Qwen3ASRForConditionalGeneration` model.

For this to work for the audio tower, I had to make a few additional changes:

- Implemented `get_num_mm_encoder_tokens()`.
- Replaced the `nn.Linear`s with `ReplicatedLinear` along the audio tower path (sketched below).
- In `gpu_model_runner.py`, I found that the `hasattr(self.model, "get_num_mm_connector_tokens")` condition was improperly evaluating to True due to inheritance, despite the model not implementing `get_num_mm_connector_tokens()`. This was leading us to incorrectly go down that path and encounter an error. I've modified the condition to check if `connector` actually exists in the mapping.

Fixes #37223
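For illustration, a minimal sketch of the `ReplicatedLinear` swap; the layer sizes and attribute names here are assumptions, not the PR's actual diff:

```python
from vllm.model_executor.layers.linear import ReplicatedLinear

# Before: a plain PyTorch layer along the audio tower path.
#   self.proj = nn.Linear(in_features, out_features, bias=True)

# After (sketch): ReplicatedLinear keeps the full, unsharded weight on every
# rank, so the audio tower path behaves the same under tensor parallelism.
in_features, out_features = 1280, 2048  # illustrative sizes
proj = ReplicatedLinear(in_features, out_features, bias=True)

# Note: vLLM's linear layers return an (output, output_bias) tuple,
# so call sites unpack, e.g. `out, _ = proj(x)`.
```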
Test Plan
I tested with the following public adapter available on Hugging Face: ha0yuan/Qwen3-ASR-LoRa-ChineseAviation-Tiny.
I also double-checked that the adapters are properly shown when querying the `/v1/models` endpoint.

Then I used a Python script (sketched below) to load in a `.wav` file and query the `/v1/chat/completions` endpoint. Specifically, I used this audio file as input.
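The exact script is not included here, but a minimal sketch of that kind of check and query might look like the following; the server URL, served adapter name, and file path are all assumptions, and the base64 `audio_url` content part relies on vLLM's OpenAI-compatible audio input support:

```python
import base64

import requests

BASE_URL = "http://localhost:8000"  # assumed local vLLM server

# Double-check that the LoRA adapter is listed alongside the base model.
models = requests.get(f"{BASE_URL}/v1/models").json()
print([m["id"] for m in models["data"]])

# Load a .wav file and send it as a base64 data URL.
with open("sample.wav", "rb") as f:  # hypothetical input file
    audio_b64 = base64.b64encode(f.read()).decode()

payload = {
    # Use the adapter's served name to route through the LoRA weights.
    "model": "Qwen3-ASR-LoRa-ChineseAviation-Tiny",
    "messages": [{
        "role": "user",
        "content": [
            {"type": "audio_url",
             "audio_url": {"url": f"data:audio/wav;base64,{audio_b64}"}},
            {"type": "text", "text": "Transcribe this audio."},
        ],
    }],
}
resp = requests.post(f"{BASE_URL}/v1/chat/completions", json=payload)
print(resp.json()["choices"][0]["message"]["content"])
```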
Test Result
Before this PR, the server would error with:

`ValueError: Qwen3ASRForConditionalGeneration does not support LoRA yet.`

After this change, the server starts up properly, and I successfully queried the `/v1/chat/completions` endpoint.

Querying both the raw model and the adapter, I verified that the output differs when the LoRA adapter is enabled. The outputs are below. (Notice that the raw model transcribes the numbers as numerals (e.g. 9 and 10), while the adapter transcribes them as words ("nine" and "ten").)
Essential Elements of an Effective PR Description Checklist
The necessary documentation update, such as updating `supported_models.md` and `examples` for a new model.