
[Model] Implement LoRA support for Qwen3ASRForConditionalGeneration#37247

Merged
vadiklyutiy merged 10 commits into vllm-project:main from petern48:lora_Qwen3ASRForConditionalGeneration
Apr 10, 2026

Conversation

@petern48
Contributor

@petern48 petern48 commented Mar 17, 2026

Purpose

This PR adds LoRA support for Qwen3ASRForConditionalGeneration model.

For this to work for the audio tower, I had to make a few additional changes:

  • Implement get_num_mm_encoder_tokens()
  • Replace some nn.Linear layers with ReplicatedLinear along the audio tower path.
  • Qwen3ASR seems to be our first model with a tower but no connector. In gpu_model_runner.py, I found that the hasattr(self.model, "get_num_mm_connector_tokens") check was improperly evaluating to True due to inheritance, despite the model not implementing get_num_mm_connector_tokens(). This incorrectly sent us down that code path and produced an error. I've modified the condition to check whether connector actually exists in the mapping.
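The inheritance pitfall described in the last bullet can be reproduced in a few lines. This is only an illustrative sketch: the class name, `mm_modules` attribute, and `"connector"` key below are stand-ins, not vLLM's actual internals.

```python
# hasattr() on a method inherited from a base interface is True even when the
# subclass never overrides it, so membership in an explicit module mapping is
# the safer way to detect whether a connector actually exists.

class SupportsMultiModal:
    def get_num_mm_connector_tokens(self) -> int:
        raise NotImplementedError


class Qwen3ASRLike(SupportsMultiModal):
    # Tower-only model: the connector method is inherited but never
    # overridden, and no "connector" entry is registered in its mapping.
    mm_modules = {"tower": object()}


model = Qwen3ASRLike()

# The old-style check wrongly passes because of inheritance:
assert hasattr(model, "get_num_mm_connector_tokens")

# A mapping-based check correctly reports that no connector exists:
has_connector = "connector" in model.mm_modules
assert not has_connector
```

The mapping-based check avoids the false positive without requiring every tower-only model to explicitly delete or shadow the inherited method.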

Fixes #37223

Test Plan

I tested with the following public adapter available on Hugging Face: ha0yuan/Qwen3-ASR-LoRa-ChineseAviation-Tiny.

vllm serve Qwen/Qwen3-ASR-1.7B \
  --enable-lora \
  --enable-tower-connector-lora \
  --lora-modules aviation=ha0yuan/Qwen3-ASR-LoRa-ChineseAviation-Tiny \
  --port 8000

I also double-checked that the adapters are properly shown when querying the /v1/models endpoint:

curl localhost:8000/v1/models | jq .
{
  "object": "list",
  "data": [
    {
      "id": "Qwen/Qwen3-ASR-1.7B",
      "object": "model",
      ...
      "root": "Qwen/Qwen3-ASR-1.7B",
      ...
    },
    {
      "id": "medpl",
       ...
      "root": "AleksanderObuchowski/Qwen3-ASR-1.7B-med-pl-lora-decoder-only",
      ...
    }
  ]
}

Then I used a Python script to load a .wav file and query the /v1/chat/completions endpoint. Specifically, I used this audio file as input.
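A minimal sketch of the kind of client script described above, assuming the OpenAI-compatible chat-completions API with the "input_audio" content type. The adapter name "aviation" matches the --lora-modules flag in the serve command; the WAV bytes and endpoint URL here are placeholders, and the actual script used in testing may differ.

```python
import base64
import json


def build_transcription_request(wav_bytes: bytes, model: str = "aviation") -> dict:
    """Build a /v1/chat/completions payload carrying a base64-encoded WAV clip.

    `model` can be the base model id or an adapter name registered via
    --lora-modules, which is how the LoRA path gets exercised.
    """
    audio_b64 = base64.b64encode(wav_bytes).decode("utf-8")
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "input_audio",
                        "input_audio": {"data": audio_b64, "format": "wav"},
                    }
                ],
            }
        ],
    }


# Placeholder bytes; a real script would read them from the .wav file on disk.
payload = build_transcription_request(b"RIFF....WAVE", model="aviation")
body = json.dumps(payload)  # POST this to http://localhost:8000/v1/chat/completions
```

Sending the same payload twice, once with the base model id and once with the adapter name, is enough to compare the two transcriptions.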

Test Result

Before this PR, the server would error with
ValueError: Qwen3ASRForConditionalGeneration does not support LoRA yet.

After this change, the server starts up properly, and I successfully queried the /v1/chat/completions endpoint.

Querying both the raw model and the adapter, I verified that the output differs when the LoRA adapter is enabled. The outputs are below. (Notice that the raw model transcribes numbers as digits (e.g. 9 and 10), while the adapter transcribes them as words ("nine" and "ten").)

== no LoRA ==
language English<asr_text>November the 10th, Wednesday, 9 p.m. I'm standing in a dark alley. After waiting several hours, the time has come. A woman with long dark hair approaches. I have to act, and fast, before she realizes what has happened. I must find out.

== with LoRA ==
language English<asr_text>November the tenth, Wednesday, nine p.m. I'm standing in a dark alley. After waiting several hours, the time has come. A woman with long dark hair approaches. I have to act, and fast, before she realizes what has happened. I must find out.
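As a quick sanity check that the two transcripts above differ only where the adapter spells out numbers, a word-by-word comparison confirms it. This snippet is illustrative and not part of the PR; the strings are truncated to the first sentence of each transcript.

```python
base = ("language English<asr_text>November the 10th, Wednesday, 9 p.m. "
        "I'm standing in a dark alley.")
lora = ("language English<asr_text>November the tenth, Wednesday, nine p.m. "
        "I'm standing in a dark alley.")

# Pair up corresponding words and keep only the positions that changed.
diffs = [(b, l) for b, l in zip(base.split(), lora.split()) if b != l]
assert diffs == [("10th,", "tenth,"), ("9", "nine")]
```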

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Peter Nguyen <petern0408@gmail.com>
@mergify mergify Bot added the qwen Related to Qwen models label Mar 17, 2026
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request adds LoRA support for the Qwen3ASRForConditionalGeneration model. The changes introduce the SupportsLoRA interface and define the necessary attributes (packed_modules_mapping, embedding_modules, lora_skip_prefixes) to correctly apply LoRA adapters to the language model portion of the model, while skipping the audio tower. My review of these changes did not identify any issues.
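The attributes the review mentions can be sketched as follows. The attribute names (packed_modules_mapping, lora_skip_prefixes) come from the review comment above, but the specific module names and prefixes below are assumptions based on typical Qwen-style decoder layouts, not a copy of the merged code.

```python
# LoRA targets the fused projections as groups of stacked sub-layers, so the
# adapter's per-sub-layer weights can be packed into one fused module.
packed_modules_mapping = {
    "qkv_proj": ["q_proj", "k_proj", "v_proj"],
    "gate_up_proj": ["gate_proj", "up_proj"],
}

# Prefixes for submodules that LoRA should skip entirely; here, the audio
# tower, so adapters only apply to the language-model portion.
lora_skip_prefixes = ["audio_tower."]


def is_lora_target(module_name: str) -> bool:
    """Return False for modules under any skipped prefix."""
    return not any(module_name.startswith(p) for p in lora_skip_prefixes)


assert not is_lora_target("audio_tower.layers.0.linear")
assert is_lora_target("language_model.layers.0.qkv_proj")
```

Skipping the tower by prefix is what lets a decoder-only adapter load cleanly even though the model also contains audio-encoder weights.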

Signed-off-by: Peter Nguyen <petern0408@gmail.com>
@mergify
Contributor

mergify Bot commented Mar 17, 2026

Documentation preview: https://vllm--37247.org.readthedocs.build/en/37247/

@mergify mergify Bot added the documentation Improvements or additions to documentation label Mar 17, 2026
@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, a small and essential subset of CI tests that quickly catches errors.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

@petern48 petern48 marked this pull request as ready for review March 17, 2026 03:13
@petern48 petern48 requested a review from sighingnow as a code owner March 17, 2026 03:13
@Isotr0py Isotr0py requested a review from jeejeelee March 17, 2026 06:00
Comment thread vllm/model_executor/models/qwen3_asr.py Outdated
Signed-off-by: Peter Nguyen <petern0408@gmail.com>
Signed-off-by: Peter Nguyen <petern0408@gmail.com>
Signed-off-by: Peter Nguyen <petern0408@gmail.com>
Previously, the hasattr(self.model, "get_num_mm_connector_tokens") condition would evaluate to True for this case due to inheritance, despite the method not being overridden.

Signed-off-by: Peter Nguyen <petern0408@gmail.com>
@petern48 petern48 requested a review from njhill as a code owner March 17, 2026 19:11
@mergify mergify Bot added the v1 label Mar 17, 2026
@petern48 petern48 requested a review from jeejeelee March 18, 2026 23:11
@jeejeelee
Collaborator

Have you tested this PR with the real LoRA adapter?

@petern48
Contributor Author

@jeejeelee Yes, I have. I listed the exact public adapter I used in the PR description. I also just linked the data I used and the text output it generated, verifying that querying the adapter successfully leads to output that is different from the raw base model.

@mergify
Contributor

mergify Bot commented Mar 24, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @petern48.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify Bot added the needs-rebase label Mar 24, 2026
Signed-off-by: Peter Nguyen <petern0408@gmail.com>
@jeejeelee
Collaborator

Please fix the pre-commit failure, thank you

@petern48
Contributor Author

Error: PR must have the 'ready' label or the author must have at least 4 merged PRs (found 0).

@jeejeelee I think you just need to add the ready label. #37544 introduced a change that prevents pre-commit from running by default. I have the pre-commit setup locally, so it should already be formatted correctly. It was also already passing before, and all I've done since then is merge with main. Could you add it, please?

@DarkLight1337 DarkLight1337 added verified Run pre-commit for new contributors without triggering other tests ready ONLY add when PR is ready to merge/full CI is needed and removed verified Run pre-commit for new contributors without triggering other tests labels Apr 9, 2026
@petern48 petern48 requested a review from vadiklyutiy as a code owner April 9, 2026 18:03
@vadiklyutiy vadiklyutiy merged commit 8d0f908 into vllm-project:main Apr 10, 2026
64 checks passed
@petern48 petern48 deleted the lora_Qwen3ASRForConditionalGeneration branch April 10, 2026 15:21
wojciech-wais pushed a commit to wojciech-wais/vllm that referenced this pull request Apr 13, 2026
whk-lab pushed a commit to whk-lab/vllm that referenced this pull request Apr 23, 2026
avinashsingh77 pushed a commit to avinashsingh77/vllm that referenced this pull request Apr 27, 2026
…llm-project#37247)

Signed-off-by: Peter Nguyen <petern0408@gmail.com>
Signed-off-by: Avinash Singh <avinashsingh.rcoem@gmail.com>

Labels

  • documentation Improvements or additions to documentation
  • qwen Related to Qwen models
  • ready ONLY add when PR is ready to merge/full CI is needed
  • v1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature]: Add LoRA support for Qwen3ASRForConditionalGeneration

4 participants