fix: use multi-GPU RWKV strategy from visible CUDA devices #3882

Open
Chessing234 wants to merge 1 commit into lm-sys:main from Chessing234:fix/rwkv-multi-gpu-strategy

@Chessing234

Fixes #1248

Bug

Launching an RWKV model worker with --num-gpus / --gpus still loads the model with strategy="cuda fp16", so RWKV ignores the requested GPU list and runs on a single device.

Root cause

RwkvModel.__init__ hardcodes strategy="cuda fp16" instead of building a per-device strategy string for multiple visible GPUs.

Why this fix is correct

When more than one CUDA device is visible (including after CUDA_VISIBLE_DEVICES is set by the worker), build cuda:0 fp16 -> cuda:1 fp16 -> ... as RWKV expects. Single-GPU behavior stays cuda fp16.
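
As a rough illustration of the string format described above, here is a minimal sketch. It is not the code from this PR: `build_rwkv_strategy` and its parameters are hypothetical names, and the device count is passed in explicitly so the example runs without a GPU. In a real worker the count would presumably come from something like `torch.cuda.device_count()` after `CUDA_VISIBLE_DEVICES` has been applied.

```python
def build_rwkv_strategy(num_visible_gpus: int, dtype: str = "fp16") -> str:
    """Hypothetical helper: build an RWKV strategy string.

    Assumption: `num_visible_gpus` is the number of CUDA devices visible
    to the process (e.g. as reported by torch.cuda.device_count()).
    """
    # Zero or one visible device: keep the original single-GPU strategy.
    if num_visible_gpus <= 1:
        return f"cuda {dtype}"
    # Several devices: chain one "cuda:<index> <dtype>" segment per device.
    return " -> ".join(f"cuda:{i} {dtype}" for i in range(num_visible_gpus))


if __name__ == "__main__":
    print(build_rwkv_strategy(1))
    print(build_rwkv_strategy(2))
```

Device indices start at 0 because `CUDA_VISIBLE_DEVICES` renumbers the visible devices inside the process, so the strategy string refers to the remapped indices rather than the physical GPU IDs.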

Made with Cursor

Use multi-GPU RWKV strategy when more than one GPU is visible via
--gpus / --num-gpus instead of hardcoding single-device cuda fp16.

Co-authored-by: Cursor <cursoragent@cursor.com>

Development

Successfully merging this pull request may close these issues.

[BUG] RWKV Models are not configured to use Cuda GPU lists.
