Optimized RVV q1_0 dot #31
Conversation
Thanks, that's impressive speed on such a device :) Do people need a special setup to build and run this, or do the llama.cpp build tools work? Would be happy to merge it to our fork; I don't have a similar device to test it myself though. Will review more closely later this week. For some reason I stopped getting email notifications from GitHub.
Pull request overview
Adds a RISC-V RVV-specific implementation for the q1_0 × q8_0 dot product in the CPU backend, continuing the codebase’s architecture-specific quantized dot-product optimizations.
Changes:
- Added two fixed-width RVV kernels for `ggml_vec_dot_q1_0_q8_0` targeting 128-bit and 256-bit vector configurations.
- Added RVV runtime dispatch in the RISC-V quantized dot-product path.
- Updated the RISC-V fallback aliasing so this path can call the true generic implementation when needed.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `ggml/src/ggml-cpu/arch/riscv/quants.c` | Adds the new RVV q1_0×q8_0 kernels, helper tables, and runtime dispatch logic. |
| `ggml/src/ggml-cpu/arch-fallback.h` | Removes the RISC-V alias for the q1_0 generic dot product so the arch-specific implementation can fall back correctly. |
I don't actually know if llama.cpp accounts for Zvl64b; it seems to be meant for embedded or 32-bit cores.
Yeah, Copilot might be confused. Saw a similar PR in mainline llama.cpp: ggml-org#22500
@khosravipasha No, this PR is independent from mine, but yes, it's about the same dot-product op. The implementation there is not very efficient, though: it uses neither VLA nor fixed-VLEN specialized kernels, and it relies on LMUL==4, which forces the hardware to do four ops per instruction. It also uses slightly different logic for gathering and masking. I think I need to open my PR too and also try similar approaches.
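For context on the LMUL point above, here is a minimal sketch (illustrative only, not code from either PR) of the same int8 add written at LMUL=1 and LMUL=4. An LMUL=4 instruction names a group of four vector registers, so it saves loop overhead, but most in-order RVV cores crack it into four back-to-back internal ops, which is why relying on LMUL=4 alone does not buy arithmetic throughput.

```c
#include <riscv_vector.h>
#include <stddef.h>
#include <stdint.h>

// LMUL=1: each instruction touches one vector register (VLEN bits of data).
static void add_i8_m1(int8_t *dst, const int8_t *a, const int8_t *b, size_t n) {
    for (size_t i = 0; i < n; ) {
        size_t vl = __riscv_vsetvl_e8m1(n - i);
        vint8m1_t va = __riscv_vle8_v_i8m1(a + i, vl);
        vint8m1_t vb = __riscv_vle8_v_i8m1(b + i, vl);
        __riscv_vse8_v_i8m1(dst + i, __riscv_vadd_vv_i8m1(va, vb, vl), vl);
        i += vl;
    }
}

// LMUL=4: each instruction covers a four-register group (4*VLEN bits).
// Fewer instructions retire, but in-order cores typically split each one
// into four internal ops, so execution throughput is roughly unchanged.
static void add_i8_m4(int8_t *dst, const int8_t *a, const int8_t *b, size_t n) {
    for (size_t i = 0; i < n; ) {
        size_t vl = __riscv_vsetvl_e8m4(n - i);
        vint8m4_t va = __riscv_vle8_v_i8m4(a + i, vl);
        vint8m4_t vb = __riscv_vle8_v_i8m4(b + i, vl);
        __riscv_vse8_v_i8m4(dst + i, __riscv_vadd_vv_i8m4(va, vb, vl), vl);
        i += vl;
    }
}
```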
Turns out I had been sleeping on some free performance by not noticing a better instruction for mask loading:
Perplexity is alright, will prepare a PR for mainline
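The thread doesn't say which instruction this refers to; a plausible candidate (an assumption, not confirmed above) is `vlm.v`, which loads a packed bitmask straight into a mask register instead of rebuilding the mask from a byte vector with a compare. A minimal sketch of the two approaches:

```c
#include <riscv_vector.h>
#include <stddef.h>
#include <stdint.h>

// Rebuild a mask from one byte per element: a vector load plus a compare.
static vbool8_t mask_from_bytes(const uint8_t *flags, size_t vl) {
    vuint8m1_t v = __riscv_vle8_v_u8m1(flags, vl);
    return __riscv_vmsne_vx_u8m1_b8(v, 0, vl);
}

// Load a packed bitmask (one bit per element) directly with vlm.v:
// a single mask-register load, no temporary data vector and no compare.
static vbool8_t mask_from_bits(const uint8_t *bits, size_t vl) {
    return __riscv_vlm_v_b8(bits, vl);
}
```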
@khosravipasha The thingy got merged, but one question about
The question has resolved itself; there was some confusion about the last comment in the main PR.
Continuation of #10 for the RISC-V V extension.
Implemented two fixed-VLEN kernels, loosely inspired by the AVX2 implementation.
VLA causes severe overhead, and the task only has two realistic VL combinations (in its simple form).
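A rough sketch of what the two-kernel dispatch could look like, based only on the VLEN/LMUL note in the footnote below (function names are hypothetical stand-ins, not the actual kernels): pick a kernel once from the hardware's VLEN, which `__riscv_vsetvlmax_e8m1()` exposes as VLEN/8 bytes, and fall back to the generic C path on anything narrower.

```c
#include <riscv_vector.h>
#include <stddef.h>

// Hypothetical prototypes -- stand-ins for the two fixed-VLEN paths and the
// generic C fallback; names are illustrative, not taken from the PR.
void dot_q1_0_q8_0_vl128(int n, float *s, const void *x, const void *y);  // VLEN == 128, LMUL=2
void dot_q1_0_q8_0_vl256(int n, float *s, const void *x, const void *y);  // VLEN >= 256, LMUL=1
void dot_q1_0_q8_0_generic(int n, float *s, const void *x, const void *y);

void dot_q1_0_q8_0_dispatch(int n, float *s, const void *x, const void *y) {
    // VLEN/8: bytes per vector register at SEW=8, LMUL=1.
    const size_t vlenb = __riscv_vsetvlmax_e8m1();
    if (vlenb >= 32) {
        dot_q1_0_q8_0_vl256(n, s, x, y);    // 256-bit (or wider) kernel
    } else if (vlenb >= 16) {
        dot_q1_0_q8_0_vl128(n, s, x, y);    // 128-bit kernel
    } else {
        dot_q1_0_q8_0_generic(n, s, x, y);  // e.g. Zvl64b cores: generic C
    }
}
```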
Benchmarks were performed with:
- OrangePi RV2 SBC (Ky X1 / SpacemiT K1), 8 GB
- Armbian Debian trixie rolling release, kernel 6.18.26-current-spacemit
- Built with the official SpacemiT toolchain, but IME was not used.
Command:
`llama-bench -m Bonsai-1.7B.gguf -p 64 -n 16 -t 8 -r 3 -fa 1 -mmp 0`

Perplexity for 5×512 chunks: mean KLD 0.00027, PPL 21.09, same top p 99.22%
| | pp 64 t/s | tg 16 t/s |
|---|---|---|
| VL128\* | | |
| VL256\* | | |

\* Forced VLEN 128 kernel with LMUL=2; for VLEN >= 256: LMUL=1.

As always, I would appreciate your feedback.