sparse_strips: Use `store_slice` wherever possible by LaurenzV · Pull Request #1616 · linebender/vello

LaurenzV · 2026-05-03T20:28:04Z

A while ago, we added the store_slice method to fearless_simd because we realized that copy_from_slice doesn't always compile to efficient code on x86. Using it in vello should therefore improve performance, or at the very least not regress it. I measured this on NEON and I'm not seeing any regressions.

grebmeg · 2026-05-07T03:46:50Z

I believe this is the PR: linebender/fearless_simd#181?

grebmeg

I haven’t dug into store_slice much, but could you elaborate on why it might be more efficient on x86? Also, if NEON performance looks good, could we benchmark it on other platforms as well? I’m a bit hesitant to approve this without stronger evidence or proof.

LaurenzV · 2026-05-07T06:42:12Z

Fair enough! The reason we found it to be slower is that copy_from_slice often wasn't optimized into the best store intrinsics for the given SIMD level. It's been a while, but there is some additional discussion in smu160/PhastFT#58 and linebender/fearless_simd#185. Anyway, it's a fair concern, and I will see whether I can pull up the old vello bench repo to get timings on x86 as well as WASM.
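To make the distinction concrete, here is a minimal sketch of the two ways of writing a vector back to memory. It is not the fearless_simd implementation: `store_via_copy` and `store_explicit` are hypothetical names, the vector is modelled as a plain `[f32; 4]`, and the real `store_slice` signature is not shown in this thread. The first function leaves instruction selection to the optimizer; the second requests an unaligned vector store directly.

```rust
/// Store four lanes by copying the backing array into `dest`.
/// Which instructions this becomes is left to the optimizer.
fn store_via_copy(lanes: [f32; 4], dest: &mut [f32]) {
    dest[..4].copy_from_slice(&lanes);
}

/// Store four lanes with an explicit unaligned vector store.
/// SSE is part of the x86_64 baseline, so no runtime detection is needed.
#[cfg(target_arch = "x86_64")]
fn store_explicit(lanes: [f32; 4], dest: &mut [f32]) {
    use std::arch::x86_64::{_mm_loadu_ps, _mm_storeu_ps};

    // Panics if `dest` is too short, which keeps the store in bounds.
    let dest = &mut dest[..4];

    // SAFETY: `lanes` and `dest` each cover four valid f32 values, and the
    // unaligned load/store intrinsics have no alignment requirement.
    unsafe {
        let v = _mm_loadu_ps(lanes.as_ptr());
        _mm_storeu_ps(dest.as_mut_ptr(), v);
    }
}

/// Fallback for other targets so that the sketch still compiles there.
#[cfg(not(target_arch = "x86_64"))]
fn store_explicit(lanes: [f32; 4], dest: &mut [f32]) {
    dest[..4].copy_from_slice(&lanes);
}
```

Both functions write the same bytes; the only difference is whether the store instruction is chosen by the compiler or spelled out by the caller.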

LaurenzV · 2026-05-08T09:16:34Z

@grebmeg Here are my results from running in Chrome using WASM; no changes observed.

LaurenzV · 2026-05-08T09:25:53Z

Same for raw NEON.

Will try to run the benchmarks on my AVX2 laptop now.

LaurenzV · 2026-05-08T10:22:16Z

Hmm, so to be honest, I wasn't really able to measure a speed boost on AVX2. Sometimes it's a bit faster, sometimes a bit slower, but it mostly seems like noise (see below).

Anyway, since there at least don't seem to be any regressions, I would personally still be in favor of merging this. Since fearless_simd now provides an API for storing vectors, it's better to use it than to just hope that copy_from_slice optimizes the way we want. But it's up to you, just let me know how you feel!

fine/fill/opaque_short_u8_avx2
                        time:   [6.5817 ns 6.9296 ns 7.2834 ns]
                        change: [-8.8687% -3.6346% +1.8572%] (p = 0.19 > 0.05)
                        No change in performance detected.
 
fine/fill/opaque_long_u8_avx2
                        time:   [22.981 ns 26.268 ns 29.743 ns]
                        change: [-18.986% -7.9699% +4.7343%] (p = 0.22 > 0.05)
                        No change in performance detected.
 
fine/fill/transparent_short_u8_avx2
                        time:   [13.110 ns 13.139 ns 13.170 ns]
                        change: [-2.6401% -1.8585% -1.1454%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) high mild
  3 (3.00%) high severe
 
fine/fill/transparent_long_u8_avx2
                        time:   [96.356 ns 96.650 ns 96.951 ns]
                        change: [-2.0641% -1.2450% -0.5130%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild
 
fine/strip/solid_short_u8_avx2
                        time:   [11.733 ns 11.765 ns 11.801 ns]
                        change: [-0.6357% -0.1145% +0.4128%] (p = 0.68 > 0.05)
                        No change in performance detected.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe
 
fine/strip/solid_long_u8_avx2
                        time:   [78.872 ns 79.010 ns 79.163 ns]
                        change: [-1.9532% -1.4405% -0.9604%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild
 
fine/pack/pack_block_u8_avx2
                        time:   [90.445 ns 90.575 ns 90.726 ns]
                        change: [-1.2955% -0.8414% -0.4573%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild
 
fine/pack/pack_regular_u8_avx2
                        time:   [134.62 ns 134.95 ns 135.29 ns]
                        change: [-0.7959% -0.4100% -0.0110%] (p = 0.04 < 0.05)
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe
 
fine/pack/unpack_block_u8_avx2
                        time:   [91.827 ns 92.035 ns 92.244 ns]
                        change: [+0.4345% +0.8964% +1.4966%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe
 
fine/pack/unpack_regular_u8_avx2
                        time:   [176.34 ns 176.80 ns 177.24 ns]
                        change: [-2.9782% -2.1356% -0.7432%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe
 
fine/gradient/linear/opaque_u8_avx2
                        time:   [415.74 ns 441.66 ns 472.66 ns]
                        change: [+3.0365% +7.0960% +12.665%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  7 (7.00%) high severe
 
fine/gradient/radial/opaque_u8_avx2
                        time:   [565.00 ns 577.77 ns 599.93 ns]
                        change: [-3.3081% +0.5814% +4.6626%] (p = 0.77 > 0.05)
                        No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) high mild
  6 (6.00%) high severe
 
fine/gradient/radial/opaque_conical_u8_avx2
                        time:   [655.38 ns 689.39 ns 725.05 ns]
                        change: [+2.4261% +7.3872% +12.451%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 21 outliers among 100 measurements (21.00%)
  20 (20.00%) high mild
  1 (1.00%) high severe
 
fine/gradient/sweep/opaque_u8_avx2
                        time:   [897.99 ns 917.25 ns 940.90 ns]
                        change: [+0.8496% +3.6072% +6.4875%] (p = 0.01 < 0.05)
                        Change within noise threshold.
Found 19 outliers among 100 measurements (19.00%)
  6 (6.00%) high mild
  13 (13.00%) high severe
 
fine/gradient/extend/pad_u8_avx2
                        time:   [415.68 ns 435.54 ns 461.08 ns]
                        change: [-3.5682% +2.0580% +8.4721%] (p = 0.50 > 0.05)
                        No change in performance detected.
Found 9 outliers among 100 measurements (9.00%)
  2 (2.00%) high mild
  7 (7.00%) high severe
 
fine/gradient/extend/repeat_u8_avx2
                        time:   [494.69 ns 512.33 ns 536.46 ns]
                        change: [-1.4447% +3.0115% +7.5000%] (p = 0.20 > 0.05)
                        No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) high mild
  5 (5.00%) high severe
 
fine/gradient/extend/reflect_u8_avx2
                        time:   [562.51 ns 574.79 ns 591.80 ns]
                        change: [-6.4356% -2.2666% +2.1999%] (p = 0.31 > 0.05)
                        No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) high mild
  5 (5.00%) high severe
 
fine/gradient/many_stops_u8_avx2
                        time:   [762.38 ns 767.54 ns 773.92 ns]
                        change: [-0.8385% +0.0220% +1.0074%] (p = 0.96 > 0.05)
                        No change in performance detected.
Found 9 outliers among 100 measurements (9.00%)
  1 (1.00%) high mild
  8 (8.00%) high severe
 
fine/gradient/transparent_u8_avx2
                        time:   [673.63 ns 675.59 ns 677.69 ns]
                        change: [-2.0569% -1.1367% +0.0059%] (p = 0.03 < 0.05)
                        Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) high mild
  3 (3.00%) high severe
 
fine/image/quality/low_u8_avx2
                        time:   [457.79 ns 459.96 ns 462.80 ns]
                        change: [-0.7154% -0.1406% +0.4472%] (p = 0.66 > 0.05)
                        No change in performance detected.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe
 
fine/image/quality/medium_u8_avx2
                        time:   [2.6238 µs 2.6434 µs 2.6673 µs]
                        change: [+1.1172% +1.7327% +2.4433%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
  8 (8.00%) high severe
 
fine/image/quality/high_u8_avx2
                        time:   [90.028 µs 90.370 µs 90.764 µs]
                        change: [+0.5259% +1.0082% +1.5054%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
  7 (7.00%) high mild
  1 (1.00%) high severe
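
As a reading aid for the output above, the following sketch shows how the verdict lines appear to follow from the reported numbers. It assumes Criterion's default significance level of 0.05 and default noise threshold of 1%; whether the vello benchmarks override either setting is not stated in this thread. The `Verdict` enum and `classify` function are hypothetical names, not Criterion's API.

```rust
#[derive(Debug, PartialEq)]
enum Verdict {
    NoChange,    // "No change in performance detected."
    WithinNoise, // "Change within noise threshold."
    Improved,    // "Performance has improved."
    Regressed,   // "Performance has regressed."
}

/// Classify a benchmark from the lower and upper bounds of the relative
/// change (as fractions, so -1.5% is -0.015) and the reported p-value.
fn classify(lower: f64, upper: f64, p_value: f64) -> Verdict {
    const SIGNIFICANCE: f64 = 0.05; // assumed default
    const NOISE: f64 = 0.01; // assumed default (1%)

    if p_value >= SIGNIFICANCE {
        // The difference is not statistically significant.
        Verdict::NoChange
    } else if upper < -NOISE {
        // The whole interval lies below -1%.
        Verdict::Improved
    } else if lower > NOISE {
        // The whole interval lies above +1%.
        Verdict::Regressed
    } else {
        // Significant, but the interval overlaps the +/-1% band.
        Verdict::WithinNoise
    }
}
```

Under these assumptions, a result is only labelled as improved or regressed when the entire interval lies outside the ±1% band, which is why several entries with p = 0.00 are still reported as within the noise threshold.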

grebmeg

Thanks for the measurements, @LaurenzV! It's good to see there's no perf regression. I think I roughly see the minor benefits here, but I'm still a bit skeptical. That said, I likely just don't have a clear picture of this yet, so I'm happy to defer to your intuition here.

Use store_slice

b76724e

LaurenzV requested a review from grebmeg May 3, 2026 20:28

grebmeg reviewed May 7, 2026


LaurenzV requested a review from grebmeg May 8, 2026 10:22

grebmeg approved these changes May 10, 2026


LaurenzV added this pull request to the merge queue May 11, 2026

Merged via the queue into main with commit 958cd19 May 11, 2026
17 checks passed

LaurenzV deleted the laurenz/store branch May 11, 2026 09:54
