sparse_strips: Use store_slice wherever possible #1616
I believe this is the PR: linebender/fearless_simd#181?
grebmeg
left a comment
I haven’t dug into store_slice much, but could you elaborate on why it might be more efficient on x86? Also, if NEON performance looks good, could we benchmark it on other platforms as well? I’m a bit hesitant to approve this without stronger evidence or proof.
Fair enough! The reason why we determined it to be slower is that |
@grebmeg Here are my results from running in Chrome using WASM, no changes observed:
Hmm, so to be honest, I wasn't really able to measure a speed boost on AVX2. Sometimes it's a bit faster, sometimes a bit slower, but it mostly seems like noise (see below). Anyway, since there at least don't seem to be any regressions, I would personally still be in favor of merging this. Since fearless_simd provides an API for storing vectors now, it's better to use it than just hope that |
grebmeg
left a comment
Thanks for the measurements, @LaurenzV! It's good to see there's no perf regression. I think I roughly see the minor benefits here, but I'm still a bit skeptical. That said, I likely just don't have a clear picture of this yet, so I'm happy to defer to your intuition here.


A while ago, we added the store_slice method to fearless_simd since we realized that using copy_from_slice doesn't always turn into efficient code on x86. Therefore, using this method in vello should lead to better performance, or at the very least not regress it. I measured this on NEON and I'm not seeing any regressions.
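
As a rough illustration of the two storing styles being compared, here is a minimal sketch. It does not use the real fearless_simd API: `F32x4Sketch`, `store_via_copy`, and `store_slice_sketch` are hypothetical names, and the explicit store is written with `core::ptr::write_unaligned` purely as an assumption about what a dedicated store might look like. Both paths produce the same result; the only difference is whether the store is spelled out or left to the optimizer.

```rust
/// Hypothetical stand-in for a 4-lane f32 vector (NOT the fearless_simd type).
#[derive(Clone, Copy)]
struct F32x4Sketch([f32; 4]);

impl F32x4Sketch {
    /// Style 1: copy the lanes into the destination with `copy_from_slice`
    /// and rely on the compiler to lower it to a single vector store.
    fn store_via_copy(self, dst: &mut [f32]) {
        dst[..4].copy_from_slice(&self.0);
    }

    /// Style 2: an explicit store of all four lanes at once.
    /// This is an assumed shape for illustration, not the library's code.
    fn store_slice_sketch(self, dst: &mut [f32]) {
        // Check the length once up front so the raw write below is in bounds.
        assert!(dst.len() >= 4, "destination must hold at least 4 lanes");
        // SAFETY: `dst` has room for 4 f32 values (checked above), and
        // `write_unaligned` places no alignment requirement on the pointer.
        unsafe {
            core::ptr::write_unaligned(dst.as_mut_ptr().cast::<[f32; 4]>(), self.0);
        }
    }
}
```

Whether the second style is actually faster depends on the target and on what the optimizer does with the first one, which is why it needs to be measured per platform.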