Skip to content

Remove a register#3

Merged
lemire merged 2 commits intomainfrom
remove_a_register
Feb 16, 2026
Merged

Remove a register#3
lemire merged 2 commits intomainfrom
remove_a_register

Conversation

@lemire
Copy link
Copy Markdown
Member

@lemire lemire commented Feb 13, 2026

We can simplify the algorithm. This should not affect the performance, although it might be slightly beneficial because we need one fewer register.

The gist of it is that we replace

_mm512_madd52lo_epu64(zmmzero, bcstq_h, ifma_const);

by

_mm512_madd52lo_epu64(ifma_const, bcstq_h, ifma_const);

So instead of loading zmmzero and ifma_const, we just need ifma_const.

I am also trimming out shift_and_insert_dot as it is not needed.

@lemire lemire requested a review from jaja360 February 13, 2026 19:46
Copy link
Copy Markdown
Collaborator

@jaja360 jaja360 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very cool !!!

I ran some benchmarks to see if it impacts performance. It doesn't at all with clang. With g++, however, I get that the new version is 7% faster on the Twitter dataset, but 15% slower on Uniform-8 (it goes from 4.01 to 4.62 c/n). g++ has always given us suboptimal results (it generates 30 instructions instead of 27 for clang, on Uniform-8), so it seems more like a GCC codegen quirk than a problem on our side.

I think we can merge !

@lemire
Copy link
Copy Markdown
Member Author

lemire commented Feb 16, 2026

@jaja360 Logically, it should not make a difference... We save one named register, but that's not a limitation. We also save loading the register, which might help a little bit (maybe save one instruction) but it was almost surely not a bottleneck.

Of course, the main point is to simplify the algorithm and the code.

Merging.

@lemire lemire merged commit 14f15ea into main Feb 16, 2026
4 checks passed
@jaja360
Copy link
Copy Markdown
Collaborator

jaja360 commented Feb 16, 2026

Indeed, it should not do a difference (and it does not on clang !)
I think g++ just have some code alignment issues (that we already observed earlier), this time due to the static const variables we removed.

@jaja360 jaja360 deleted the remove_a_register branch February 16, 2026 01:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants