KISS access to intrinsics by Shnatsel · Pull Request #212 · linebender/fearless_simd

Shnatsel · 2026-05-16T16:36:19Z

Alternative to #108 and #196

The idea is to give the user access to a context with the appropriate #[target_feature] annotations based on a runtime token, and let them make use of the existing safe intrinsics in std without us having to wrap every single intrinsic.
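
To make the idea concrete, here is a minimal sketch of what a token-gated kernel might look like on x86-64. The names `Sse42`, `add4_sse42` and `add4` are hypothetical and are not fearless_simd's actual API. The sketch also assumes Rust 1.87 or newer, where `#[target_feature]` functions can be safe and value-only `std::arch` intrinsics are safe to call inside them.

```rust
/// Hypothetical zero-sized token: holding one proves SSE4.2 was detected.
#[cfg(target_arch = "x86_64")]
#[derive(Clone, Copy)]
pub struct Sse42(());

#[cfg(target_arch = "x86_64")]
impl Sse42 {
    /// The only way to obtain a token is a successful runtime check.
    pub fn detect() -> Option<Self> {
        if std::arch::is_x86_feature_detected!("sse4.2") {
            Some(Sse42(()))
        } else {
            None
        }
    }
}

/// Safe outer function: the token argument is what justifies the `unsafe` call.
#[cfg(target_arch = "x86_64")]
pub fn add4_sse42(_token: Sse42, a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    #[target_feature(enable = "sse4.2")]
    #[inline]
    fn kernel(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
        use std::arch::x86_64::*;
        // Value-only intrinsics need no `unsafe` in this context (Rust 1.87+).
        let va = _mm_set_ps(a[3], a[2], a[1], a[0]);
        let vb = _mm_set_ps(b[3], b[2], b[1], b[0]);
        let sum = _mm_add_ps(va, vb);
        let mut out = [0.0f32; 4];
        // Pointer-based stores remain unsafe.
        unsafe { _mm_storeu_ps(out.as_mut_ptr(), sum) };
        out
    }
    // SAFETY: an `Sse42` value can only exist after SSE4.2 was detected.
    unsafe { kernel(a, b) }
}

/// Portable entry point with a scalar fallback.
pub fn add4(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    #[cfg(target_arch = "x86_64")]
    {
        if let Some(token) = Sse42::detect() {
            return add4_sse42(token, a, b);
        }
    }
    std::array::from_fn(|i| a[i] + b[i])
}
```

Calling `add4` takes the SSE4.2 path when the CPU supports it and the scalar path otherwise. The only `unsafe` left is the token-justified call and the pointer store.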

The updated examples/srgb.rs shows this API in action.

The main draw is simplicity: the diff is 10x smaller than the alternatives, and half of that is generated code. This way we don't need to maintain another complex, safety-critical code generator that emits unsafe, unlike the archmage crate or #196.

After this is merged, everything that is gated on the safe_wrappers feature can be deleted.

Future work: better documentation. This feature is poorly documented both before and after this PR.

DJMcNab

As this can provide a practical path to dropping the safe_wrappers code, I'm all for it.

I still slightly prefer the design of #108, even for use with the current choice of Levels. That's due to it being more expression oriented. But that's by-the-by presently, as no-one is motivated to review #108

DJMcNab · 2026-05-17T16:03:54Z

+macro_rules! neon_kernel {
+    (
+        $(#[$meta:meta])*
+        $vis:vis fn $name:ident(


My only holdup with this PR is that you aren't required to state the token argument "upfront".
That is, I'd like there to be a $token:ident: $token_ty:ty here, where we validate that $token_ty is Neon (fwiw, we should still use $crate:: when defining the actual outer function, to make it easier to reason about).

I think that not having that makes it much harder to understand the functions defined by calls to this macro.

Do I understand correctly that instead of

#[cfg(target_arch = "aarch64")]
fearless_simd::neon_kernel! {
    fn copy_alpha_neon(a: float32x4_t, b: float32x4_t) -> float32x4_t {
        vcopyq_laneq_f32::<3, 3>(a, b)
    }
}

you would like to see

#[cfg(target_arch = "aarch64")]
fearless_simd::neon_kernel! {
    fn copy_alpha_neon(_token: fearless_simd::Neon, a: float32x4_t, b: float32x4_t) -> float32x4_t {
        vcopyq_laneq_f32::<3, 3>(a, b)
    }
}

Is that right?

There are two reasons I don't want to expose the token to the inner function.

One is the boilerplate. With all the #[inline(always)] fn foo<S: Simd>(simd: S, ... boilerplate going around we already have kind of too much of it. I agree turning that into #[simd] fn foo (... is too magical, but the boilerplate really adds up, and between all the #[cfg]s and the macros I'll take a less magical boilerplate reduction if I can.

The other reason is more subtle. It would be a bad idea to actually use the token inside the function, because the caller certifies that the code is running with SSE 4.2 or higher, while all the methods on the token only have access to features from SSE 4.2 and not the actual CPU we're running on. If we want to allow using high-level methods inside SIMD kernels, we need to allow them to be generic over SIMD and do the whole #[inline(always)] fn foo<S: Simd>(simd: S, ... dance again, which I actually have implemented in a branch but didn't want to include in this PR due to the added complexity.

And circling back to the boilerplate topic, I don't want that to turn into #[inline(always)] fn foo<S: Simd>(_token: fearless_simd::Sse4_2, simd: S, ... because my eyes are already glazing over and I didn't even get to the actual function arguments yet.

Two things:

I hadn't considered that it would look like the token is accessible to the inner function. I agree that this makes it less obvious what the path forward was. I never wanted it to be actually available to the inner function. In that case, perhaps fn_name($token_ty: ty, would be a better syntax.

My argument was entirely that if I read:

neon_function! { fn some_function(arg1: A, arg2: B); }

I expect it to be called as:
some_function(arg1, arg2).

However, with your proposal, someone attempting that will instead get a cryptic error message about mismatched types and the wrong number of arguments.
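
The call-shape concern can be illustrated with a small macro sketch in which the token appears in the declared signature, so the declaration matches the call site. Everything here is hypothetical (`Sse42`, `sse42_kernel!`, `min4_sse42`, `min4`) and is not the macro from this PR; it assumes Rust 1.87 or newer and uses x86-64 with a scalar fallback so that it builds on other targets too.

```rust
/// Hypothetical stand-in for a runtime-detected feature token.
#[cfg(target_arch = "x86_64")]
#[derive(Clone, Copy)]
pub struct Sse42(());

#[cfg(target_arch = "x86_64")]
impl Sse42 {
    pub fn detect() -> Option<Self> {
        std::arch::is_x86_feature_detected!("sse4.2").then(|| Sse42(()))
    }
}

/// Hypothetical kernel macro: the first parameter must be spelled `Sse42`,
/// and `unsafe fn` does not match the pattern at all.
#[cfg(target_arch = "x86_64")]
macro_rules! sse42_kernel {
    (
        $(#[$meta:meta])*
        $vis:vis fn $name:ident($token:ident : Sse42 $(, $arg:ident : $arg_ty:ty)*) $(-> $ret:ty)?
        $body:block
    ) => {
        $(#[$meta])*
        $vis fn $name($token: Sse42 $(, $arg: $arg_ty)*) $(-> $ret)? {
            #[target_feature(enable = "sse4.2")]
            #[inline]
            fn inner($token: Sse42 $(, $arg: $arg_ty)*) $(-> $ret)? $body
            // SAFETY: an `Sse42` value only exists after successful detection.
            unsafe { inner($token $(, $arg)*) }
        }
    };
}

#[cfg(target_arch = "x86_64")]
sse42_kernel! {
    /// Lane-wise minimum; declared with the same arguments it is called with.
    pub fn min4_sse42(_token: Sse42, a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
        use std::arch::x86_64::*;
        let va = _mm_set_ps(a[3], a[2], a[1], a[0]);
        let vb = _mm_set_ps(b[3], b[2], b[1], b[0]);
        let m = _mm_min_ps(va, vb);
        let mut out = [0.0f32; 4];
        // Pointer-based stores remain unsafe.
        unsafe { _mm_storeu_ps(out.as_mut_ptr(), m) };
        out
    }
}

/// Portable entry point with a scalar fallback.
pub fn min4(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    #[cfg(target_arch = "x86_64")]
    {
        if let Some(token) = Sse42::detect() {
            return min4_sse42(token, a, b);
        }
    }
    std::array::from_fn(|i| a[i].min(b[i]))
}
```

With this shape, a reader who sees `fn min4_sse42(_token: Sse42, a, b)` in the macro body can call it exactly as written, at the cost of one extra parameter per kernel.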

I hadn't considered that it would look like the token is accessible to the inner function. I agree that this makes it less obvious what the path forward was.

I've experimented some more and I think this is actually fine.

In the srgb example, the explicit SSE4.2 path gets inlined into an AVX2-enabled generic context if AVX2 is actually available on the system, and we get AVX2 vblendps emitted rather than SSE4.2 blendps. So you still get instruction encoding and register space benefits from the higher level if it's available most of the time.
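
A minimal sketch of that situation, using hypothetical function names rather than the code from examples/srgb.rs: an `#[inline]` kernel that only requires SSE4.2 is called from a function compiled with AVX2, which is a safe call because the caller's features include the callee's. It assumes Rust 1.87 or newer, and it does not check the generated assembly, so whether `vblendps` is actually emitted depends on the optimizer.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Kernel written against SSE4.2 only (hypothetical name).
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "sse4.2")]
#[inline]
fn copy_alpha_sse42(rgb: __m128, alpha: __m128) -> __m128 {
    // Bit 3 of the mask selects lane 3 from `alpha`; lanes 0..=2 come from `rgb`.
    _mm_blend_ps::<0b1000>(rgb, alpha)
}

/// Caller compiled with AVX2 enabled (hypothetical name).
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
fn copy_alpha_avx2(rgb: [f32; 4], alpha: [f32; 4]) -> [f32; 4] {
    let r = _mm_set_ps(rgb[3], rgb[2], rgb[1], rgb[0]);
    let a = _mm_set_ps(alpha[3], alpha[2], alpha[1], alpha[0]);
    // Safe call: AVX2 implies SSE4.2, so no `unsafe` is needed here, and the
    // compiler may inline the kernel into this wider-feature context.
    let blended = copy_alpha_sse42(r, a);
    let mut out = [0.0f32; 4];
    // Pointer-based stores remain unsafe.
    unsafe { _mm_storeu_ps(out.as_mut_ptr(), blended) };
    out
}

/// Portable entry point with a scalar fallback.
pub fn copy_alpha(rgb: [f32; 4], alpha: [f32; 4]) -> [f32; 4] {
    #[cfg(target_arch = "x86_64")]
    {
        if std::arch::is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 was just detected at runtime.
            return unsafe { copy_alpha_avx2(rgb, alpha) };
        }
    }
    [rgb[0], rgb[1], rgb[2], alpha[3]]
}
```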

And the token is actually useful inside the function, e.g. for safe loads/stores from slices: token.load_array_f32x4(input).into() is a safe load into the native SSE4.2 vector type.

DJMcNab · 2026-05-17T16:05:22Z

+            $($kernel_body:tt)*
+        }
+    ) => {
+        #[cfg(target_arch = "aarch64")]


We should comment that this is here for docs when doc_auto_cfg is enabled.

DJMcNab · 2026-05-17T16:07:16Z

+#[doc = "}"]
+#[doc = "```"]
+#[macro_export]
+macro_rules! wasm_simd128_kernel {


I'd lean slightly towards just deleting this entirely, but not strongly enough to block this PR on it.

I went back and forth on this, but I think it's nice to provide a uniform API surface.

I already carved out a separate doc string in the generator just for this macro, I'm happy to add an edit of your choosing to it. We can totally highlight other ways to do it there.

I mean, we should remove the entire rest of the docs and just say "this macro is the identity macro, and only exists to be uniform with the other macros".

Obviously in the scenario where we remove this macro, we'd mention and explain it not existing in the doc comment.
But it's also the case that anyone who would be using this macro is already aware that they need to compile two binaries.

Having thought about this more makes me even more in favour of removing it!

Well, NEON is purely compile-time-gated as well. Aarch64 includes NEON in the baseline instruction set, and LLVM doesn't support #[target_feature(disable = "foo")]. 32-bit ARM doesn't really have a way to detect target features at runtime (the std macro exists but it's nightly-only and incomplete with no path to stabilization) and in practice anything targeting 32-bit ARM just enables NEON at compile time anyway.

Does that mean we should remove the NEON macro too and teach users two entirely different patterns for different architectures? Or even for the same architecture, if we add another ARM level later?

This being the case about neon is entirely news to me...

I feel like that fact should have had some wider implications for this library, but I've not thought about it fully.

I've double-checked and I was wrong about this. While NEON is an always-enabled feature on Aarch64, you still need to wrap intrinsic calls into #[target_feature(enable = "neon")], so this macro is still necessary.
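
As a rough illustration of that last point, the sketch below wraps NEON intrinsics in a `#[target_feature(enable = "neon")]` function and calls it from ordinary code. The function names are hypothetical, a scalar version is provided so the snippet also builds on non-AArch64 targets, and the safety comment relies on the statement above that NEON is always enabled on Aarch64.

```rust
/// NEON kernel (hypothetical name); only compiled on AArch64.
#[cfg(target_arch = "aarch64")]
#[target_feature(enable = "neon")]
#[inline]
fn add4_neon(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    use std::arch::aarch64::*;
    let mut out = [0.0f32; 4];
    // Pointer-based loads and stores are unsafe on every toolchain.
    unsafe {
        let va = vld1q_f32(a.as_ptr());
        let vb = vld1q_f32(b.as_ptr());
        vst1q_f32(out.as_mut_ptr(), vaddq_f32(va, vb));
    }
    out
}

/// On AArch64, forward to the NEON kernel.
#[cfg(target_arch = "aarch64")]
pub fn add4(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    // SAFETY: assumes NEON is part of the target's baseline feature set.
    // The call still needs `unsafe` because this function itself carries
    // no `#[target_feature(enable = "neon")]` attribute.
    unsafe { add4_neon(a, b) }
}

/// Scalar version for every other architecture.
#[cfg(not(target_arch = "aarch64"))]
pub fn add4(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    std::array::from_fn(|i| a[i] + b[i])
}
```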

DJMcNab · 2026-05-17T16:11:59Z

What do you think about not even auto-generating these, and just having them written manually?

OTOH, I guess I'd probably still want a test which double checks the impls were sound, so it's probably fine as is.
I think that this is reasonable as is.

I didn't like the idea of unsafe macro code copy-pasted four times (and possibly more later on), and my experiments with having a single kernel! macro that accepts a Level were somewhat awkward to use.

I'm not thrilled about the current implementation but it's the least awkward option that I can see.

Shnatsel · 2026-05-18T01:14:11Z

I've implemented the API shape you suggested in #214 and while I'm not a fan of the API, the simplicity of it is hard to pass up. I'm closing this in favor of #214.

@DJMcNab

An even simpler implementation of the concept from linebender#212, thanks to the review feedback by @DJMcNab. Alternative to linebender#108 and linebender#196. Part of linebender#166.

Shnatsel added 5 commits May 16, 2026 15:57

Add target-feature kernel macros

b1bc6d1

Add a comment explaining the rationale for emitting String rather than…

965d36c

… TokenStream

Add #[inline] annotation to the generated function with target featur…

497e943

…es. We can't use #[inline(always)] with target features so this is the next best thing. Remove #[inline] from the outer function to let the caller control inlining.

Rewrite the explicit NEON intrinsic usage example to the new API

3b1f264

Expand the srgb example to also cover x86

76d6636

Shnatsel requested a review from LaurenzV May 16, 2026 16:36

Shnatsel added 8 commits May 16, 2026 17:50

Cover all 4 kernel macros with tests

0748bdf

Add a test verifying that unsafe fn is rejected by the kernel macro…

5912818

…s, preventing a soundness hole

Expand documentation on kernel macros

937feea

Turn tests into doctests, so that documentation shows usage examples

f63ada6

doc: explain why kernel macros are needed

d7ed4c3

Polish the documentation some more

57d9f6e

Only export the kernel macros on the platforms where they're availabl…

2025e91

…e; they cannot be invoked in other contexts anyway because the required type tokens are conditionally compiled only on platforms where they are available

Link to the token docs from macro definitions

94791d7

DJMcNab approved these changes May 17, 2026

Shnatsel mentioned this pull request May 18, 2026

Even KISS-er access to intrinsics #214

Merged

Shnatsel closed this May 18, 2026

Shnatsel mentioned this pull request May 19, 2026

Rip out core::arch wrappers #216

Merged
