KISS access to intrinsics#212
Conversation
…es. We can't use #[inline(always)] with target features so this is the next best thing. Remove #[inline] from the outer function to let the caller control inlining.
…s, preventing a soundness hole
…e; they cannot be invoked in other contexts anyway because the required type tokens are conditionally compiled only on platforms where they are available
| macro_rules! neon_kernel { | ||
| ( | ||
| $(#[$meta:meta])* | ||
| $vis:vis fn $name:ident( |
There was a problem hiding this comment.
My only holdup with this PR is that you aren't required to state the token argument "upfront".
That is, I'd like there to be a $token: ident: $token_ty: Ty here, where we validate that $token_ty is Neon (fwiw, we should still use $crate:: when defining the actual outer function, to make it easier to reason about).
I think that not having that makes it much harder to understand the functions defined by calls to this macro.
There was a problem hiding this comment.
Do I understand correctly that instead of
#[cfg(target_arch = "aarch64")]
fearless_simd::neon_kernel! {
fn copy_alpha_neon(a: float32x4_t, b: float32x4_t) -> float32x4_t {
vcopyq_laneq_f32::<3, 3>(a, b)
}
}you would like to see
#[cfg(target_arch = "aarch64")]
fearless_simd::neon_kernel! {
fn copy_alpha_neon(_token: fearless_simd::Neon, a: float32x4_t, b: float32x4_t) -> float32x4_t {
vcopyq_laneq_f32::<3, 3>(a, b)
}
}Is that right?
There was a problem hiding this comment.
There are two reasons I don't want to expose the token to the inner function.
One is the boilerplate. With all the #[inline(always)] fn foo<S: Simd>(simd: S, ... boilerplate going around we already have kind of too much of it. I agree turning that into #[simd] fn foo (... is too magical, but the boilerplate really adds up, and between all the #[cfg]s and the macros I'll take a less magical boilerplate reduction if I can.
The other reason is more subtle. It would be a bad idea to actually use the token inside the function, because the caller certifies that the code is running with SSE 4.2 or higher, while all the methods on the token only have access to features from SSE 4.2 and not the actual CPU we're running on. If we want to allow using high-level methods inside SIMD kernels, we need to allow them to be generic over SIMD and do the whole #[inline(always)] fn foo<S: Simd>(simd: S, ... dance again, which I actually have implemented in a branch but didn't want to include in this PR due to the added complexity.
And circling back to the boilerplate topic, I don't want to that to turn into #[inline(always)] fn foo<S: Simd>(_token: fearless_simd::Sse4_2, simd: S, ... because my eyes are already glazing over and I didn't even get to the actual function arguments yet.
There was a problem hiding this comment.
Two things:
- I hadn't considered that it would look like the token is accessible to the inner function. I agree that this makes it less obvious what the path forward was. I never wanted it to be actually available to the inner function. In that case, perhaps
fn_name($token_ty: ty,would be a better syntax. - my argument was entirely that if I read:
neon_function! {
fn some_function(arg1: A, arg2: B);
}I expect it to be called as:
some_function(arg1, arg2).
However, your proposal instead will lead to someone attempting that to get a cryptic error message about the type mismatching and there being the wrong number of arguments.
There was a problem hiding this comment.
I hadn't considered that it would look like the token is accessible to the inner function. I agree that this makes it less obvious what the path forward was.
I've experimented some more and I think this is actually fine.
In the srgb example, the explicit SSE4.2 path gets inlined into an AVX2-enabled generic context if AVX2 is actually availble on the system, and we get AVX2 vblendps emitted rather than SSE4.2 blendps. So you still get instruction encoding and register space benefits from the higher level if it's available most of the time.
And the token is actually useful inside the function, e.g. for safe loads/stores from slices: token.load_array_f32x4(input).into() is a safe load into the native SSE4.2 vector type.
| $($kernel_body:tt)* | ||
| } | ||
| ) => { | ||
| #[cfg(target_arch = "aarch64")] |
There was a problem hiding this comment.
We should comment that this is here for docs when doc_auto_cfg is enabled.
| #[doc = "}"] | ||
| #[doc = "```"] | ||
| #[macro_export] | ||
| macro_rules! wasm_simd128_kernel { |
There was a problem hiding this comment.
I'd lean slightly towards just deleting this entirely, but not strongly enough to block this pr on it.
There was a problem hiding this comment.
I went back and forth on this, but I think it's nice to provide a uniform API surface.
I already carved out a separate doc string in the generator just for this macro, I'm happy to add an edit of your choosing to it. We can totally highlight other ways to do it there.
There was a problem hiding this comment.
I mean, we should remove the entire rest of the docs and just say "this macro is the identity macro, and only exists to be uniform with the other macros".
Obviously in the scenario where we remove this macro, we'd mention and explain it not existing in the doc comment.
But it's also the case that anyone who would be using this macro is already aware that they need to compile two binaries.
Having thought about this more makes me even more in favour of removing it!
There was a problem hiding this comment.
Well, NEON is purely compile-time-gated as well. Aarch64 includes NEON in the baseline instruction set, and LLVM doesn't support #[target_feature(disable = "foo")]. 32-bit ARM doesn't really have a way to detect target features at runtime (the std macro exists bu it's nightly-only and incomplete with no path to stabilization) and in practice anything targeting 32-bit ARM just enables NEON at compile time anyway.
Does that mean we should remove the NEON macro too and teach users two entirely different patterns for different architectures? Or even for the same architecture, if we add another ARM level later?
There was a problem hiding this comment.
This being the case about neon is entirely news to me...
I feel like that fact should have had some wider implications to this library, but I've not thought about it fully.
There was a problem hiding this comment.
I've double-checked and I was wrong about this. While NEON is an always-enabled feature on Aarch64, you still need to wrap intrinsic calls into #[target_feature(enable = "neon")], so this macro is still necessary.
There was a problem hiding this comment.
What do you think about not even auto-generating these, and just having them written manually?
OTOH, I guess I'd probably still want a test which double checks the impls were sound, so it's probably fine as is.
I think that this is reasonable as is.
There was a problem hiding this comment.
I didn't like the idea of unsafe macro code copy-pasted four times (and possibly more later on), and my experiments with having a single kernel! macro that accepts a Level were somewhat awkward to use.
I'm not thrilled about the current implementation but it's the least awkward option that I can see.
An even simpler implementation of the concept from linebender#212, thanks to the review feedback by @DJMcNab Alternative to linebender#108 and linebender#196 Part of linebender#166
Alternative to #108 and #196
The idea is to give the user access to a context with the appropriate
#[target_feature]annotations based on a runtime token, and let them make use of the existing safe intrinsics instdwithout us having to wrap every single intrinsic.The updated
examples/srgb.rsshows this API in action.The main draw is simplicity: the diff is 10x smaller than the alternatives, and half of that is generated code. This way we don't need to maintain another complex, safety-critical code generator that emits
unsafe, unlike thearchmagecrate or #196.After this is merged, everything that is gated on the
safe_wrappersfeature can be deleted.Future work: better documentation, this feature is poorly documented both before and after this PR