`cuda::std::simd` Optimize Min/Max by fbusato · Pull Request #8949 · NVIDIA/cccl

fbusato · 2026-05-12T21:50:24Z

Description

This PR introduces the following optimizations for SIMD min/max over two vectors:

- VIMNMX for packed signed/unsigned 16-bit data: SM90+
- VIMNMX for packed signed/unsigned 8-bit data: SM120f
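To make the semantics concrete, here is a host-side C++ reference model of the lane-wise operation a packed VIMNMX computes. A 32-bit register holds two 16-bit lanes or four 8-bit lanes, and min/max is taken independently per lane with a signed or unsigned comparison. `packed_minmax` is a hypothetical helper for illustration (for example, as a test oracle). It is not part of CCCL and does not itself emit VIMNMX.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical reference model (not the PR's code): view a 32-bit word as
// packed lanes of type Lane and take the lane-wise min (IsMax == false)
// or max (IsMax == true).
//   int16_t / uint16_t -> 2 lanes (the 16-bit x2 case)
//   int8_t  / uint8_t  -> 4 lanes (the 8-bit x4 case)
// The signedness of Lane decides how each lane is compared.
template <typename Lane, bool IsMax>
std::uint32_t packed_minmax(std::uint32_t a, std::uint32_t b) {
  static_assert(std::is_integral<Lane>::value && (sizeof(Lane) == 1 || sizeof(Lane) == 2),
                "only 8-bit and 16-bit lanes are modeled");
  constexpr int kLanes = static_cast<int>(sizeof(std::uint32_t) / sizeof(Lane));
  Lane la[kLanes], lb[kLanes], lr[kLanes];
  // The same byte-to-lane mapping is used to unpack the inputs and to pack
  // the result, so the outcome does not depend on endianness.
  std::memcpy(la, &a, sizeof(a));
  std::memcpy(lb, &b, sizeof(b));
  for (int i = 0; i < kLanes; ++i) {
    lr[i] = IsMax ? std::max(la[i], lb[i]) : std::min(la[i], lb[i]);
  }
  std::uint32_t r;
  std::memcpy(&r, lr, sizeof(r));
  return r;
}
```

Signed and unsigned variants differ exactly when a lane has its top bit set: `0xFFFF` is the largest `uint16_t` but is `-1` as an `int16_t`, so `packed_minmax<std::uint16_t, false>(0x0001FFFFu, 0x00020000u)` and `packed_minmax<std::int16_t, false>(0x0001FFFFu, 0x00020000u)` give different results.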

copy-pr-bot · 2026-05-12T21:50:28Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

miscco · 2026-05-13T14:37:07Z

```diff
-#define _CCCL_HAS_SIMD_F32X2() (_CCCL_HAS_SIMD_F32X2_INTRINSICS() || _CCCL_HAS_SIMD_F32X2_PTX())
+#define _CCCL_HAS_SIMD_F32X2()            (_CCCL_HAS_SIMD_F32X2_INTRINSICS() || _CCCL_HAS_SIMD_F32X2_PTX())
+#define _CCCL_HAS_SIMD_8BIT_INTRINSICS() 0 // TODO(fbusato): CTK 13.2 produces non-optimal code for 8-bit SIMD instrs.
```


Can you please check whether newer compilers generate better code, and create an nvbug for the compiler team?

Now that you pointed it out, I realized the situation is even more complex:

- CTK < 12.8: no optimization.
- CTK >= 12.8: the 16-bit x2 case is only partially optimized; we see two `VIMNMX.U16x2` instructions plus a `PRMT`.
- ToT (top-of-tree) CTK/nvcc: no optimization for the 8-bit x4 case.
- Manual optimization: works as expected.

See https://godbolt.org/z/5j5c3sv3Y.

Added the bug number in the code.

miscco · 2026-05-13T14:42:52Z

```diff
+_CCCL_TEMPLATE(typename _Tp, typename _Abi, typename _Vec = basic_vec<_Tp, _Abi>)
+_CCCL_REQUIRES(totally_ordered<_Tp>)
+[[nodiscard]]
+_CCCL_API constexpr _Vec min(const basic_vec<_Tp, _Abi>& __lhs, const basic_vec<_Tp, _Abi>& __rhs) noexcept
```


Important: this will break in Tile mode. I believe we need to either mark all SIMD optimizations as `_CCCL_HOST_DEVICE` or disable them with `!_CCCL_TILE_COMPILATION()`.

I have to update several PRs for Tile compatibility...
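Both review threads end in compile-time guards: the 8-bit path is switched off with `_CCCL_HAS_SIMD_8BIT_INTRINSICS() 0`, and hand-optimized device code must be marked host/device or skipped under `!_CCCL_TILE_COMPILATION()`. The following is a minimal sketch of that pattern, not the PR's implementation. `MY_HAS_SIMD_8BIT_INTRINSICS`, `MY_TILE_COMPILATION`, `MY_USE_U8X4_INTRINSIC`, `MY_HOST_DEVICE`, and `min_u8x4` are hypothetical names, not CCCL macros, and the CUDA SIMD intrinsic `__vminu4` only stands in for whatever optimized device path is actually used.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for _CCCL_HAS_SIMD_8BIT_INTRINSICS() and
// _CCCL_TILE_COMPILATION(); these are NOT the CCCL macros.
#ifndef MY_HAS_SIMD_8BIT_INTRINSICS
#  define MY_HAS_SIMD_8BIT_INTRINSICS 0  // off until the compiler issue is resolved
#endif
#ifndef MY_TILE_COMPILATION
#  define MY_TILE_COMPILATION 0
#endif

// The dedicated path is taken only in the device pass, outside tile
// compilation, and only when the 8-bit feature flag is on.
#if defined(__CUDA_ARCH__) && !MY_TILE_COMPILATION && MY_HAS_SIMD_8BIT_INTRINSICS
#  define MY_USE_U8X4_INTRINSIC 1
#else
#  define MY_USE_U8X4_INTRINSIC 0
#endif

// Mark the function host/device when compiled by a CUDA compiler.
#if defined(__CUDACC__)
#  define MY_HOST_DEVICE __host__ __device__
#else
#  define MY_HOST_DEVICE
#endif

// Lane-wise unsigned minimum of four 8-bit lanes packed into a 32-bit word.
MY_HOST_DEVICE inline std::uint32_t min_u8x4(std::uint32_t a, std::uint32_t b) {
#if MY_USE_U8X4_INTRINSIC
  return __vminu4(a, b);  // stand-in for the optimized device path
#else
  // Portable fallback: compare one byte lane at a time.
  std::uint32_t r = 0;
  for (int shift = 0; shift < 32; shift += 8) {
    std::uint32_t x = (a >> shift) & 0xFFu;
    std::uint32_t y = (b >> shift) & 0xFFu;
    r |= (x < y ? x : y) << shift;
  }
  return r;
#endif
}
```

With the defaults (flag off, no tile compilation), every compilation pass takes the portable loop and leaves vectorization to the compiler; turning the flag on changes only the device pass, and only when tile compilation is off.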

github-actions · 2026-05-14T00:18:57Z

🥳 CI Workflow Results

🟩 Finished in 1h 48m: Pass: 100%/113 | Total: 1d 17h | Max: 54m 01s | Hits: 99%/327429

See results here.

fbusato added 3 commits May 8, 2026 16:02

- draft (1f98fdf)
- draft (3337b35)
- complete min/max optimization (03c8876)

fbusato self-assigned this May 12, 2026

fbusato added this to CCCL May 12, 2026

fbusato added the libcu++ For all items related to libcu++ label May 12, 2026

github-project-automation Bot moved this to Todo in CCCL May 12, 2026

cccl-authenticator-app Bot moved this from Todo to In Progress in CCCL May 12, 2026

- add SASS codegen tests (d76ea70)

fbusato moved this from In Progress to In Review in CCCL May 12, 2026

fbusato marked this pull request as ready for review May 12, 2026 22:53

fbusato requested review from a team as code owners May 12, 2026 22:53

fbusato requested a review from bernhardmgruber May 12, 2026 22:53


miscco reviewed May 13, 2026


- add bug number (576382e)
- disable cuTile for asm functions (b16fa15)

fbusato wants to merge 6 commits into NVIDIA:main from fbusato:simd-optimize-min-max

2 participants

Conversation

fbusato commented May 12, 2026

Description

Uh oh!

copy-pr-bot Bot commented May 12, 2026

Uh oh!

This comment has been minimized.

miscco May 13, 2026

Choose a reason for hiding this comment

Uh oh!

fbusato May 13, 2026

Choose a reason for hiding this comment

Uh oh!

fbusato May 13, 2026

Choose a reason for hiding this comment

Uh oh!

miscco May 13, 2026

Choose a reason for hiding this comment

Uh oh!

fbusato May 13, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented May 14, 2026

🥳 CI Workflow Results

🟩 Finished in 1h 48m: Pass: 100%/113 | Total: 1d 17h | Max: 54m 01s | Hits: 99%/327429

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants