Fix nightly GCC8 build regressions in bit_cast and DeviceReduce by alliepiper · Pull Request #8924 · NVIDIA/cccl

alliepiper · 2026-05-12T14:32:28Z

Summary

Two unrelated regressions surfaced together in the 2026-05-12 nightly run on `main`. Between them, they block all libcu++ GCC builds on older CTKs (12.0, 12.X) and all GCC 8 CUB builds.

libcu++: `bit_cast<__float128>` static_assert fires

`__bit_cast_memcpy` is reached on GCC 8/9/10 because those host compilers don't provide `__builtin_bit_cast`. Commit 8b28e1d (#8265) replaced `is_trivially_default_constructible_v` with the C++20-style `default_initializable` concept; the concept emulation evaluates false for `__float128` in C++17 mode, even though the built-in scalar is default-initializable.

Restore an equivalent type trait (`is_default_constructible_v`), which queries the compiler builtin directly and works on every supported GCC. The `_CCCL_COMPILER(GCC, <=, 7)` gate is no longer needed and is removed.
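
The trait-based check described above can be sketched as follows. This is a minimal illustration written against the standard-library traits, not the libcu++ source: `sketch_bit_cast_memcpy` and `sketch_bit_cast_demo` are hypothetical names, and the exact set of assertions inside `__bit_cast_memcpy` is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical stand-in for a memcpy-based bit_cast fallback (C++17).
template <class To, class From>
To sketch_bit_cast_memcpy(const From& from) noexcept
{
  static_assert(sizeof(To) == sizeof(From), "sizes must match");
  static_assert(std::is_trivially_copyable_v<To> && std::is_trivially_copyable_v<From>,
                "both types must be trivially copyable");
  // The fallback needs a To object to copy into, so To must be default-constructible.
  static_assert(std::is_default_constructible_v<To>, "To must be default-constructible");
  To to;
  std::memcpy(&to, &from, sizeof(To));
  return to;
}

#if defined(__SIZEOF_FLOAT128__)
// Where the compiler provides __float128, the plain trait accepts it.
static_assert(std::is_default_constructible_v<__float128>, "__float128 is default-constructible");
#endif

// Usage: round-trip the bit pattern of 1.0f (assumes IEEE 754 binary32 floats).
inline void sketch_bit_cast_demo()
{
  assert(sketch_bit_cast_memcpy<std::uint32_t>(1.0f) == 0x3F800000u);
  assert(sketch_bit_cast_memcpy<float>(std::uint32_t{0x3F800000u}) == 1.0f);
}
```

The `__float128` assertion is guarded by `__SIZEOF_FLOAT128__` so the sketch still compiles on targets without that type.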

CUB: `DeviceReduce::reduce_impl` flags `reduction_op` as unused

Commit b60f063 (#8851) merged the three determinism overloads into a single `if constexpr` template. The `__gpu_to_gpu` branch dispatches to RFA, which hardcodes `deterministic_sum_t<accum_t>` internally and does not accept a reduction operator. GCC 8's `-Werror=unused-but-set-parameter` flags `reduction_op` because that branch never references it.

The parameter must remain in the signature (the other two branches use it). The invariant (enforced upstream by `__transform_reduce`'s `float_double_plus` check) is acknowledged with `(void) reduction_op` plus a one-line note inside the branch.
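
The pattern can be sketched in isolation as follows. This is an illustrative reduction over a plain `int` range, not the CUB implementation: `sketch_determinism`, `sketch_reduce_impl`, and `sketch_reduce_demo` are hypothetical names, and the fixed-sum loop only stands in for the RFA dispatch.

```cpp
#include <cassert>
#include <functional>

// Hypothetical determinism levels, standing in for the three merged overloads.
enum class sketch_determinism { not_guaranteed, run_to_run, gpu_to_gpu };

template <sketch_determinism Mode, class Op>
int sketch_reduce_impl(const int* first, const int* last, int init, Op reduction_op)
{
  if constexpr (Mode == sketch_determinism::gpu_to_gpu)
  {
    // This branch always sums and never consults the operator; callers are
    // assumed to have checked upstream that the operator is a plain plus.
    (void) reduction_op;
    int acc = init;
    for (; first != last; ++first)
    {
      acc += *first;
    }
    return acc;
  }
  else
  {
    // The remaining modes apply the caller's operator.
    int acc = init;
    for (; first != last; ++first)
    {
      acc = reduction_op(acc, *first);
    }
    return acc;
  }
}

// Usage: the fixed-sum branch ignores the operator; the generic branch applies it.
inline void sketch_reduce_demo()
{
  const int data[3] = {2, 3, 4};
  assert(sketch_reduce_impl<sketch_determinism::gpu_to_gpu>(data, data + 3, 0, std::plus<int>{}) == 9);
  assert(sketch_reduce_impl<sketch_determinism::run_to_run>(data, data + 3, 1, std::multiplies<int>{}) == 24);
}
```

Casting the parameter to `void` counts as a reference in the discarded-operator branch while leaving the signature shared across all branches.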

CI override

`ci/matrix.yaml` carries a temporary `workflows.override` reproducing the failures on PR CI:

- libcu++ build on GCC 8/9/10 × CTK 12.0/12.X × all stds
- CUB build on GCC 8 × CTK 12.0/12.X × C++17

Reset to empty before merging.

Test plan

- Override-matrix PR CI passes on this branch
- Reset `workflows.override` to empty
- Full PR CI passes after override reset

Two unrelated regressions surfaced together in the 2026-05-12 nightly run on `main`. Both block all GCC builds on older CTKs (12.0, 12.X) for libcu++, and all GCC 8 CUB builds.

libcu++: bit_cast<__float128>

The memcpy fallback path in `__bit_cast_memcpy` is reached on GCC 8/9/10 because those host compilers don't provide `__builtin_bit_cast`. Commit 8b28e1d (NVIDIA#8265) replaced the previous `is_trivially_default_constructible_v` trait with the C++20-style `default_initializable` concept; the concept's emulation evaluates false for `__float128` in C++17 mode, even though the built-in scalar is in fact default-initializable.

Restore an equivalent type trait (`is_default_constructible_v`), which queries the compiler builtin directly and works on every supported GCC. The GCC<=7 gate around the static_assert is no longer needed and is removed.

CUB: DeviceReduce reduction_op

Commit b60f063 (NVIDIA#8851) merged the three determinism overloads of `reduce_impl` into a single `if constexpr` template. The `__gpu_to_gpu` branch dispatches to RFA, which hardcodes `deterministic_sum_t<accum_t>` internally and does not accept a reduction operator. GCC 8's `-Werror=unused-but-set-parameter` flags `reduction_op` because that branch never references it.

The parameter must remain in the signature (the other two branches use it), so the invariant -- enforced upstream by `__transform_reduce`'s `float_double_plus` check -- is acknowledged with `(void) reduction_op` plus a one-line note inside the branch.

[skip-vdc][skip-docs][skip-tpt]

copy-pr-bot · 2026-05-12T14:32:44Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

alliepiper · 2026-05-12T14:33:03Z

/ok to test 317bcf5

fbusato

I would prefer `#if !_CCCL_COMPILER(GCC, <=, 8)`. `cuda::std::default_initializable` is more precise than `is_default_constructible`.

`${#ARRAY}` is syntactic sugar for `${#ARRAY[0]}` -- the length of element 0. Under `set -u`, when an array is empty (no `--ctest-targets`, `--lit-tests`, etc. on the command line), this errors with "unbound variable" before the guard can short-circuit. Use `${#ARRAY[@]}` (total element count), which is defined as 0 on an empty array.

…luated is faked

On GCC 8 there is no `__builtin_is_constant_evaluated`, so `cuda::std::is_constant_evaluated()` always returns false. That makes `if (!is_constant_evaluated())` look statically true, so the compiler takes the runtime branch even in a constant expression and trips over the pointer compare, which is not allowed there. Only enable the runtime branch on compilers that actually provide the builtin; everyone else falls through to the constexpr-safe path.
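
The gating described above can be sketched as follows. This is a minimal illustration, not the libcu++ change: `SKETCH_HAS_IS_CONSTANT_EVALUATED`, `sketch_copy`, and `sketch_copy_sum` are hypothetical names, the feature detection is an assumption, and a `memmove` call stands in for the runtime-only branch (the real one involves a pointer compare).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Detect the builtin. GCC 9 provides it but predates __has_builtin,
// hence the version fallback (assumed detection logic for this sketch).
#if defined(__has_builtin)
#  if __has_builtin(__builtin_is_constant_evaluated)
#    define SKETCH_HAS_IS_CONSTANT_EVALUATED 1
#  endif
#elif defined(__GNUC__) && (__GNUC__ >= 9)
#  define SKETCH_HAS_IS_CONSTANT_EVALUATED 1
#endif
#ifndef SKETCH_HAS_IS_CONSTANT_EVALUATED
#  define SKETCH_HAS_IS_CONSTANT_EVALUATED 0
#endif

constexpr int* sketch_copy(const int* first, const int* last, int* out)
{
#if SKETCH_HAS_IS_CONSTANT_EVALUATED
  // Runtime-only fast path, compiled in only when the builtin is real.
  if (!__builtin_is_constant_evaluated())
  {
    const std::size_t count = static_cast<std::size_t>(last - first);
    if (count != 0)
    {
      std::memmove(out, first, count * sizeof(int));
    }
    return out + count;
  }
#endif
  // constexpr-safe path: element-wise copy, valid in constant expressions.
  for (; first != last; ++first, ++out)
  {
    *out = *first;
  }
  return out;
}

// The copy must stay usable in a constant expression on every compiler.
constexpr int sketch_copy_sum()
{
  int src[3] = {1, 2, 3};
  int dst[3] = {0, 0, 0};
  sketch_copy(src, src + 3, dst);
  return dst[0] + dst[1] + dst[2];
}
static_assert(sketch_copy_sum() == 6, "copy must work during constant evaluation");
```

On a compiler without the builtin, the preprocessor removes the fast path entirely, so no runtime-only code is reachable during constant evaluation.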

alliepiper · 2026-05-12T22:20:14Z

/ok to test d69de21

miscco · 2026-05-13T07:05:32Z

/ok to test b40e516

github-actions · 2026-05-14T18:51:38Z

😬 CI Workflow Results

🟥 Finished in 1h 29m: Pass: 15%/500 | Total: 3d 15h | Max: 1h 19m | Hits: 28%/131965

See results here.

github-project-automation Bot added this to CCCL May 12, 2026

github-project-automation Bot moved this to Todo in CCCL May 12, 2026

cccl-authenticator-app Bot moved this from Todo to In Progress in CCCL May 12, 2026

miscco approved these changes May 12, 2026

github-project-automation Bot moved this from In Progress to In Review in CCCL May 12, 2026

fbusato requested changes May 12, 2026

github-project-automation Bot moved this from In Review to In Progress in CCCL May 12, 2026

alliepiper added 3 commits May 12, 2026 17:18

Skip simd::basic_mask generator-ctor noexcept assert on GCC 8

259413d

GCC 8 infers noexcept on the generator constructor more permissively than GCC 9+, so `static_assert(!noexcept(Mask(is_even{})))` fires. Widen the existing GCC 7 skip to also cover GCC 8.

Fix copy runtime optimization

b40e516

clear override

49749fe

alliepiper requested a review from fbusato May 13, 2026 17:36

alliepiper marked this pull request as ready for review May 13, 2026 17:37

alliepiper requested review from a team as code owners May 13, 2026 17:37

alliepiper requested a review from jrhemstad May 13, 2026 17:37

cccl-authenticator-app Bot moved this from In Progress to In Review in CCCL May 13, 2026

fbusato requested changes May 13, 2026

Comment thread libcudacxx/include/cuda/std/__bit/bit_cast.h Outdated

github-project-automation Bot moved this from In Review to In Progress in CCCL May 13, 2026

Update libcudacxx/include/cuda/std/__bit/bit_cast.h

8463734

Co-authored-by: Federico Busato <50413820+fbusato@users.noreply.github.com>

alliepiper requested a review from fbusato May 14, 2026 17:20

fbusato approved these changes May 14, 2026

github-project-automation Bot moved this from In Progress to In Review in CCCL May 14, 2026

alliepiper wants to merge 7 commits into NVIDIA:main from alliepiper:nightly-triage
