Fix nightly GCC8 build regressions in bit_cast and DeviceReduce #8924

Open
alliepiper wants to merge 7 commits into NVIDIA:main from alliepiper:nightly-triage

@alliepiper
Contributor

Summary

Two unrelated regressions surfaced together in the 2026-05-12 nightly run on main. Together they block all libcu++ GCC builds on older CTKs (12.0, 12.X) and all GCC 8 CUB builds.

libcu++: bit_cast<__float128> static_assert fires

__bit_cast_memcpy is reached on GCC 8/9/10 because those host compilers don't provide __builtin_bit_cast. Commit 8b28e1d (#8265) replaced is_trivially_default_constructible_v with the C++20-style default_initializable concept; the concept emulation evaluates false for __float128 in C++17 mode, even though the built-in scalar is default-initializable.

Restore an equivalent type trait (is_default_constructible_v), which queries the compiler builtin directly and works on every supported GCC. The _CCCL_COMPILER(GCC, <=, 7) gate is no longer needed and is removed.
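
The shape of the memcpy fallback and its trait-based precondition can be sketched roughly as follows. This is an illustrative stand-in written against the standard library, not the libcu++ source: the name `bit_cast_memcpy_sketch` and the exact set of `static_assert`s are assumptions made for the example.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical stand-in for a memcpy-based bit_cast fallback (not the libcu++ code).
template <class To, class From>
To bit_cast_memcpy_sketch(const From& from) noexcept
{
  static_assert(sizeof(To) == sizeof(From), "bit_cast requires equal sizes");
  static_assert(std::is_trivially_copyable_v<To>, "To must be trivially copyable");
  static_assert(std::is_trivially_copyable_v<From>, "From must be trivially copyable");
  // The fallback declares a `To` object before copying into it, so `To` must be
  // default-constructible. A plain type trait is used here instead of a concept.
  static_assert(std::is_default_constructible_v<To>, "To must be default-constructible");

  To to;
  std::memcpy(&to, &from, sizeof(To));
  return to;
}
```

For example, `bit_cast_memcpy_sketch<std::uint32_t>(1.0f)` yields `0x3F800000` on platforms where `float` is a 32-bit IEEE-754 type, and casting back recovers the original value.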

CUB: DeviceReduce::reduce_impl flags reduction_op as unused

Commit b60f063 (#8851) merged the three determinism overloads into a single if constexpr template. The __gpu_to_gpu branch dispatches to RFA, which hardcodes deterministic_sum_t<accum_t> internally and does not accept a reduction operator. GCC 8's -Werror=unused-but-set-parameter flags reduction_op because that branch never references it.

The parameter must remain in the signature because the other two branches use it. The invariant (enforced upstream by __transform_reduce's float_double_plus check) is acknowledged with (void) reduction_op plus a one-line note inside the branch.
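
A minimal host-only sketch of the pattern, assuming hypothetical tag types (`gpu_to_gpu_tag`, `run_to_run_tag`) and a simplified `reduce_impl_sketch` that stand in for the real CUB dispatch: one `if constexpr` branch ignores the operator and marks it as intentionally unused, while the other branch consumes it.

```cpp
#include <functional>
#include <type_traits>

// Hypothetical determinism tags; the real CUB types differ.
struct gpu_to_gpu_tag {};
struct run_to_run_tag {};

// Sample input used by the usage note below.
constexpr int sketch_input[4] = {1, 2, 3, 4};

template <class Determinism, class ReductionOp>
int reduce_impl_sketch(const int* data, int n, int init, ReductionOp reduction_op)
{
  if constexpr (std::is_same_v<Determinism, gpu_to_gpu_tag>)
  {
    // This branch always sums; the operator is validated elsewhere and is not
    // forwarded. Reference it once so no unused-parameter diagnostic fires.
    (void) reduction_op;
    int acc = init;
    for (int i = 0; i < n; ++i)
    {
      acc += data[i];
    }
    return acc;
  }
  else
  {
    // The other branches actually apply the user-provided operator.
    int acc = init;
    for (int i = 0; i < n; ++i)
    {
      acc = reduction_op(acc, data[i]);
    }
    return acc;
  }
}
```

With `sketch_input`, the `gpu_to_gpu_tag` instantiation returns the sum (10 for `init = 0`) regardless of the operator passed, while `run_to_run_tag` with `std::multiplies<int>{}` and `init = 1` returns 24.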

CI override

ci/matrix.yaml carries a temporary workflows.override reproducing the failures on PR CI:

  • libcu++ build on GCC 8/9/10 × CTK 12.0/12.X × all stds
  • CUB build on GCC 8 × CTK 12.0/12.X × C++17

Reset to empty before merging.

Test plan

  • Override-matrix PR CI passes on this branch
  • Reset workflows.override to empty
  • Full PR CI passes after override reset

Two unrelated regressions surfaced together in the 2026-05-12 nightly run on
`main`. Both block all GCC builds on older CTKs (12.0, 12.X) for libcu++, and
all GCC 8 CUB builds.

libcu++: bit_cast<__float128>
  The memcpy fallback path in `__bit_cast_memcpy` is reached on GCC 8/9/10
  because those host compilers don't provide `__builtin_bit_cast`. Commit
  8b28e1d (NVIDIA#8265) replaced the previous `is_trivially_default_constructible_v`
  trait with the C++20-style `default_initializable` concept; the concept's
  emulation evaluates false for `__float128` in C++17 mode, even though the
  built-in scalar is in fact default-initializable. Restore an equivalent type
  trait (`is_default_constructible_v`), which queries the compiler builtin
  directly and works on every supported GCC. The GCC<=7 gate around the
  static_assert is no longer needed and is removed.

CUB: DeviceReduce reduction_op
  Commit b60f063 (NVIDIA#8851) merged the three determinism overloads of
  `reduce_impl` into a single `if constexpr` template. The `__gpu_to_gpu`
  branch dispatches to RFA, which hardcodes `deterministic_sum_t<accum_t>`
  internally and does not accept a reduction operator. GCC 8's
  `-Werror=unused-but-set-parameter` flags `reduction_op` because that branch
  never references it. The parameter must remain in the signature (the other
  two branches use it), so the invariant -- enforced upstream by
  `__transform_reduce`'s `float_double_plus` check -- is acknowledged with
  `(void) reduction_op` plus a one-line note inside the branch.

[skip-vdc][skip-docs][skip-tpt]

copy-pr-bot Bot commented May 12, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@cccl-authenticator-app cccl-authenticator-app Bot moved this from Todo to In Progress in CCCL May 12, 2026
@alliepiper
Contributor Author

/ok to test 317bcf5

@github-project-automation github-project-automation Bot moved this from In Progress to In Review in CCCL May 12, 2026
Contributor

@fbusato fbusato left a comment

I would prefer `#if !_CCCL_COMPILER(GCC, <=, 8)`. `cuda::std::default_initializable` is more precise than `is_default_constructible`.

@github-project-automation github-project-automation Bot moved this from In Review to In Progress in CCCL May 12, 2026

`${#ARRAY}` is syntactic sugar for `${#ARRAY[0]}`, the length of element 0.
Under `set -u`, when an array is empty (no `--ctest-targets`, `--lit-tests`,
etc. on the command line), this errors with "unbound variable" before the
guard can short-circuit. Use `${#ARRAY[@]}` (total element count), which is
defined as 0 on an empty array.

GCC 8 infers noexcept on the generator constructor more permissively
than GCC 9+, so `static_assert(!noexcept(Mask(is_even{})))` fires.
Widen the existing GCC 7 skip to also cover GCC 8.

…luated is faked

On GCC 8 there is no __builtin_is_constant_evaluated, so
cuda::std::is_constant_evaluated() always returns false. That makes
`if (!is_constant_evaluated())` look statically true, so the compiler
takes the runtime branch even in a constant expression and trips over
the pointer compare, which is not allowed there.

Only enable the runtime branch on compilers that actually provide the
builtin; everyone else falls through to the constexpr-safe path.
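
The guard described above might look like the following sketch. The macro `SKETCH_HAS_IS_CONSTANT_EVALUATED` and the function `sketch_move_ints` are hypothetical names invented for this example, and detection via `__has_builtin` is an assumption (compilers without `__has_builtin` simply take the constexpr-safe path here). The runtime branch orders the two pointers, which is not permitted in a constant expression for unrelated objects; the fallback uses only equality comparisons.

```cpp
#include <cstddef>
#include <functional>

// Assumed detection scheme: only enable the runtime branch when the builtin exists.
#if defined(__has_builtin)
#  if __has_builtin(__builtin_is_constant_evaluated)
#    define SKETCH_HAS_IS_CONSTANT_EVALUATED 1
#  endif
#endif
#ifndef SKETCH_HAS_IS_CONSTANT_EVALUATED
#  define SKETCH_HAS_IS_CONSTANT_EVALUATED 0
#endif

// memmove-like copy of n ints that is usable in constant expressions.
constexpr void sketch_move_ints(int* dst, const int* src, std::size_t n)
{
  bool copy_backward = false;
#if SKETCH_HAS_IS_CONSTANT_EVALUATED
  if (!__builtin_is_constant_evaluated())
  {
    // Runtime only: order the pointers to pick a safe copy direction.
    copy_backward = std::less<const int*>{}(src, dst);
  }
  else
#endif
  {
    // Constexpr-safe path: detect "dst lies inside [src + 1, src + n)" with
    // equality comparisons only.
    for (std::size_t i = 1; i < n; ++i)
    {
      if (dst == src + i)
      {
        copy_backward = true;
      }
    }
  }

  if (copy_backward)
  {
    for (std::size_t i = n; i > 0; --i)
    {
      dst[i - 1] = src[i - 1];
    }
  }
  else
  {
    for (std::size_t i = 0; i < n; ++i)
    {
      dst[i] = src[i];
    }
  }
}

// Usage: overlapping shifts, encoded as a decimal number for easy checking.
constexpr int sketch_shift_right_demo()
{
  int a[4] = {1, 2, 3, 4};
  sketch_move_ints(a + 1, a, 3); // a becomes {1, 1, 2, 3}
  return a[0] * 1000 + a[1] * 100 + a[2] * 10 + a[3];
}

constexpr int sketch_shift_left_demo()
{
  int a[4] = {1, 2, 3, 4};
  sketch_move_ints(a, a + 1, 3); // a becomes {2, 3, 4, 4}
  return a[0] * 1000 + a[1] * 100 + a[2] * 10 + a[3];
}

static_assert(sketch_shift_right_demo() == 1123, "constant evaluation takes the safe path");
static_assert(sketch_shift_left_demo() == 2344, "constant evaluation takes the safe path");
```

Both paths produce the same result, so the guard only decides which comparison the compiler is asked to evaluate; when the builtin is missing, the runtime branch is compiled out entirely rather than being selected by a check that always reports "not constant evaluated".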
@alliepiper
Contributor Author

/ok to test d69de21

@miscco
Contributor

miscco commented May 13, 2026

/ok to test b40e516

@alliepiper alliepiper requested a review from fbusato May 13, 2026 17:36
@alliepiper alliepiper marked this pull request as ready for review May 13, 2026 17:37
@alliepiper alliepiper requested review from a team as code owners May 13, 2026 17:37
@alliepiper alliepiper requested a review from jrhemstad May 13, 2026 17:37
@cccl-authenticator-app cccl-authenticator-app Bot moved this from In Progress to In Review in CCCL May 13, 2026
Comment thread libcudacxx/include/cuda/std/__bit/bit_cast.h Outdated
@github-project-automation github-project-automation Bot moved this from In Review to In Progress in CCCL May 13, 2026
Co-authored-by: Federico Busato <50413820+fbusato@users.noreply.github.com>
@alliepiper alliepiper requested a review from fbusato May 14, 2026 17:20
@github-project-automation github-project-automation Bot moved this from In Progress to In Review in CCCL May 14, 2026
@github-actions
Contributor

😬 CI Workflow Results

🟥 Finished in 1h 29m: Pass: 15%/500 | Total: 3d 15h | Max: 1h 19m | Hits: 28%/131965

See results here.
