Fix nightly GCC8 build regressions in bit_cast and DeviceReduce #8924
Open
alliepiper wants to merge 7 commits into
Two unrelated regressions surfaced together in the 2026-05-12 nightly run on `main`. Both block all GCC builds on older CTKs (12.0, 12.X) for libcu++, and all GCC 8 CUB builds.

libcu++: bit_cast<__float128>
The memcpy fallback path in `__bit_cast_memcpy` is reached on GCC 8/9/10 because those host compilers don't provide `__builtin_bit_cast`. Commit 8b28e1d (NVIDIA#8265) replaced the previous `is_trivially_default_constructible_v` trait with the C++20-style `default_initializable` concept; the concept's emulation evaluates false for `__float128` in C++17 mode, even though the built-in scalar is in fact default-initializable.
Restore an equivalent type trait (`is_default_constructible_v`), which queries the compiler builtin directly and works on every supported GCC. The GCC<=7 gate around the static_assert is no longer needed and is removed.

CUB: DeviceReduce reduction_op
Commit b60f063 (NVIDIA#8851) merged the three determinism overloads of `reduce_impl` into a single `if constexpr` template. The `__gpu_to_gpu` branch dispatches to RFA, which hardcodes `deterministic_sum_t<accum_t>` internally and does not accept a reduction operator. GCC 8's `-Werror=unused-but-set-parameter` flags `reduction_op` because that branch never references it.
The parameter must remain in the signature (the other two branches use it), so the invariant -- enforced upstream by `__transform_reduce`'s `float_double_plus` check -- is acknowledged with `(void) reduction_op` plus a one-line note inside the branch.

[skip-vdc][skip-docs][skip-tpt]
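For readers without the diff open, here is a minimal sketch of the kind of memcpy-based fallback the libcu++ paragraph describes. It is not the libcu++ source: `sketch_bit_cast_memcpy` is a hypothetical name, and the `std::` traits stand in for their `cuda::std` counterparts. The point is only that the destination object must exist before `memcpy` can fill it, which is why a default-constructibility check guards the function.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical stand-in for a memcpy-based bit_cast fallback (C++17).
// Not the libcu++ implementation; names are illustrative only.
template <class To, class From>
To sketch_bit_cast_memcpy(const From& from) noexcept
{
  static_assert(sizeof(To) == sizeof(From), "source and destination sizes must match");
  static_assert(std::is_trivially_copyable_v<To> && std::is_trivially_copyable_v<From>,
                "both types must be trivially copyable");
  // The fallback declares a To before copying into it, so To must be
  // default-constructible. A plain type trait is used for the check.
  static_assert(std::is_default_constructible_v<To>, "To must be default-constructible");

  To to;
  std::memcpy(&to, &from, sizeof(To));
  return to;
}

#if defined(__GNUC__) && defined(__SIZEOF_FLOAT128__)
// On hosts that provide __float128, the trait accepts the built-in scalar.
static_assert(std::is_default_constructible_v<__float128>,
              "__float128 should satisfy the trait-based check");
#endif
```

Assuming IEEE-754 single precision, `sketch_bit_cast_memcpy<std::uint32_t>(1.0f)` yields `0x3f800000`.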
Contributor
Author
/ok to test 317bcf5
miscco approved these changes on May 12, 2026
fbusato requested changes on May 12, 2026
fbusato (Contributor) left a comment:
I would prefer `#if !_CCCL_COMPILER(GCC, <=, 8)`. `cuda::std::default_initializable` is more precise than `is_default_constructible`.
`${#ARRAY}` is syntactic sugar for `${#ARRAY[0]}` -- the length of element 0.
Under `set -u`, when an array is empty (no `--ctest-targets`, `--lit-tests`,
etc. on the command line), this errors with "unbound variable" before the
guard can short-circuit. Use `${#ARRAY[@]}` (total element count), which is
defined as 0 on an empty array.
GCC 8 infers noexcept on the generator constructor more permissively
than GCC 9+, so `static_assert(!noexcept(Mask(is_even{})))` fires.
Widen the existing GCC 7 skip to also cover GCC 8.
…luated is faked
On GCC 8 there is no `__builtin_is_constant_evaluated`, so `cuda::std::is_constant_evaluated()` always returns false. That makes `if (!is_constant_evaluated())` look statically true, so the compiler takes the runtime branch even in a constant expression and trips over the pointer compare, which is not allowed there.
Only enable the runtime branch on compilers that actually provide the builtin; everyone else falls through to the constexpr-safe path.
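A minimal sketch of the gating idea in the commit message above, under stated assumptions: `SKETCH_HAS_IS_CONSTANT_EVALUATED` and `sketch_copy` are hypothetical names, and the overlap-aware copy is an invented stand-in for whatever code performs the pointer compare in the real change. The runtime branch is compiled in only when the builtin exists; otherwise every caller gets the constexpr-safe loop.

```cpp
#include <cstddef>
#include <functional>

// Hypothetical feature macro: 1 only when the compiler really provides the builtin.
#if defined(__has_builtin)
#  if __has_builtin(__builtin_is_constant_evaluated)
#    define SKETCH_HAS_IS_CONSTANT_EVALUATED 1
#  endif
#endif
// GCC 9 has the builtin but predates __has_builtin, so check the version too.
#if !defined(SKETCH_HAS_IS_CONSTANT_EVALUATED) && defined(__GNUC__) && !defined(__clang__)
#  if __GNUC__ >= 9
#    define SKETCH_HAS_IS_CONSTANT_EVALUATED 1
#  endif
#endif
#ifndef SKETCH_HAS_IS_CONSTANT_EVALUATED
#  define SKETCH_HAS_IS_CONSTANT_EVALUATED 0
#endif

// Illustrative copy routine: the runtime branch orders two pointers, which is
// not permitted during constant evaluation for unrelated objects.
constexpr void sketch_copy(int* dst, const int* src, std::size_t n)
{
#if SKETCH_HAS_IS_CONSTANT_EVALUATED
  if (!__builtin_is_constant_evaluated())
  {
    // Runtime only: copy backward when dst lies inside [src, src + n).
    if (std::less<const int*>{}(src, dst) && std::less<const int*>{}(dst, src + n))
    {
      for (std::size_t i = n; i > 0; --i)
      {
        dst[i - 1] = src[i - 1];
      }
      return;
    }
  }
#endif
  // Constexpr-safe path: plain forward copy with no pointer ordering.
  for (std::size_t i = 0; i < n; ++i)
  {
    dst[i] = src[i];
  }
}

constexpr int sketch_copy_sum()
{
  int src[3] = {1, 2, 3};
  int dst[3] = {0, 0, 0};
  sketch_copy(dst, src, 3);
  return dst[0] + dst[1] + dst[2];
}

// Must compile whether or not the builtin is available.
static_assert(sketch_copy_sum() == 6, "sketch_copy must be usable in a constant expression");
```

If the macro were instead replaced by a function that always returns false, the backward-copy branch would be evaluated inside the `static_assert`, which is the failure mode the commit message describes.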
Contributor
Author
/ok to test d69de21
Contributor
/ok to test b40e516
fbusato requested changes on May 13, 2026
Co-authored-by: Federico Busato <50413820+fbusato@users.noreply.github.com>
fbusato approved these changes on May 14, 2026
Contributor
😬 CI Workflow Results
🟥 Finished in 1h 29m: Pass: 15%/500 | Total: 3d 15h | Max: 1h 19m | Hits: 28%/131965
Summary
Two unrelated regressions surfaced together in the 2026-05-12 nightly run on `main`. Both block all GCC builds on older CTKs (12.0, 12.X) for libcu++, and all GCC 8 CUB builds.

libcu++: `bit_cast<__float128>` static_assert fires
`__bit_cast_memcpy` is reached on GCC 8/9/10 because those host compilers don't provide `__builtin_bit_cast`. Commit 8b28e1d (#8265) replaced `is_trivially_default_constructible_v` with the C++20-style `default_initializable` concept; the concept emulation evaluates false for `__float128` in C++17 mode, even though the built-in scalar is default-initializable.
Restore an equivalent type trait (`is_default_constructible_v`), which queries the compiler builtin directly and works on every supported GCC. The `_CCCL_COMPILER(GCC, <=, 7)` gate is no longer needed and is removed.

CUB: `DeviceReduce::reduce_impl` flags `reduction_op` as unused
Commit b60f063 (#8851) merged the three determinism overloads into a single `if constexpr` template. The `__gpu_to_gpu` branch dispatches to RFA, which hardcodes `deterministic_sum_t<accum_t>` internally and does not accept a reduction operator. GCC 8's `-Werror=unused-but-set-parameter` flags `reduction_op` because that branch never references it.
The parameter must remain in the signature (the other two branches use it). The invariant -- enforced upstream by `__transform_reduce`'s `float_double_plus` check -- is acknowledged with `(void) reduction_op` plus a one-line note inside the branch.

CI override
`ci/matrix.yaml` carries a temporary `workflows.override` reproducing the failures on PR CI. Reset to empty before merging.

Test plan
`workflows.override` to empty