Final env-passthrough 2/4 #8979

Open
gonidelis wants to merge 7 commits into NVIDIA:main from gonidelis:env_passthrough_2

@gonidelis
Member

Handles #8175 for

  • DeviceAdjacentDifference
  • DeviceCopy
  • DevicePartition
  • DeviceRunLengthEncode
  • DeviceFind

The most serious issues have to do with the defaults in DeviceAdjacentDifference and the ambiguities they created.

I chose to start introducing non-env API test files to check that the non-env APIs work in their minimal form, that is, when default arguments are not explicitly passed. These files will also serve as a base for extracting the example snippets for the non-env overloads from Doxygen into literalincludes later on.
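
As a rough illustration of what such a minimal-form check exercises, here is a sketch in plain C++ (no CUDA required). Everything in it is a hypothetical stand-in: `subtract_left_copy`, `default_env`, `stream_t`, and the helper functions are not the CUB signatures, they only mimic the shape of a legacy overload with defaulted trailing arguments next to an env overload that is constrained on an integral `NumItemsT`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <type_traits>

// Hypothetical stand-ins for illustration only; not the CUB API.
enum class overload { legacy, env };
struct default_env {};  // placeholder for an execution environment
using stream_t = int;   // placeholder for a stream handle

// Legacy-style overload: temp storage and byte count come first,
// the operator and the stream are defaulted.
template <class InputIt, class OutputIt, class NumItemsT, class OpT = std::minus<>>
overload subtract_left_copy(void* d_temp_storage, std::size_t& temp_storage_bytes,
                            InputIt /*d_in*/, OutputIt /*d_out*/, NumItemsT /*num_items*/,
                            OpT /*op*/ = {}, stream_t /*stream*/ = 0)
{
  if (d_temp_storage == nullptr)
  {
    temp_storage_bytes = 1; // size query only: report a placeholder byte count
  }
  return overload::legacy;
}

// Env-style overload: only participates when the third argument is integral.
template <class InputIt, class OutputIt, class NumItemsT, class OpT = std::minus<>,
          class EnvT = default_env,
          std::enable_if_t<std::is_integral_v<NumItemsT>, int> = 0>
overload subtract_left_copy(InputIt /*d_in*/, OutputIt /*d_out*/, NumItemsT /*num_items*/,
                            OpT /*op*/ = {}, EnvT /*env*/ = {})
{
  return overload::env;
}

// Minimal legacy call: no operator, no stream. For the env overload the third
// argument would deduce NumItemsT = const int*, so the constraint removes it.
inline overload minimal_legacy_call(std::size_t& bytes)
{
  const int* d_in = nullptr;
  int* d_out      = nullptr;
  return subtract_left_copy(nullptr, bytes, d_in, d_out, 4);
}

// Minimal env call: the legacy overload needs at least five arguments,
// so only the env overload is viable.
inline overload minimal_env_call()
{
  const int* d_in = nullptr;
  int* d_out      = nullptr;
  return subtract_left_copy(d_in, d_out, 4);
}
```

In this sketch, dropping the `enable_if_t` constraint would leave the env overload as a candidate for the five-argument legacy call as well, which is the kind of clash the minimal-form tests are meant to catch.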

gonidelis added 7 commits May 13, 2026 20:16
* add in docs that memory_resource can also be passed in env
* add enable_ifs to beat existing ambiguities with non-env APIs
* match 1-1 the non-env APIs to the env APIs identities (defaults, arguments order)
* its purpose is to guard against ambiguities introduced with env algorithms
* it also extends as a ground for placing the literalinclude examples as we extract them
  from the hardcoded snippets in the docs
@gonidelis gonidelis requested a review from a team as a code owner May 14, 2026 03:21
@gonidelis gonidelis requested a review from elstehle May 14, 2026 03:21
@github-project-automation github-project-automation Bot moved this to Todo in CCCL May 14, 2026
@cccl-authenticator-app cccl-authenticator-app Bot moved this from Todo to In Review in CCCL May 14, 2026
@coderabbitai

coderabbitai Bot commented May 14, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 1e116557-765c-4e11-80cb-33c20983c0b3

📥 Commits

Reviewing files that changed from the base of the PR and between 5d79bc2 and acfdd96.

📒 Files selected for processing (13)
  • cub/cub/device/device_adjacent_difference.cuh
  • cub/cub/device/device_copy.cuh
  • cub/cub/device/device_find.cuh
  • cub/cub/device/device_partition.cuh
  • cub/cub/device/device_run_length_encode.cuh
  • cub/test/catch2_test_device_adjacent_difference_api.cu
  • cub/test/catch2_test_device_copy_api.cu
  • cub/test/catch2_test_device_copy_env_api.cu
  • cub/test/catch2_test_device_find_api.cu
  • cub/test/catch2_test_device_partition_api.cu
  • cub/test/catch2_test_device_partition_env_api.cu
  • cub/test/catch2_test_device_run_length_encode_api.cu
  • cub/test/catch2_test_device_run_length_encode_env_api.cu

📝 Walkthrough

Summary by CodeRabbit

  • Refactor

    • Simplified template constraints in device partition and run-length encode APIs.
    • Updated adjacent difference template constraints for improved type checking.
  • Documentation

    • Clarified adjacent difference memory resource customization options.
    • Enhanced find API documentation with explicit search semantics.
    • Improved partitioning API documentation formatting and parameter descriptions.
    • Refined run-length encode parameter and template documentation.
    • Removed determinism guarantees from copy API documentation.
  • Tests

    • Added overload resolution validation tests for adjacent difference, copy, find, partition, and run-length encode APIs.
    • Updated environment-based API tests to use direct stream references.

Walkthrough

Five core CUB device algorithms simplify their environment-based API template constraints by replacing iterator-type SFINAE guards with integral NumItemsT requirements. Documentation is clarified for bounds semantics and parameter descriptions. Comprehensive test suites validate legacy overload resolution without explicit streams, and existing environment tests refactor to pass cuda::stream_ref directly.
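
To make the guard swap more concrete, here is a minimal sketch in standard C++17. The names `old_guard_v` and `new_guard_v` are hypothetical and only model the two conditions described above; the actual constraints live in the CUB headers and may differ in detail. The assumption being illustrated is that when a legacy-style call is considered against an environment overload, the first template parameter is deduced from the temp-storage argument and `NumItemsT` from an iterator argument.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical model of the previous guard: reject the env overload only when
// the first deduced type is exactly void*.
template <class FirstArgT>
inline constexpr bool old_guard_v = !std::is_same_v<FirstArgT, void*>;

// Hypothetical model of the new guard: accept the env overload only when the
// deduced item-count type is integral.
template <class NumItemsT>
inline constexpr bool new_guard_v = std::is_integral_v<NumItemsT>;

// A legacy call whose temp storage is spelled exactly as void* is rejected by
// the old-style guard ...
static_assert(!old_guard_v<void*>);
// ... but a literal nullptr or a typed pointer is not void*, so the old-style
// guard would keep the env overload in the candidate set.
static_assert(old_guard_v<std::nullptr_t>);
static_assert(old_guard_v<char*>);

// With the new-style guard, a legacy call puts an iterator where the env
// overload expects the item count, so the env overload drops out regardless of
// how the temp-storage argument was spelled.
static_assert(!new_guard_v<const int*>);
static_assert(new_guard_v<int>);
static_assert(new_guard_v<std::size_t>);
```

Whether these are the exact cases that motivated the change is not stated in the walkthrough; the sketch only shows why a condition on `NumItemsT` is less sensitive to the spelling of the first argument.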

Changes

Environment API constraint and documentation updates

| Layer | File(s) | Summary |
|---|---|---|
| DeviceAdjacentDifference SFINAE refactoring and doc updates | cub/cub/device/device_adjacent_difference.cuh | Type-trait includes enable_if and is_integral added. Four environment overloads change SFINAE from checking iterator type != void* to requiring NumItemsT integral. Iterator parameters marked [inferred]; environment docs expanded for memory resource customization. |
| DeviceCopy documentation fixes | cub/cub/device/device_copy.cuh | Remove gpu_to_gpu determinism claim from Batched and mdspan Copy environment overload docs. Mdspan description now follows environment customization bullets directly. |
| DeviceFind bounds search documentation clarification | cub/cub/device/device_find.cuh | LowerBound and UpperBound docs explicitly state returned iterator is first element meeting ordered/not-ordered condition. FindIf overview bullet list reflowed. |
| DevicePartition SFINAE removal and doc updates | cub/cub/device/device_partition.cuh | Include set adjusted: enable_if/is_same removed, cstdint added. Three environment overloads (Flagged, unary If, three-way If) remove trailing enable_if_t defaults. Versionadded markers added for environment overloads. Three-way If doc refactored with notes on output counts and non-overlap. |
| DeviceRunLengthEncode SFINAE removal and parameter docs | cub/cub/device/device_run_length_encode.cuh | Type-trait includes enable_if and is_same removed. Two environment overloads (Encode, NonTrivialRuns) remove trailing enable_if_t defaults. NumItemsT and num_items docs added/corrected for total input item count. |

Legacy API overload resolution test coverage

| Layer | File(s) | Summary |
|---|---|---|
| DeviceAdjacentDifference legacy overload test | cub/test/catch2_test_device_adjacent_difference_api.cu | Four test cases verify legacy size-query calls for SubtractLeftCopy, SubtractLeft, SubtractRightCopy, SubtractRight resolve unambiguously without explicit stream. |
| DeviceCopy::Batched legacy overload test | cub/test/catch2_test_device_copy_api.cu | Test verifies Batched legacy temp-storage size-query with iterator-of-iterators placeholders resolves unambiguously. |
| DeviceFind legacy overload tests | cub/test/catch2_test_device_find_api.cu | Three test cases added for FindIf, LowerBound, UpperBound legacy size-query calls ensuring unambiguous dispatch without stream. |
| DevicePartition legacy overload tests | cub/test/catch2_test_device_partition_api.cu | Predicate functor and three tests for Flagged and If (unary and three-way) legacy size-query calls verify unambiguous dispatch. |
| DeviceRunLengthEncode legacy overload tests | cub/test/catch2_test_device_run_length_encode_api.cu | Two test cases for Encode and NonTrivialRuns legacy size-query calls assert successful unambiguous dispatch. |

Environment API test refactoring for stream_ref direct usage

| Layer | File(s) | Summary |
|---|---|---|
| DeviceCopy environment API test refactoring | cub/test/catch2_test_device_copy_env_api.cu | Batched and Copy environment tests refactored to pass stream_ref directly; stream.sync() added after operations. |
| DevicePartition environment API test refactoring | cub/test/catch2_test_device_partition_env_api.cu | If and Flagged environment tests refactored to pass stream_ref directly; stream.sync() added before assertions. |
| DeviceRunLengthEncode environment API test refactoring | cub/test/catch2_test_device_run_length_encode_env_api.cu | Encode and NonTrivialRuns environment tests refactored to pass stream_ref directly; stream.sync() added before validation. |

Suggested reviewers

  • bdice
  • Jacobfaib
  • miscco


@gonidelis gonidelis requested a review from bernhardmgruber May 14, 2026 05:00
@github-actions
Contributor

😬 CI Workflow Results

🟥 Finished in 1h 41m: Pass: 13%/283 | Total: 5d 05h | Max: 1h 40m | Hits: 7%/240102

See results here.

