Add env overloads for InclusiveSum and future ExclusiveScan #8059

Merged
gonidelis merged 14 commits into NVIDIA:main from gonidelis:init_sum_env
Apr 8, 2026
@gonidelis
Member

Adds miscellaneous env overloads for InclusiveSum and ExclusiveScan with FutureValue
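
For orientation, here is a minimal host-side sketch of what the two scans named above compute. It uses the standard library's `std::exclusive_scan` and `std::inclusive_scan` rather than the CUB device API, so it says nothing about the env overloads themselves. The helper names, the input, and the initial value are assumptions; the input and initial value were picked so that the exclusive result matches the `expected` vector quoted in the review thread.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Host-side reference for an exclusive sum with an initial value:
// out[i] = init + in[0] + ... + in[i-1].
// (Hypothetical helper, not part of CUB.)
std::vector<float> host_exclusive_sum(const std::vector<float>& in, float init)
{
  std::vector<float> out(in.size());
  std::exclusive_scan(in.begin(), in.end(), out.begin(), init);
  return out;
}

// Host-side reference for an inclusive sum: out[i] = in[0] + ... + in[i].
// (Hypothetical helper, not part of CUB.)
std::vector<float> host_inclusive_sum(const std::vector<float>& in)
{
  std::vector<float> out(in.size());
  std::inclusive_scan(in.begin(), in.end(), out.begin());
  return out;
}

// Checks both references against hand-computed values.
void check_scan_references()
{
  const std::vector<float> input{0.0f, 1.0f, 2.0f, 3.0f}; // assumed input
  const float init = 1.0f;                                // assumed initial value

  assert((host_exclusive_sum(input, init) == std::vector<float>{1.0f, 1.0f, 2.0f, 4.0f}));
  assert((host_inclusive_sum(input) == std::vector<float>{0.0f, 1.0f, 3.0f, 6.0f}));
}
```

Calling `check_scan_references()` from a C++17 program exercises both helpers. A host reference like this is only a way to reason about the expected values; the device overloads run asynchronously on a stream, which is what the sync discussion in the review is about.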

@gonidelis gonidelis requested a review from a team as a code owner March 17, 2026 05:40
@gonidelis gonidelis requested a review from pauleonix March 17, 2026 05:40
@github-project-automation github-project-automation Bot moved this to Todo in CCCL Mar 17, 2026
@gonidelis gonidelis changed the title Add ebv overloads for InclusiveSum and future ExclusiveScan Add env overloads for InclusiveSum and future ExclusiveScan Mar 17, 2026
@cccl-authenticator-app cccl-authenticator-app Bot moved this from Todo to In Review in CCCL Mar 17, 2026
@gonidelis gonidelis enabled auto-merge (squash) March 25, 2026 10:34
Comment thread cub/test/catch2_test_device_scan_env_api.cu Outdated

thrust::device_vector<float> expected{1.0f, 1.0f, 2.0f, 4.0f};
// example-end exclusive-scan-env-stream
stream.sync();
Contributor

Q: Why do we need a sync here? I don't think we need it.

This applies in several other places in this file.

Contributor

Initializing expected implicitly syncs, but it is good to be explicit about it.

Member Author

I added this myself to remove the discrepancy here. As far as I understand, we should be using stream.sync() but not show it to the users, since it should be left to their judgement when to use it.

#8175 (comment)

Contributor

> Initializing expected implicitly syncs

But the code here does not rely on this, so it would be fine if it didn't.

> but it is good to be explicit about it.

I disagree. No sync is needed here, so seeing one makes readers like me nervous.

Contributor

I discussed this with @gonidelis and I now understand why we need the sync. We wondered why the code still works without the sync. Where does

> Initializing expected implicitly syncs

Is it the memory allocation? I just want to understand why it works without a sync.

We should definitely add the syncs now!

gonidelis added 12 commits April 8, 2026 07:29
  - dispatch_scan's warpspeed path calls max_dynamic_smem_size_for and
    set_max_dynamic_smem_size_for on the launcher factory
  - Previously untriggered because env tests used constant_iterator (non-contiguous),
    which skips the warpspeed path
  - Now needed for InclusiveSum env tests with device_vector (contiguous iterators)
…ents

  - Add default environment test for ExclusiveScan with FutureValue
  - Add not_guaranteed determinism test for ExclusiveScan
  - Remove duplicate in-place precondition from env overload descriptions (already in Preconditions section)
  - Remove @devicestorage from env overloads (no temp storage parameter)
…ents

  - Use non-identity init values
  - Replace per-element REQUIRE(d_out[i]) with bulk thrust::equal or
    device_vector comparison where possible
  - Add example-begin/end markers for exclusive-scan-env-not-guaranteed
  - Fix mismatched init/expected in inclusive-scan-init env test
@bernhardmgruber
Contributor

There are errors in the docs build:

/home/runner/_work/cccl/cccl/cub/cub/device/device_scan.cuh:348: error: explicit link request to 'cuda::std::execution::env' could not be resolved
/home/runner/_work/cccl/cccl/cub/cub/device/device_scan.cuh:654: error: explicit link request to 'cuda::std::execution::env' could not be resolved
/home/runner/_work/cccl/cccl/cub/cub/device/device_scan.cuh:1123: error: explicit link request to 'cuda::std::execution::env' could not be resolved
/home/runner/_work/cccl/cccl/cub/cub/device/device_scan.cuh:1393: error: explicit link request to 'cuda::std::execution::env' could not be resolved
/home/runner/_work/cccl/cccl/cub/cub/device/device_scan.cuh:1782: error: explicit link request to 'cuda::std::execution::env' could not be resolved
/home/runner/_work/cccl/cccl/cub/cub/device/device_scan.cuh:1867: error: explicit link request to 'cuda::std::execution::env' could not be resolved
/home/runner/_work/cccl/cccl/cub/cub/device/device_scan.cuh:2603: error: explicit link request to 'cuda::std::execution::env' could not be resolved
/home/runner/_work/cccl/cccl/cub/cub/device/device_scan.cuh:2732: error: explicit link request to 'cuda::std::execution::env' could not be resolved
/home/runner/_work/cccl/cccl/cub/cub/device/device_scan.cuh:2860: error: explicit link request to 'cuda::std::execution::env' could not be resolved
/home/runner/_work/cccl/cccl/cub/cub/device/device_scan.cuh:2981: error: explicit link request to 'cuda::std::execution::env' could not be resolved

Please address those.

@github-actions
Contributor

github-actions Bot commented Apr 8, 2026

🥳 CI Workflow Results

🟩 Finished in 1h 26m: Pass: 100%/269 | Total: 3d 15h | Max: 1h 07m | Hits: 98%/177053

See results here.

@gonidelis gonidelis merged commit 8d77295 into NVIDIA:main Apr 8, 2026
285 of 288 checks passed
@github-project-automation github-project-automation Bot moved this from In Review to Done in CCCL Apr 8, 2026
jainishmehta pushed a commit to jainishmehta/cccl that referenced this pull request Apr 19, 2026
)

* Add ebv overloads for InclusiveSum and future ExclusiveScan

* Add dynamic smem methods to stream_registry_factory_t and tidy-up tests

  - dispatch_scan's warpspeed path calls max_dynamic_smem_size_for and
    set_max_dynamic_smem_size_for on the launcher factory
  - Previously untriggered because env tests used constant_iterator (non-contiguous),
    which skips the warpspeed path
  - Now needed for InclusiveSum env tests with device_vector (contiguous iterators)

* Improve DeviceScan env test coverage and clean up duplicate doc comments

  - Add default environment test for ExclusiveScan with FutureValue
  - Add not_guaranteed determinism test for ExclusiveScan
  - Remove duplicate in-place precondition from env overload descriptions (already in Preconditions section)
  - Remove @devicestorage from env overloads (no temp storage parameter)