Add env overloads for InclusiveSum and future ExclusiveScan#8059
Add env overloads for InclusiveSum and future ExclusiveScan#8059gonidelis merged 14 commits intoNVIDIA:mainfrom
Conversation
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
|
||
| thrust::device_vector<float> expected{1.0f, 1.0f, 2.0f, 4.0f}; | ||
| // example-end exclusive-scan-env-stream | ||
| stream.sync(); |
There was a problem hiding this comment.
Q: Why do we need a sync here? I don't think we need it.
Applies more often in this file.
There was a problem hiding this comment.
Initializing expected implicitly syncs, but it is good to be explicit about it.
There was a problem hiding this comment.
i added for me to remove this discrepancy here. afaiu we should be using stream.sync() but now show it to the users as it should be left on their judgement for when to use it
There was a problem hiding this comment.
Initializing
expectedimplicitly syncs
But the code here does not rely on this, so it's fine if it wouldn't.
but it is good to be explicit about it.
I disagree. No sync is needed here, so seeing one makes readers like me nervous.
There was a problem hiding this comment.
I discussed this with @gonidelis and I now understand why we need the sync. We wondered why the code still works without the sync. Where does
Initializing expected implicitly syncs
Is it the memory allocation? I just want to understand why it works without a sync.
We should definitely add the syncs now!
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
|
||
| thrust::device_vector<float> expected{1.0f, 1.0f, 2.0f, 4.0f}; | ||
| // example-end exclusive-scan-env-stream | ||
| stream.sync(); |
There was a problem hiding this comment.
Initializing
expectedimplicitly syncs
But the code here does not rely on this, so it's fine if it wouldn't.
but it is good to be explicit about it.
I disagree. No sync is needed here, so seeing one makes readers like me nervous.
- dispatch_scan's warpspeed path calls max_dynamic_smem_size_for and
set_max_dynamic_smem_size_for on the launcher factory
- Previously untriggered because env tests used constant_iterator (non-contiguous),
which skips the warpspeed path
- Now needed for InclusiveSum env tests with device_vector (contiguous iterators)
…ents - Add default environment test for ExclusiveScan with FutureValue - Add not_guaranteed determinism test for ExclusiveScan - Remove duplicate in-place precondition from env overload descriptions (already in Preconditions section) - Remove @devicestorage from env overloads (no temp storage parameter)
…ents
- Use non-identity init values
- Replace per-element REQUIRE(d_out[i]) with bulk thrust::equal or
device_vector comparison where possible
- Add example-begin/end markers for exclusive-scan-env-not-guaranteed
- Fix mismatched init/expected in inclusive-scan-init env test
|
|
||
| thrust::device_vector<float> expected{1.0f, 1.0f, 2.0f, 4.0f}; | ||
| // example-end exclusive-scan-env-stream | ||
| stream.sync(); |
There was a problem hiding this comment.
I discussed this with @gonidelis and I now understand why we need the sync. We wondered why the code still works without the sync. Where does
Initializing expected implicitly syncs
Is it the memory allocation? I just want to understand why it works without a sync.
We should definitely add the syncs now!
This comment has been minimized.
This comment has been minimized.
|
There are errors in the docs build: Please address those. |
🥳 CI Workflow Results🟩 Finished in 1h 26m: Pass: 100%/269 | Total: 3d 15h | Max: 1h 07m | Hits: 98%/177053See results here. |
) * Add ebv overloads for InclusiveSum and future ExclusiveScan * Add dynamic smem methods to stream_registry_factory_t and tidy-up tests - dispatch_scan's warpspeed path calls max_dynamic_smem_size_for and set_max_dynamic_smem_size_for on the launcher factory - Previously untriggered because env tests used constant_iterator (non-contiguous), which skips the warpspeed path - Now needed for InclusiveSum env tests with device_vector (contiguous iterators) * Improve DeviceScan env test coverage and clean up duplicate doc comments - Add default environment test for ExclusiveScan with FutureValue - Add not_guaranteed determinism test for ExclusiveScan - Remove duplicate in-place precondition from env overload descriptions (already in Preconditions section) - Remove @devicestorage from env overloads (no temp storage parameter)
Adds miscellaneous env overloads for
InclusiveSumandExclusiveScanwithFutureValue