[STF] Add per-handle exec_place stream resources #8905
Merged
12 commits
755d93a [cudax] Add per-handle exec_place stream resources (caugonnet)
534f80a Address CodeRabbit review comments (caugonnet)
f3168bc Expose exec place resources in the C STF API (caugonnet)
7819653 Merge branch 'main' into exec-place-resources (caugonnet)
863fdf8 Clarify C exec place resource ownership (caugonnet)
12c5ee9 Merge branch 'main' into exec-place-resources (caugonnet)
3503af8 Improved comment (caugonnet)
62cfcca Merge branch 'main' into exec-place-resources (caugonnet)
f8db8e2 Merge branch 'main' into exec-place-resources (caugonnet)
475b436 Merge branch 'main' into exec-place-resources (caugonnet)
a54d670 [STF] Fix pick_all_streams docs signature (caugonnet)
72143a7 Merge branch 'main' into exec-place-resources (caugonnet)
134 changes: 134 additions & 0 deletions
cudax/include/cuda/experimental/__places/exec_place_resources.cuh
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,134 @@ | ||
| //===----------------------------------------------------------------------===// | ||
| // | ||
| // Part of CUDASTF in CUDA C++ Core Libraries, | ||
| // under the Apache License v2.0 with LLVM Exceptions. | ||
| // See https://llvm.org/LICENSE.txt for license information. | ||
| // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception | ||
| // SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. | ||
| // | ||
| //===----------------------------------------------------------------------===// | ||
| | ||
| /** | ||
| * @file | ||
| * @brief Standalone per-place stream-pool registry. | ||
| * | ||
| * `exec_place_resources` owns a `{compute, data}` `stream_pool` slot for every | ||
| * pooled place it is queried with. Slots are created lazily on first use and | ||
| * destroyed with the registry. The registry depends only on `stream_pool.cuh` | ||
| * and a forward declaration of `exec_place`; it can be embedded in any | ||
| * resource container (e.g. `async_resources_handle`) without pulling in STF. | ||
| * | ||
| * Keys are `exec_place::impl*` pointers. Pooled implementations (`device(N)`, | ||
| * `host()`) live as process-wide singleton impls, so pointer identity matches | ||
| * place identity for them. Self-contained implementations (`cuda_stream`, | ||
| * green-context, grid) override `get_stream_pool` and never reach the | ||
| * registry. | ||
| */ | ||
| | ||
| #pragma once | ||
| | ||
| #include <cuda/__cccl_config> | ||
| | ||
| #if defined(_CCCL_IMPLICIT_SYSTEM_HEADER_GCC) | ||
| # pragma GCC system_header | ||
| #elif defined(_CCCL_IMPLICIT_SYSTEM_HEADER_CLANG) | ||
| # pragma clang system_header | ||
| #elif defined(_CCCL_IMPLICIT_SYSTEM_HEADER_MSVC) | ||
| # pragma system_header | ||
| #endif // no system header | ||
| | ||
| #include <cuda/experimental/__places/stream_pool.cuh> | ||
| | ||
| #include <mutex> | ||
| #include <unordered_map> | ||
| | ||
| namespace cuda::experimental::places | ||
| { | ||
| /** | ||
| * @brief Default size of each per-place stream pool created by the registry. | ||
| * | ||
| * `exec_place::impl::pool_size` and `data_pool_size` are aliases to these | ||
| * values so `places.cuh` can keep its public surface unchanged. | ||
| */ | ||
| inline constexpr ::std::size_t exec_place_default_pool_size = 4; | ||
| inline constexpr ::std::size_t exec_place_default_data_pool_size = 4; | ||
| | ||
| /** | ||
| * @brief A registry of per-place stream pools keyed by `exec_place::impl*`. | ||
| * | ||
| * For every distinct pooled impl pointer the registry is queried with, it | ||
| * owns one `{compute, data}` pair of `stream_pool`s, created lazily on first | ||
| * lookup with sizes `exec_place_default_pool_size` / | ||
| * `exec_place_default_data_pool_size`. | ||
| * | ||
| * The map itself is mutex-guarded. The mutex is only held across the | ||
| * find/insert into the map; subsequent stream creation (which happens lazily | ||
| * inside `stream_pool::next`) runs outside the lock, so contention is limited | ||
| * to slow-path task submission. | ||
| * | ||
| * Lifetime: each entry's pool is owned by the registry. Destroying the | ||
| * registry destroys every pool it has created (and their cached | ||
| * `cudaStream_t` handles). Consequently, a registry must not outlive the | ||
| * CUDA primary context(s) of the devices it has cached streams for; with | ||
| * this design, registries are typically embedded in an | ||
| * `async_resources_handle` and share the lifetime of the owning STF context. | ||
| * | ||
| * Caveats for externally-owned places: | ||
| * - User-stream places (`exec_place::cuda_stream(s)`) carry their own | ||
| * single-stream pool and never participate in the registry. | ||
| * - Green-context places carry their own pool (constructed from the | ||
| * `green_ctx_view`) and also bypass the registry. The user must keep the | ||
| * underlying `CUgreenCtx` alive as long as the place is used. | ||
| */ | ||
| class exec_place_resources | ||
| { | ||
| public: | ||
| struct per_place_pools | ||
| { | ||
| per_place_pools() | ||
| : compute(exec_place_default_pool_size) | ||
| , data(exec_place_default_data_pool_size) | ||
| {} | ||
| | ||
| stream_pool compute; | ||
| stream_pool data; | ||
| }; | ||
| | ||
| exec_place_resources() = default; | ||
| | ||
| exec_place_resources(const exec_place_resources&) = delete; | ||
| exec_place_resources& operator=(const exec_place_resources&) = delete; | ||
| exec_place_resources(exec_place_resources&&) = delete; | ||
| exec_place_resources& operator=(exec_place_resources&&) = delete; | ||
| | ||
| /** | ||
| * @brief Look up (or lazily create) the `{compute, data}` pool slot for the | ||
| * supplied impl pointer. | ||
| * | ||
| * Thread-safe: the mutex is held only across the find/insert. The returned | ||
| * reference is stable for the lifetime of the registry (`std::unordered_map` | ||
| * preserves node addresses across rehashes). | ||
| */ | ||
| [[nodiscard]] per_place_pools& get(const void* impl_key) | ||
| { | ||
| ::std::lock_guard<::std::mutex> lock(mtx_); | ||
| auto it = map_.find(impl_key); | ||
| if (it == map_.end()) | ||
| { | ||
| it = map_.emplace(impl_key, per_place_pools{}).first; | ||
| } | ||
| return it->second; | ||
| } | ||
| | ||
| /// @brief Number of per-place entries currently cached. Mainly for tests. | ||
| [[nodiscard]] ::std::size_t size() const | ||
| { | ||
| ::std::lock_guard<::std::mutex> lock(mtx_); | ||
| return map_.size(); | ||
| } | ||
| | ||
| private: | ||
| mutable ::std::mutex mtx_; | ||
| ::std::unordered_map<const void*, per_place_pools> map_; | ||
| }; | ||
| } // namespace cuda::experimental::places | ||
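
The lookup pattern in `exec_place_resources::get` can be tried without CUDA. The following is a minimal sketch, not the code from this PR: `fake_pool`, `pool_pair`, `registry`, and `reference_survives_rehash` are hypothetical stand-ins (the real `stream_pool` is replaced by a struct that only records its capacity). It illustrates the two properties the header's comments rely on: entries are created lazily under the mutex, and a reference returned by `get` stays valid after later insertions, because `std::unordered_map` does not move its elements on rehash.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for stream_pool: it only records its capacity.
struct fake_pool
{
  explicit fake_pool(std::size_t n) : capacity(n) {}
  std::size_t capacity;
};

// Mirrors per_place_pools from the diff; 4 matches both default pool sizes.
struct pool_pair
{
  pool_pair() : compute(4), data(4) {}
  fake_pool compute;
  fake_pool data;
};

// Hypothetical registry keyed by an opaque pointer, like exec_place::impl*.
class registry
{
public:
  // Find the entry for `key`, creating it on first lookup.
  // The mutex is held only for the find/insert.
  pool_pair& get(const void* key)
  {
    std::lock_guard<std::mutex> lock(mtx_);
    auto it = map_.find(key);
    if (it == map_.end())
    {
      it = map_.emplace(key, pool_pair{}).first;
    }
    return it->second;
  }

  std::size_t size() const
  {
    std::lock_guard<std::mutex> lock(mtx_);
    return map_.size();
  }

private:
  mutable std::mutex mtx_;
  std::unordered_map<const void*, pool_pair> map_;
};

// Takes a reference early, inserts many more keys (forcing rehashes),
// then checks that the early reference still names the same entry.
inline bool reference_survives_rehash()
{
  registry reg;
  std::vector<int> keys(1000); // element addresses serve as distinct keys
  pool_pair* first = &reg.get(&keys[0]);
  for (std::size_t i = 1; i < keys.size(); ++i)
  {
    (void) reg.get(&keys[i]);
  }
  assert(reg.size() == keys.size());
  return first == &reg.get(&keys[0]);
}
```

Calling `reference_survives_rehash()` from a test or `main` exercises both properties. Keying on a raw pointer is only sound when pointer identity matches place identity, which the header states holds for the pooled singleton implementations.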