[STF] Add per-handle exec_place stream resources#8905

Open
caugonnet wants to merge 8 commits into NVIDIA:main from caugonnet:exec-place-resources

Conversation

@caugonnet
Contributor

@caugonnet caugonnet commented May 12, 2026

Move pooled exec_place streams into an explicit resources registry so STF contexts own their cached streams and avoid stale handles after CUDA context teardown.

Description

closes

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

Move pooled exec_place streams into an explicit resources registry so STF contexts own their cached streams and avoid stale handles after CUDA context teardown.
@caugonnet caugonnet self-assigned this May 12, 2026
@caugonnet caugonnet added stf Sequential Task Flow programming model places labels May 12, 2026
@github-project-automation github-project-automation Bot moved this to Todo in CCCL May 12, 2026
@copy-pr-bot
Contributor

copy-pr-bot Bot commented May 12, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@cccl-authenticator-app cccl-authenticator-app Bot moved this from Todo to In Progress in CCCL May 12, 2026
@caugonnet
Contributor Author

@coderabbitai full review

@coderabbitai

coderabbitai Bot commented May 12, 2026

✅ Actions performed

Full review triggered.

@coderabbitai

coderabbitai Bot commented May 12, 2026

Review Change Stack

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Introduced caller-owned exec-place resources registry and new APIs/overloads to use it (including C API handles).
  • Refactor

    • Moved pooled streams out of per-place storage into the external registry; stream-access APIs now accept resource handles.
  • Bug Fixes

    • Stream validation now detects invalid CUDA contexts and resets stale streams.
  • Documentation

    • Updated stream-management docs to explain registry model and lifetime rules.
  • Tests

    • Added isolation and reset-recovery tests; updated existing tests to the new registry APIs.
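The stale-stream bug fix listed above can be illustrated with a plain C++ sketch that runs without a GPU. `fake_context`, `fake_stream`, and `stream_slot` are hypothetical stand-ins and not the types used in the PR; the epoch counter is an assumption that models a CUDA context being torn down and re-created.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for a CUDA context: the epoch changes whenever the
// context is destroyed and re-created (for example after a device reset).
struct fake_context
{
  std::uint64_t epoch = 1;
  void reset() { ++epoch; }
};

// Hypothetical stand-in for a stream handle that remembers the epoch of the
// context it was created in.
struct fake_stream
{
  std::uint64_t epoch;
  int id;
};

// One slot of a pool: the stream is created lazily and replaced when stale.
class stream_slot
{
public:
  [[nodiscard]] fake_stream get(const fake_context& ctx)
  {
    if (cached_ && cached_->epoch != ctx.epoch)
    {
      cached_.reset(); // drop the stale handle instead of reusing it
    }
    if (!cached_)
    {
      cached_ = fake_stream{ctx.epoch, next_id_++};
    }
    return *cached_;
  }

private:
  std::optional<fake_stream> cached_;
  int next_id_ = 0;
};
```

Repeated calls to `get` with an unchanged context return the same cached stream; after `reset()`, the next call hands out a fresh one. How the real `stream_pool::next` detects an invalid context is not shown in this thread.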

This PR refactors CUDA stream pool lifecycle from per-exec_place internal storage to an external exec_place_resources registry. Stream pools are now lazily created and scoped to the registry lifetime, enabling per-handle resource management while preserving embedded pools for self-contained places. All internal callsites are updated to thread the registry through stream-selection methods.

Stream Pool Registry Refactor

| Layer / File(s) | Summary |
|---|---|
| Registry Infrastructure & Virtual Interface: cudax/include/cuda/experimental/__places/exec_place_resources.cuh, cudax/include/cuda/experimental/__places/places.cuh | New exec_place_resources class provides thread-safe lazy-loaded registry mapping implementations to {compute, data} stream-pool pairs. Virtual method exec_place::impl::get_stream_pool signature updated to accept registry and self reference; forward declaration of async_resources_handle added to decouple header dependencies. |
| Public API Overloads & Stream Validation: cudax/include/cuda/experimental/__places/places.cuh | Adds registry-aware overloads to exec_place for getStream, get_stream_pool, pick_stream, stream_pool_size, pick_all_streams (accepting either exec_place_resources& or async_resources_handle&). Introduces data_place::getDataStream(exec_place_resources&) for affinity-aware data streams. stream_pool::next now validates cached streams' CUDA context. |
| Self-contained & Composite Place Implementations: cudax/include/cuda/experimental/__places/exec/cuda_stream.cuh, cudax/include/cuda/experimental/__places/exec/green_context.cuh, cudax/include/cuda/experimental/__places/places.cuh | Updates exec_place_cuda_stream_impl and exec_place_green_ctx_impl to new signature; they ignore the registry and return embedded pools. Removes device-place member storage of stream pools; host/grid impls borrow/forward pools from the registry. |
| CUDASTF Integration: Registry Ownership & Callsite Threading: cudax/include/cuda/experimental/__stf/internal/async_resources_handle.cuh, cudax/include/cuda/experimental/__stf/internal/stf_places_extended_exports.cuh, cudax/include/cuda/experimental/__stf/internal/backend_ctx.cuh, cudax/include/cuda/experimental/__places/place_partition.cuh, cudax/include/cuda/experimental/__stf/graph/graph_task.cuh, cudax/include/cuda/experimental/__stf/stream/interfaces/slice.cuh, cudax/include/cuda/experimental/__stf/stream/internal/event_types.cuh, cudax/include/cuda/experimental/__stf/stream/reduction.cuh, cudax/include/cuda/experimental/__stf/stream/stream_ctx.cuh, cudax/include/cuda/experimental/__stf/stream/stream_task.cuh | async_resources_handle now owns exec_place_resources and exposes get_place_resources(); convenience overloads in cuda::experimental::places route through the handle. All CUDASTF callsites updated to fetch async_resources().get_place_resources() and pass it explicitly to stream-selection APIs. Registry re-exported into STF namespace. |
| C API Bindings & Tests: c/experimental/stf/include/cccl/c/experimental/stf/stf.h, c/experimental/stf/src/stf.cu, c/experimental/stf/test/test_places.cpp | Adds opaque stf_exec_place_resources_handle and C API create/destroy/pick/context-borrow functions; extends opaque conversions; adds Catch2 tests for standalone, isolated, and context-borrowed registries. |
| Test Coverage & Documentation: cudax/test/places/stream_pool.cu, cudax/test/stf/CMakeLists.txt, cudax/test/stf/cpp/test_pick_stream.cu, cudax/test/stf/cpp/test_pick_stream_green_context.cu, docs/cudax/places.rst | Updates tests to construct and pass exec_place_resources registries; adds isolation and reset-resilience cases; registers two new STF test sources; updates docs with registry-based examples and lifetime notes. |
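The registry model summarized above can be sketched in host-only C++. This is a simplified illustration under stated assumptions, not the PR's code: `resources_registry`, `place_impl`, `device_place`, `embedded_place`, `stream_pool`, and `pool_pair` are hypothetical names, and the pools hold no real CUDA streams.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>

// Hypothetical stand-in for a pool of CUDA streams.
struct stream_pool
{
  std::size_t size = 4;
};

struct pool_pair
{
  stream_pool compute;
  stream_pool data;
};

class place_impl;

// Registry owning one lazily created {compute, data} pool pair per place.
class resources_registry
{
public:
  [[nodiscard]] pool_pair& get(const place_impl& key)
  {
    std::lock_guard<std::mutex> lock(mutex_);
    return pools_[&key]; // default-constructs the pair on first use
  }

  [[nodiscard]] std::size_t size() const
  {
    std::lock_guard<std::mutex> lock(mutex_);
    return pools_.size();
  }

private:
  mutable std::mutex mutex_;
  // std::map is node-based, so returned references stay valid on insertion.
  std::map<const place_impl*, pool_pair> pools_;
};

class place_impl
{
public:
  virtual ~place_impl() = default;
  [[nodiscard]] virtual stream_pool& get_stream_pool(resources_registry& res, bool for_computation) const = 0;
};

// Device-like place: borrows its pools from the registry passed by the caller.
class device_place final : public place_impl
{
public:
  [[nodiscard]] stream_pool& get_stream_pool(resources_registry& res, bool for_computation) const override
  {
    pool_pair& p = res.get(*this);
    return for_computation ? p.compute : p.data;
  }
};

// Self-contained place: ignores the registry and returns its embedded pool.
class embedded_place final : public place_impl
{
public:
  [[nodiscard]] stream_pool& get_stream_pool(resources_registry&, bool) const override
  {
    return pool_;
  }

private:
  mutable stream_pool pool_{1};
};
```

In this sketch two registries never share pools for the same place, which is the isolation property the new tests are described as covering, while an embedded place returns the same pool whichever registry is passed.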


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 13


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: a7a53086-da15-44b1-9f06-0ee688df1ddc

📥 Commits

Reviewing files that changed from the base of the PR and between a1f3aeb and 755d93a.

📒 Files selected for processing (19)
  • cudax/include/cuda/experimental/__places/exec/cuda_stream.cuh
  • cudax/include/cuda/experimental/__places/exec/green_context.cuh
  • cudax/include/cuda/experimental/__places/exec_place_resources.cuh
  • cudax/include/cuda/experimental/__places/place_partition.cuh
  • cudax/include/cuda/experimental/__places/places.cuh
  • cudax/include/cuda/experimental/__stf/graph/graph_task.cuh
  • cudax/include/cuda/experimental/__stf/internal/async_resources_handle.cuh
  • cudax/include/cuda/experimental/__stf/internal/backend_ctx.cuh
  • cudax/include/cuda/experimental/__stf/internal/stf_places_extended_exports.cuh
  • cudax/include/cuda/experimental/__stf/stream/interfaces/slice.cuh
  • cudax/include/cuda/experimental/__stf/stream/internal/event_types.cuh
  • cudax/include/cuda/experimental/__stf/stream/reduction.cuh
  • cudax/include/cuda/experimental/__stf/stream/stream_ctx.cuh
  • cudax/include/cuda/experimental/__stf/stream/stream_task.cuh
  • cudax/test/places/stream_pool.cu
  • cudax/test/stf/CMakeLists.txt
  • cudax/test/stf/cpp/test_pick_stream.cu
  • cudax/test/stf/cpp/test_pick_stream_green_context.cu
  • docs/cudax/places.rst

Comment thread cudax/include/cuda/experimental/__places/exec_place_resources.cuh
Comment thread cudax/include/cuda/experimental/__places/exec_place_resources.cuh Outdated
Comment thread cudax/include/cuda/experimental/__places/exec_place_resources.cuh Outdated
Comment thread cudax/include/cuda/experimental/__places/places.cuh Outdated
Comment on lines +328 to +332
inline cudaStream_t
exec_place::pick_stream(::cuda::experimental::stf::async_resources_handle& h, bool for_computation) const
{
  return pick_stream(h.get_place_resources(), for_computation);
}

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Add the required API annotation and [[nodiscard]] attribute.

-inline cudaStream_t
-exec_place::pick_stream(::cuda::experimental::stf::async_resources_handle& h, bool for_computation) const
+[[nodiscard]] inline _CCCL_HOST_API cudaStream_t
+exec_place::pick_stream(::cuda::experimental::stf::async_resources_handle& h, bool for_computation) const

As per coding guidelines: All functions must have API annotations, and non-void returns should be [[nodiscard]].
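A minimal host-only sketch of the shape this comment asks for follows. `DEMO_HOST_API` is a hypothetical stand-in for an annotation macro such as `_CCCL_HOST_API`, and `resources`, `handle`, and `place` are simplified placeholders, not the PR's classes.

```cpp
#include <cassert>

// Hypothetical stand-in for an API annotation macro such as _CCCL_HOST_API;
// here it expands to nothing so the sketch builds with any C++17 compiler.
#define DEMO_HOST_API

struct resources
{
  int next = 0;
};

struct handle
{
  resources res;
  [[nodiscard]] DEMO_HOST_API resources& get_place_resources() { return res; }
};

struct place
{
  // Primary overload: works on the registry directly.
  [[nodiscard]] DEMO_HOST_API int pick_stream(resources& r) const;
  // Convenience overload: routes through the handle.
  [[nodiscard]] DEMO_HOST_API int pick_stream(handle& h) const;
};

[[nodiscard]] inline DEMO_HOST_API int place::pick_stream(resources& r) const
{
  return r.next++;
}

[[nodiscard]] inline DEMO_HOST_API int place::pick_stream(handle& h) const
{
  return pick_stream(h.get_place_resources());
}
```

With the attribute in place, a call such as `p.pick_stream(h);` that discards the returned value is expected to produce a compiler warning, which is useful here because ignoring the result means the picked stream goes unused.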

Comment on lines +334 to +337
inline size_t exec_place::stream_pool_size(::cuda::experimental::stf::async_resources_handle& h) const
{
  return stream_pool_size(h.get_place_resources());
}

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Add the required API annotation and [[nodiscard]] attribute.

-inline size_t exec_place::stream_pool_size(::cuda::experimental::stf::async_resources_handle& h) const
+[[nodiscard]] inline _CCCL_HOST_API ::std::size_t exec_place::stream_pool_size(::cuda::experimental::stf::async_resources_handle& h) const

Also fully qualify size_t as ::std::size_t.

As per coding guidelines: All functions must have API annotations, non-void returns should be [[nodiscard]], and standard types must be fully qualified.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-inline size_t exec_place::stream_pool_size(::cuda::experimental::stf::async_resources_handle& h) const
-{
-  return stream_pool_size(h.get_place_resources());
-}
+[[nodiscard]] inline _CCCL_HOST_API ::std::size_t exec_place::stream_pool_size(::cuda::experimental::stf::async_resources_handle& h) const
+{
+  return stream_pool_size(h.get_place_resources());
+}
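One plausible motivation for the leading `::` in `::std::size_t` is name lookup inside nested namespaces: libcu++ provides a `cuda::std` namespace, so `std::` written inside `cuda::...` may not refer to the global `std`. The sketch below reproduces the effect with a hypothetical `demo::std` namespace; the guideline's actual rationale is not stated in this thread.

```cpp
#include <cstddef>
#include <type_traits>

namespace demo
{
// Hypothetical nested namespace named `std`, similar in spirit to cuda::std.
namespace std
{
struct size_t
{};
} // namespace std

namespace experimental
{
// Lookup of `std` walks outward from here and finds demo::std first.
using nested_type = std::size_t;
// A leading `::` forces lookup to start at the global namespace.
using global_type = ::std::size_t;
} // namespace experimental
} // namespace demo
```

The two aliases name different types, so only the fully qualified spelling is guaranteed to mean the standard library's `size_t` regardless of the enclosing namespace.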

Comment thread cudax/include/cuda/experimental/__stf/internal/async_resources_handle.cuh Outdated
Comment thread cudax/include/cuda/experimental/__stf/stream/reduction.cuh Outdated
Comment thread docs/cudax/places.rst Outdated
caugonnet and others added 3 commits May 12, 2026 11:34
- Add [[nodiscard]] to non-void returns on exec_place_resources::get/size,
  the virtual get_stream_pool, async_resources_handle::get_place_resources,
  and the 5 async_resources_handle& convenience overloads on exec_place.
- Mark dstream const in stream_reduction_operator op/init overrides.
- Clarify pick_stream() call form in docs/cudax/places.rst.
Add the C entry points needed to share per-place stream pools between standalone place code and STF contexts.
@caugonnet
Contributor Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented May 12, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 9c5fb806-40f1-4b93-a2ec-4ee222462b9e

📥 Commits

Reviewing files that changed from the base of the PR and between 755d93a and 7819653.

📒 Files selected for processing (8)
  • c/experimental/stf/include/cccl/c/experimental/stf/stf.h
  • c/experimental/stf/src/stf.cu
  • c/experimental/stf/test/test_places.cpp
  • cudax/include/cuda/experimental/__places/exec_place_resources.cuh
  • cudax/include/cuda/experimental/__places/places.cuh
  • cudax/include/cuda/experimental/__stf/internal/async_resources_handle.cuh
  • cudax/include/cuda/experimental/__stf/stream/reduction.cuh
  • docs/cudax/places.rst
✅ Files skipped from review due to trivial changes (1)
  • docs/cudax/places.rst
🚧 Files skipped from review as they are similar to previous changes (4)
  • cudax/include/cuda/experimental/__stf/stream/reduction.cuh
  • cudax/include/cuda/experimental/__stf/internal/async_resources_handle.cuh
  • cudax/include/cuda/experimental/__places/exec_place_resources.cuh
  • cudax/include/cuda/experimental/__places/places.cuh

Comment thread c/experimental/stf/src/stf.cu
caugonnet and others added 2 commits May 12, 2026 14:57
Track whether C resource handles own their underlying registry so borrowed context resources can be released without corrupting context teardown.
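The ownership tracking described in this commit message can be sketched as follows. All names here (`registry`, `context`, `resources_handle_impl`, `resources_create`, `resources_borrow`, `resources_destroy`) are hypothetical and do not reproduce the C API added in `stf.h`; the sketch only shows the general owning-versus-borrowed pattern.

```cpp
#include <cassert>

// Hypothetical stand-in for the C++ registry behind the C handle.
struct registry
{
  static inline int live = 0; // counts live registries for the demo
  registry() { ++live; }
  ~registry() { --live; }
};

// Hypothetical stand-in for an STF context that owns a registry.
struct context
{
  registry res;
};

// What an opaque C handle could point to: the registry plus an ownership flag.
struct resources_handle_impl
{
  registry* ptr;
  bool owns;
};
using resources_handle = resources_handle_impl*;

// Standalone registry: the handle owns it.
inline resources_handle resources_create()
{
  return new resources_handle_impl{new registry, true};
}

// Registry borrowed from a context: the context keeps ownership.
inline resources_handle resources_borrow(context& ctx)
{
  return new resources_handle_impl{&ctx.res, false};
}

inline void resources_destroy(resources_handle h)
{
  if (h == nullptr)
  {
    return;
  }
  if (h->owns)
  {
    delete h->ptr; // only owning handles free the registry
  }
  delete h;
}
```

Destroying a borrowed handle releases only the small wrapper, so the context can still tear down its own registry later without a double free.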
@caugonnet
Contributor Author

/ok to test 12c5ee9

@github-actions
Contributor

🥳 CI Workflow Results

🟩 Finished in 3h 39m: Pass: 100%/59 | Total: 1d 13h | Max: 1h 21m | Hits: 7%/255304

See results here.

@caugonnet caugonnet marked this pull request as ready for review May 13, 2026 08:15
@caugonnet caugonnet requested review from a team as code owners May 13, 2026 08:15
@caugonnet caugonnet requested a review from alliepiper May 13, 2026 08:15
@caugonnet caugonnet requested a review from griwes May 13, 2026 08:15
@cccl-authenticator-app cccl-authenticator-app Bot moved this from In Progress to In Review in CCCL May 13, 2026
Comment thread c/experimental/stf/src/stf.cu
Contributor

@NaderAlAwar NaderAlAwar left a comment

C part looks good

Contributor

@andralex andralex left a comment

Thanks for doing this!

Labels

places stf Sequential Task Flow programming model

Projects

Status: In Review

4 participants