Add managed-memory advise, prefetch, and discard-prefetch free functions #1775
Status: Open. rparolin wants to merge 75 commits into NVIDIA:main from rparolin:rparolin/managed_mem_advise_prefetch.
Commits (75), all authored by rparolin:
abdec47  wip
c418050  wip
b879fa5  fixing ci compiler errors
04ee3de  skipping tests that aren't supported
9ab3f46  cu12 support
bd75bc3  Merge branch 'main' into rparolin/managed_mem_advise_prefetch
1b1343b  Merge branch 'main' into rparolin/managed_mem_advise_prefetch
a948066  Moving to function from Buffer class methods to free standing functio…
1457599  precommit format
acb4024  iterating on implementation
d10ab07  Simplify managed-memory helpers: remove long-form aliases, cache look…
ae1de36  Merge branch 'main' into rparolin/managed_mem_advise_prefetch
c250c92  fix(test): reset _V2_BINDINGS cache so legacy-signature tests take th…
89329d9  fix(test): require concurrent_managed_access for advise tests that hi…
8a75d1b  fix: validate managed buffer before checking discard_prefetch binding…
9e9b1e0  refactor: extract managed memory ops into dedicated _managed_memory_o…
90f0711  pre-commit fix
b4d252c  Removing blank file
faaa1d8  wip
18786be  Merge branch 'main' into rparolin/managed_mem_advise_prefetch
9766ddc  Merge remote-tracking branch 'upstream/main' into rparolin/managed_me…
cf2f20d  fix(cuda.core): update binding_version import after upstream merge
db3bac2  revert: drop managed_memory shim in cuda.core.experimental
20d036e  feat(cuda.core): add Location dataclass for managed memory
c2dae53  feat(cuda.core): add _coerce_location helper
935c8ba  test(cuda.core): update monkeypatch target after binding_version rename
dc46535  refactor(cuda.core): tighten memory-attr query
818f5d2  feat(cuda.core): unified 1..N managed_memory.prefetch with cydriver
e296e72  feat(cuda.core): add managed_memory.discard
e697131  feat(cuda.core): unified 1..N managed_memory.discard_prefetch with cy…
3bc1021  feat(cuda.core): unified 1..N managed_memory.advise + drop legacy app…
fa23869  refactor(cuda.core): use Buffer.is_managed property in managed_memory…
68bdd14  docs(cuda.core): document Location, discard, and 1..N managed_memory ops
b4d9cbf  chore(cuda.core): drop narrative comments and tighten _coerce_locatio…
ee96758  chore(cuda.core): satisfy pre-commit hooks
d6f60f2  refactor(cuda.core): move managed_memory ops to cuda.core.utils
3176271  chore(cuda.core): use __all__ in utils instead of per-import noqa
782f6a9  chore(cuda.core): collapse nested if in Location.__post_init__ (SIM102)
0789bf6  test(cuda.core): share one DummyUnifiedMemoryResource per batched test
e0c782a  test(cuda.core): query all buffers before closing in test_batched_sam…
10de998  review(cuda.core): address PR #1775 feedback
ab9a3ab  test(cuda.core): split managed-memory ops tests into tests/memory/
a3f342f  test(cuda.core): fix options regex for AdviseOptions ("an" vs "a")
c2a9662  chore(cuda.core): drop unused utils import + trailing blank lines
bede674  feat(cuda.core): add ManagedBuffer subclass + Host location
f59af4e  chore(cuda.core): simplify ManagedBuffer per /simplify review
5147a7d  ci: re-trigger CI (transient cuInit INVALID_DEVICE on l4 runner)
2151e61  refactor(cuda.core): use libcpp.vector for batched-op C arrays (R14)
5c6d054  fix(cuda.core): restore CUDA_ERROR_NOT_INITIALIZED auto-init in _quer…
47d5609  refactor(cuda.core): make Host a plain class instead of a dataclass (R1)
a40bb81  feat(cuda.core)!: drop int location shorthand from managed-memory ops…
c43e81e  docs(cuda.core): add AccessedBySet to api_private.rst (R5)
71e9daa  docs(cuda.core): note the legacy NUMA round-trip limitation on prefer…
df928a0  refactor(cuda.core): use collections.abc.Sequence for input checks (R…
f522916  refactor(cuda.core): narrow Buffer.from_handle to Buffer-only (R3)
6204c57  refactor(cuda.core): single API surface per operation (R9, R10, R11)
36012fd  refactor(cuda.core): build advise reverse-lookup eagerly at module lo…
067fb15  refactor(cuda.core): factor shared body of _do_batch_{prefetch,discar…
a9cd713  test(cuda.core): reuse production _get_int_attr in managed-memory tes…
d75a7bd  feat(cuda.core): cu12 fallback for prefetch_batch (N3)
0af5bd4  test(cuda.core): cover AccessedBySet read methods (N7)
b0d1a21  feat(cuda.core): cu13 NUMA round-trip for ManagedBuffer.preferred_loc…
4c228eb  docs(cuda.core): replace stale utils autosummary entries
5743e05  feat(cuda.core): make Host a singleton class
7126324  refactor(cuda.core): rename AccessedBySet -> AccessedBySetProxy
238cb14  fix(cuda.core): silence ruff lints on Host singleton
d0b6621  fix(cuda.core): reject bool as Host(numa_id=...)
d0f9c7e  fix(cuda.core): hoist managed-buffer check in _advise_one
191f29d  fix(cuda.core): clarify CUDA 12 NUMA-host error message
bcc056b  fix(cuda.core): reject Host(numa_id=...) up-front on CUDA 12
1b66367  fix(cuda.core): make ManagedBuffer.accessed_by setter atomic
5efbe4e  style(cuda.core): apply ruff format
5e2c051  Merge remote-tracking branch 'upstream/main' into rparolin/managed_me…
8c35376  Skip NUMA-aware Host coerce tests on CUDA 12 builds
29235b9  Merge remote-tracking branch 'upstream/main' into rparolin/managed_me…
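Taken together, the commit messages outline the API shape: per-operation free functions (prefetch, advise, discard, discard_prefetch) that accept one buffer or a sequence of buffers (the "unified 1..N" commits) plus a Device or Host target location, moved to cuda.core.utils by commit d6f60f2. Below is a hypothetical usage sketch; the import paths, signatures, and option names are assumptions inferred from the commit titles, not the reviewed code itself:

```python
from cuda.core.experimental import Device

# Assumed import location per commit d6f60f2 ("move managed_memory ops
# to cuda.core.utils"); the exact names are guesses from commit titles.
from cuda.core.utils import Host, managed_memory


def demo(buf, bufs):
    # buf is assumed to be a managed (unified-memory) Buffer and bufs a
    # sequence of them, allocated elsewhere from a managed memory resource.
    dev = Device(0)
    dev.set_current()

    managed_memory.prefetch(buf, dev)          # migrate pages toward the GPU
    managed_memory.prefetch(bufs, Host())      # batched ("1..N"), toward the CPU
    managed_memory.discard(buf)                # drop current contents
    managed_memory.discard_prefetch(buf, dev)  # discard, then prefetch
    # advise takes options (AdviseOptions per commit a3f342f); its exact
    # shape is not shown in this excerpt.
```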
New file from the diff (97 lines, per the @@ -0,0 +1,97 @@ hunk), defining the Host location class:

```python
# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0

from __future__ import annotations

import threading
from typing import ClassVar


class Host:
    """Host (CPU) location for managed-memory operations.

    Use one of the three forms:

    * ``Host()`` — generic host (any NUMA node).
    * ``Host(numa_id=N)`` — specific NUMA node ``N``.
    * ``Host.numa_current()`` — NUMA node of the calling thread.

    ``Host`` is the symmetric counterpart of :class:`~cuda.core.Device`
    for managed-memory `prefetch`, `advise`, and `discard_prefetch`
    targets. Pass either a ``Device`` or a ``Host`` to those operations
    and to ``ManagedBuffer.preferred_location`` / ``accessed_by``.

    ``Host`` is a singleton class, mirroring :class:`~cuda.core.Device`:
    constructor calls with the same arguments return the same instance,
    so ``Host() is Host()`` and ``Host(numa_id=1) is Host(numa_id=1)``.
    ``Host.numa_current()`` returns its own singleton, distinct from
    ``Host()`` because it represents a thread-relative location rather
    than a fixed one.
    """

    __slots__ = ("__weakref__", "_is_numa_current", "_numa_id")

    # Singleton cache keyed by (numa_id, is_numa_current).
    _instances: ClassVar[dict[tuple[int | None, bool], Host]] = {}
    _instances_lock: ClassVar[threading.Lock] = threading.Lock()

    def __new__(cls, numa_id: int | None = None) -> Host:
        if numa_id is not None and (isinstance(numa_id, bool) or not isinstance(numa_id, int) or numa_id < 0):
            raise ValueError(f"numa_id must be a non-negative int, got {numa_id!r}")
        return cls._get_or_create(numa_id, is_numa_current=False)

    @classmethod
    def _get_or_create(cls, numa_id: int | None, is_numa_current: bool) -> Host:
        key = (numa_id, is_numa_current)
        cache = cls._instances
        inst = cache.get(key)
        if inst is not None:
            return inst
        with cls._instances_lock:
            inst = cache.get(key)
            if inst is None:
                inst = object.__new__(cls)
                object.__setattr__(inst, "_numa_id", numa_id)
                object.__setattr__(inst, "_is_numa_current", is_numa_current)
                cache[key] = inst
            return inst

    @property
    def numa_id(self) -> int | None:
        return self._numa_id

    @property
    def is_numa_current(self) -> bool:
        return self._is_numa_current

    @classmethod
    def numa_current(cls) -> Host:
        """Construct a ``Host`` referring to the calling thread's NUMA node."""
        return cls._get_or_create(None, is_numa_current=True)

    def __setattr__(self, name: str, value) -> None:
        raise AttributeError(f"{type(self).__name__} is immutable; cannot set {name!r}")

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Host):
            return NotImplemented
        return self is other

    def __hash__(self) -> int:
        return hash((Host, self._numa_id, self._is_numa_current))

    def __reduce__(self):
        if self._is_numa_current:
            return (_reconstruct_numa_current, ())
        return (Host, (self._numa_id,))

    def __repr__(self) -> str:
        if self.is_numa_current:
            return "Host.numa_current()"
        if self.numa_id is None:
            return "Host()"
        return f"Host(numa_id={self.numa_id})"


def _reconstruct_numa_current() -> Host:
    return Host.numa_current()
```
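A short sketch of the singleton, validation, and pickling behavior this class guarantees, derived from the code above (the import path is an assumption; the diff does not show where this module lives):

```python
import pickle

# Hypothetical import path; not shown in the rendered diff.
from cuda.core.utils import Host

# Equal constructor arguments return the same cached instance.
assert Host() is Host()
assert Host(numa_id=1) is Host(numa_id=1)
assert Host(numa_id=1) is not Host(numa_id=2)

# numa_current() is its own singleton, distinct from the generic Host().
cur = Host.numa_current()
assert cur is Host.numa_current() and cur is not Host()

# __new__ rejects bool explicitly, since bool subclasses int.
try:
    Host(numa_id=True)
except ValueError:
    pass

# __reduce__ preserves the singleton property across pickling: fixed forms
# rebuild via Host(numa_id), numa_current() via _reconstruct_numa_current.
assert pickle.loads(pickle.dumps(Host(numa_id=1))) is Host(numa_id=1)
assert pickle.loads(pickle.dumps(cur)) is cur

# The __setattr__ override makes instances immutable.
try:
    Host().numa_id = 3
except AttributeError:
    pass
```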
Review comment (on lines +89 to +93): Q: Shouldn't we simply always specify both numa id and whether it is current? Maybe I miss something.
(The remaining changed files in the diff did not render.)
Review comment: nit: we already have __slots__; the language ensures that setting an attribute is not possible.
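For context on this nit: __slots__ restricts which attribute names can exist, while assignment to a declared slot still succeeds unless __setattr__ intervenes. A minimal standalone sketch (plain Python, not from this PR):

```python
class Slotted:
    __slots__ = ("_numa_id",)

s = Slotted()
s._numa_id = 7    # allowed: _numa_id is a declared slot
try:
    s.other = 1   # AttributeError: __slots__ blocks undeclared names only
except AttributeError:
    print("new attribute blocked by __slots__")
```

That distinction is why the explicit __setattr__ override in Host still adds something: it freezes even the declared slots, which _get_or_create populates by bypassing the override via object.__setattr__.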