[FEAT] Module::LoadFromBytes: dispatching entry point for in-memory module loaders #591
lucifer1004 wants to merge 2 commits into
Code Review
This pull request introduces the ability to load TVM modules from in-memory bytes by adding LoadFromBytes to the C++ API and load_from_bytes to the Rust bindings. It also includes hygiene improvements to Rust macros, such as using $crate for internal references. Feedback suggests replacing the #[unsafe(no_mangle)] attribute with the standard #[no_mangle] to maintain compatibility with Rust versions older than 1.82.0.
// than a bare `tvm_ffi_sys::…`) lets downstream crates use the
// macro without having to add `tvm-ffi-sys` to their own
// `[dependencies]`.
#[unsafe(no_mangle)]
The `#[unsafe(no_mangle)]` attribute syntax was stabilized in Rust 1.82.0. Using it will cause compilation errors on older versions of the Rust compiler (e.g., 1.80.0 or 1.81.0). Unless the project has explicitly bumped its Minimum Supported Rust Version (MSRV) to 1.82.0, it is recommended to use the plain `#[no_mangle]` attribute, which is backward-compatible and still accepted in the 2021 and earlier editions (the 2024 edition requires the `unsafe(...)` form).
Suggested change:
- #[unsafe(no_mangle)]
+ #[no_mangle]
Three small bugs in the Rust ergonomics that prevented the macros from
being usable from downstream cdylibs:

1. `ensure!` expanded to `crate::bail!`, which resolves to the *caller*
   crate at expansion site rather than `tvm_ffi`. Switched to
   `$crate::bail!` so the path resolves correctly in any crate.
2. `tvm_ffi_dll_export_typed_func!` referenced `tvm_ffi_sys::TVMFFIAny`
   without a `$crate::` prefix, forcing every downstream crate to add
   `tvm-ffi-sys` to its own `[dependencies]`. Switched to
   `$crate::tvm_ffi_sys::TVMFFIAny`; downstream now only needs `tvm-ffi`.
3. The generated `pub unsafe extern "C" fn __tvm_ffi_<name>` had no
   `#[no_mangle]`, so the linker stripped the symbol from cdylibs and
   `Module::GetFunction` could not find it. Added `#[unsafe(no_mangle)]`
   (supported in 2021 + 2024 editions on rustc >= 1.82).

Verified by building a downstream cdylib that only depends on `tvm-ffi`
(no `tvm-ffi-sys` direct dep), loading it via `tvm_ffi.load_module(...)`,
and calling exported scalar + Tensor functions from Python.
64bc197 to 4995972
4995972 to 70ef19b
…odule loaders
The internal helper `LoadModuleFromBytes(kind, bytes)`
(`src/ffi/extra/library_module.cc`) has been around for a while as a
C++ free function used during binary deserialization. It was not
exposed as a public Module API, so callers who already hold module
payload in memory (e.g. a PTX or CUBIN blob fetched from a registry)
had to materialize it to disk first and go through
`ModuleLoadFromFile`. This commit promotes the helper to a public
`Module::LoadFromBytes(kind, bytes)` and registers it as the
`ffi.ModuleLoadFromBytes` global so Python and Rust bindings can call
it without re-implementing kind → loader dispatch.
## API contract
`Module::LoadFromBytes(kind, bytes)` dispatches to the registered
global `ffi.Module.load_from_bytes.<kind>` (signature
`(Bytes) -> Module`). If no loader for the given kind is registered,
`RuntimeError` is raised naming the missing key — so the user knows
exactly what to register.
This is the *dispatching entry point*, not a specific format loader.
Loaders are registered by consumers, mirroring how loaders for module
formats already work today (the cubin_launcher example header-only
library is the canonical CUDA loader template). This split keeps
`libtvm_ffi.so` independent of libcuda / ROCm / etc.: a CPU-only
build of tvm-ffi has the API but no built-in CUDA loader, whereas a
consumer-side `.so` (built against `cubin_launcher.h`) can register
`ffi.Module.load_from_bytes.cubin` for everyone in the same process.
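
The contract above can be modeled in a few lines of plain Python. This is only a sketch under stated assumptions: `_GLOBAL_FUNCS` and the local `register_global_func` / `load_module_from_bytes` below are hypothetical stand-ins for the real global function table and bindings, not the tvm-ffi implementation, and the wording of the error message is invented for illustration. It shows the two properties the contract promises: dispatch by `ffi.Module.load_from_bytes.<kind>`, and a `RuntimeError` that names the missing key.

```python
from typing import Callable, Dict

# Hypothetical in-process stand-in for the global function table.
_GLOBAL_FUNCS: Dict[str, Callable[[bytes], object]] = {}

LOADER_PREFIX = "ffi.Module.load_from_bytes."


def register_global_func(name: str):
    """Decorator that stores a function under `name` (stand-in registry)."""
    def decorator(func):
        _GLOBAL_FUNCS[name] = func
        return func
    return decorator


def load_module_from_bytes(kind: str, payload: bytes) -> object:
    """Dispatch to the loader registered for `kind`, or fail naming the key."""
    key = LOADER_PREFIX + kind
    loader = _GLOBAL_FUNCS.get(key)
    if loader is None:
        # Assumed message format; the contract only says the key is named.
        raise RuntimeError(f"no loader registered for {key!r}")
    # Loader exceptions are not caught here, so they reach the caller.
    return loader(payload)


# A consumer registers a loader; the dispatcher itself ships none.
@register_global_func(LOADER_PREFIX + "echo")
def _echo_loader(payload: bytes) -> object:
    # A real loader would parse `payload`; this one just wraps it.
    return {"kind": "echo", "payload": payload}
```

Because the dispatcher only knows the key prefix, a CPU-only build of this model carries no format-specific code; a `cubin` loader would appear only once some consumer registers it.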
## Changes
* `include/tvm/ffi/extra/module.h`: declares
`Module::LoadFromBytes(const String& kind, const Bytes& bytes)` with
a doc note pointing at `cubin_launcher` as the canonical loader
template.
* `src/ffi/extra/library_module.cc`: defines it as a thin wrapper
around the existing `LoadModuleFromBytes`.
* `src/ffi/extra/module.cc`: registers `ffi.ModuleLoadFromBytes` in
the static-init block alongside `ffi.ModuleLoadFromFile`.
* `rust/tvm-ffi/src/extra/module.rs`: adds
`tvm_ffi::Module::load_from_bytes(kind, bytes)`.
* `python/tvm_ffi/module.py`: adds `load_module_from_bytes(kind,
data)` mirroring the existing `load_module(path)`. Exposed from
`tvm_ffi/__init__.py`.
* `python/tvm_ffi/_ffi_api.py`: regenerated stub.
* `tests/python/test_module_load_from_bytes.py`: three end-to-end
tests covering (1) round-trip via a Python-registered loader,
(2) error path when no loader is registered, (3) loader exceptions
propagating to the caller.
## Motivation
Use case: a project that fetches CUDA PTX / CUBIN payloads from a
registry already has the bytes in memory. The current
`Module::LoadFromFile` path forces a tempfile detour. With this API:
```python
import tvm_ffi


@tvm_ffi.register_global_func("ffi.Module.load_from_bytes.echo")
def _echo_loader(payload: bytes) -> tvm_ffi.Module:
    # Real loader would parse `payload` and return a runnable module.
    ...


mod = tvm_ffi.load_module_from_bytes("echo", b"<payload>")
```
The dispatch through `ffi.Module.load_from_bytes.<kind>` is unchanged;
existing loaders register exactly as before.
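
For contrast, the tempfile detour that a path-only loader forces might look roughly like the sketch below. `load_from_file` is a hypothetical stand-in for any file-based loader (here it simply reads the file back), and `load_via_tempfile` is an illustrative helper name, not part of tvm-ffi.

```python
import os
import tempfile


def load_from_file(path: str) -> bytes:
    """Hypothetical path-only loader: stands in for a file-based module loader."""
    with open(path, "rb") as f:
        return f.read()


def load_via_tempfile(payload: bytes, suffix: str = ".bin") -> bytes:
    """Write in-memory bytes to disk just so a path-only loader can read them."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(payload)
        return load_from_file(path)
    finally:
        os.remove(path)  # the file existed only for the hand-off
```

The extra disk write, the cleanup in `finally`, and the dependence on a writable temp directory are the costs that a bytes-based entry point would avoid.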
70ef19b to 89eb576
Thanks for the contribution. This is a case where we do not want to expose the API publicly, mainly because module loading needs to be triggered together with submodules, and that is supposed to be done by the formal whole-module loader, such as the DSO loader. For loading from the same library, one mechanism we recommend is the system library approach. A global function should probably be sufficient for unit tests.

For specific modules like CUDA or PTX, we generally expose separate global FFI function constructors rather than going through the module serializer API.
Summary
Promotes the existing internal `LoadModuleFromBytes(kind, bytes)` helper
(`src/ffi/extra/library_module.cc`) to a public `Module` API, registers it as the
`ffi.ModuleLoadFromBytes` global, adds matching Python
(`tvm_ffi.load_module_from_bytes`) and Rust
(`tvm_ffi::Module::load_from_bytes`) bindings, and ships end-to-end
Python tests that exercise the dispatch contract.
API contract
`Module::LoadFromBytes(kind, bytes)` dispatches to the registered global
`ffi.Module.load_from_bytes.<kind>` (signature `(Bytes) -> Module`). If no
loader for the given kind is registered, a `RuntimeError` is raised naming
the missing key, so the user knows exactly what they need to register.

This is the dispatching entry point, not a specific format loader.
Loaders are registered by consumers. The split keeps `libtvm_ffi.so`
independent of libcuda / ROCm / etc.: a CPU-only build has the API
but no built-in CUDA loader, whereas a consumer-side `.so` (built
against the existing header-only `tvm/ffi/extra/cuda/cubin_launcher.h`)
can register `ffi.Module.load_from_bytes.cubin` for the whole process.
The `examples/cubin_launcher/dynamic_cubin/` example already
implements this pattern.
Changes
* `include/tvm/ffi/extra/module.h`: declares
`Module::LoadFromBytes(const String& kind, const Bytes& bytes)` with
a doc note pointing at `cubin_launcher` as the canonical loader
template.
* `src/ffi/extra/library_module.cc`: defines it as a thin wrapper
around the existing `LoadModuleFromBytes`.
* `src/ffi/extra/module.cc`: registers `ffi.ModuleLoadFromBytes` in
the static-init block alongside `ffi.ModuleLoadFromFile`.
* `rust/tvm-ffi/src/extra/module.rs`: adds
`tvm_ffi::Module::load_from_bytes(kind, bytes)`.
* `python/tvm_ffi/module.py`: adds `load_module_from_bytes(kind,
data)` mirroring the existing `load_module(path)`. Exposed from
`tvm_ffi/__init__.py`.
* `python/tvm_ffi/_ffi_api.py`: regenerated stub.
* `tests/python/test_module_load_from_bytes.py`: three end-to-end
tests:
  * `RuntimeError` path when no loader is registered (message
  names the missing key),
Motivation
A project that fetches CUDA PTX / CUBIN payloads from a registry
already has the bytes in memory. The current
`Module::LoadFromFile` path forces a tempfile detour. With this API:
Or from Rust:
The dispatch through `ffi.Module.load_from_bytes.<kind>` is unchanged;
existing loaders register exactly as before.
Test plan
* […] (`tests/python/test_module_load_from_bytes.py`).
* […] `ffi.ModuleLoadFromFile` global continues to work.
* […] `Module::load_from_bytes` callable.
* […] revision; updated revision should match.
Stacked on #590
This branch is logically stacked on #590 (Rust macro fixes). After
#590 merges, this PR's diff collapses to just the
`Module::LoadFromBytes` commit. Reviewers who have already approved
#590 can skip the first commit here.