
Commit 1cb05a4

docs(ecosystem): ztensor v0.15.0 — SliceElements and streaming GEMM
1 parent 3592553 commit 1cb05a4

1 file changed

Lines changed: 20 additions & 1 deletion


content/docs/ecosystem/ztensor.md

@@ -6,7 +6,7 @@ bookToc: true

# ztensor

GPU-accelerated tensor, compute engine, and computation graph library for Go. Current version: **v0.15.0**.

```bash
go get github.com/zerfoo/ztensor
```
@@ -130,6 +130,25 @@ The `graph` package provides a computation graph compiler with operator fusion p
| `internal/gpuapi/` | GPU Runtime Abstraction Layer (CUDA/ROCm/OpenCL) |
| `internal/codegen/` | Megakernel code generator |

## What's New in v0.15.0

### MmapStorage.SliceElements

`MmapStorage.SliceElements` provides zero-copy slicing of mmap'd tensor elements. It returns a view into the memory-mapped region without copying data, which makes extracting expert weights in mixture-of-experts (MoE) models efficient:

```go
// Extract expert weights directly from the mmap'd file; no copy is allocated.
expertWeights, err := mmapStorage.SliceElements(expertOffset, expertSize)
```

This replaces the previous pattern of copying expert weights into a new tensor before each forward pass.

### Streaming GEMM for mmap'd Tensors

`internal/xblas` now includes a streaming GEMM path for mmap'd weight tensors. Instead of paging in the entire weight matrix before computing, the kernel tiles over the mmap'd region in cache-sized chunks, keeping the working set proportional to the active tile rather than the full matrix.

This enables over-RAM CPU inference: a model whose weights exceed physical RAM can run without a GPU, with the OS paging tensor data in from NVMe on demand. Combined with `MmapStorage.SliceElements`, this lets a 229B-parameter MoE model run on a 128 GB machine with no configuration flags.
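
To make "weights exceed physical RAM" concrete, the raw weight footprint is parameters × bits per weight / 8. The sketch below evaluates that for the 229B-parameter / 128 GB figures in the text; the bit widths are hypothetical, since the model's weight format is not given here, and the estimate ignores quantization scales, metadata, KV cache, and activations. `weightBytes` and `exceedsRAM` are illustrative helpers.

```go
package main

import "fmt"

// weightBytes estimates raw weight storage: params × bitsPerWeight / 8.
// It ignores quantization scales, metadata, and unquantized layers.
func weightBytes(params, bitsPerWeight float64) float64 {
	return params * bitsPerWeight / 8
}

// exceedsRAM reports whether the raw weights alone are larger than ramBytes.
func exceedsRAM(params, bitsPerWeight, ramBytes float64) bool {
	return weightBytes(params, bitsPerWeight) > ramBytes
}

func main() {
	const params = 229e9 // 229B parameters, from the text
	const ram = 128e9    // 128 GB machine, from the text
	for _, bits := range []float64{4, 8, 16} { // hypothetical weight formats
		gb := weightBytes(params, bits) / 1e9
		fmt.Printf("%2.0f-bit: %.1f GB, exceeds RAM: %v\n", bits, gb, exceedsRAM(params, bits, ram))
	}
}
```

At 8 bits per weight the weights alone are about 229 GB, well past 128 GB, so most of the model stays on NVMe and is paged in on demand. At 4 bits the raw weights (about 114.5 GB) would nominally fit, but leave little headroom for the OS, KV cache, and activations.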

## Dependencies

ztensor depends on [float16]({{< relref "numeric-types" >}}) and [float8]({{< relref "numeric-types" >}}) for half-precision and FP8 arithmetic.
