Arena-backed tensor-of-tensors: 8-byte view inner cells through einsum/contraction #548
Merged
26 commits
1c757e3 arena: add allocator + plan helper + tests (zhihao-deng)
52fdeaa arena_kernels: add ToT kernels + tests (zhihao-deng)
7e6b58a arena_einsum: regime-A (outer-Hadamard) plans + dispatch + tests (zhihao-deng)
582937a tensor: route ToT trivial ops through arena kernels + tests (zhihao-deng)
463aa6e cont_engine: thread arena plan + zero-overhead sizeof gate (zhihao-deng)
d9e6a59 einsum + tests/cases: hook regime-A arena into einsum + add hec_* cas… (zhihao-deng)
ad1a8c6 review fixes: portable sizeof gate, explicit plan-move, alignment intent (zhihao-deng)
6525c36 ArenaTensor parity: Tensor<ArenaTensor> behaves like Tensor<Tensor> (evaleev)
310a62b Add axpy_to CPO; thread it into einsum/cont_engine scale paths (evaleev)
e7222eb arena ToT: unified construction + arena-aware fill/set/init_elements (evaleev)
40a90bd arena ToT: arena-aware add/subt/scale/neg tile ops + expression tests (evaleev)
c1a2172 cont_engine: route ToT x ToT Hadamard with view inner cells via outer… (evaleev)
9cba450 tot tests: add end-to-end ToT einsum contraction harness (evaleev)
bd050b3 einsum: guard legacy ToT element-op path for view inner cells (evaleev)
42ae19a arena_kernels: add arena_inner_permute slab-rewrite kernel (evaleev)
36924b6 tensor: Tensor<ArenaTensor>::permute handles bipartite permutations (evaleev)
03f502a arena_einsum: handle permuted inner contractions via slab-level hoist (evaleev)
3a1a545 arena_einsum: hoist permuted inner-Hadamard operands to C-layout (evaleev)
9c392da tensor: arena-aware t x tot Hadamard mult with a result permutation (evaleev)
2d93fbf tensor: arena-aware tot x t mult, ArenaTensor::sum, size_of(ArenaTensor) (evaleev)
722918c arena ToT: einsum/contraction-engine support for Tensor<ArenaTensor> (evaleev)
a940d0b arena ToT: MultEngine Hadamard-outer x inner-contraction; fix Mult co… (evaleev)
86ee236 arena ToT: support einsum's replicate_array path on >1 rank (evaleev)
cc9a386 arena_einsum: make ContractionArenaPlan nameable for non-ToT operands (evaleev)
bd7b4ce tensor: permuting axpy_to initializes an empty target (evaleev)
5db3289 arena_tensor: drop redundant member scalar operator*= (evaleev)
The gather-based retile is deliberate for arena ToT (see the comment opening this branch): the generic scatter path (write_tile_block) would rebind the target's null inner cells onto the source tiles' arena slabs, leaving dangling views once the source array is destroyed. The target rank must therefore pull source tiles and deep-copy them. Fetches are de-duplicated per rank via src_tile_cache, so the cost is O(distinct source tiles) per rank, not per target tile. Batching/prefetching those remote fetches is a worthwhile follow-up optimization but is orthogonal to correctness; noting it for later.
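
To make the pull-and-deep-copy argument concrete, here is a minimal single-process sketch of the idea. It is not the code in this PR: Slab, View, SourceArray, Piece, TargetTile, SrcTileCache and gather_tile are all hypothetical stand-ins, and a "fetch" is just a counted lookup rather than a remote transfer. The only things it tries to show are that each distinct source tile is fetched once per cache, and that the target's views point into storage the target owns.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <memory>
#include <vector>

// Hypothetical stand-ins for illustration only; none of these are TiledArray types.
struct Slab { std::vector<double> data; };  // arena storage owned by one tile

struct View {  // non-owning inner cell: pointer + length into some slab
  const double* ptr = nullptr;
  std::size_t size = 0;
};

struct SourceArray {
  std::vector<std::shared_ptr<const Slab>> tiles;
  int fetches = 0;  // counts tile fetches (remote in a real distributed run)
  std::shared_ptr<const Slab> fetch(std::size_t i) {
    ++fetches;
    return tiles.at(i);
  }
};

// One inner cell of a target tile: where its elements live in the source.
struct Piece {
  std::size_t src_tile, offset, size;
};

struct TargetTile {
  Slab slab;                // the target's own storage
  std::vector<View> cells;  // views into `slab`, never into a source slab
};

// Assumed per-rank cache: each distinct source tile is fetched at most once.
using SrcTileCache = std::map<std::size_t, std::shared_ptr<const Slab>>;

void gather_tile(SourceArray& src, SrcTileCache& cache,
                 const std::vector<Piece>& pieces, TargetTile& out) {
  std::size_t total = 0;
  for (const Piece& p : pieces) total += p.size;
  out.slab.data.resize(total);  // size once so the views bound below stay valid
  out.cells.clear();

  std::size_t cursor = 0;
  for (const Piece& p : pieces) {
    auto it = cache.find(p.src_tile);
    if (it == cache.end())
      it = cache.emplace(p.src_tile, src.fetch(p.src_tile)).first;
    const std::vector<double>& from = it->second->data;
    // Deep copy into the target's slab, then bind the view to the copy.
    std::copy_n(from.begin() + static_cast<std::ptrdiff_t>(p.offset), p.size,
                out.slab.data.begin() + static_cast<std::ptrdiff_t>(cursor));
    out.cells.push_back(View{out.slab.data.data() + cursor, p.size});
    cursor += p.size;
  }
}
```

In this sketch, gathering two target tiles that together reference two source tiles performs two fetches in total, and the target views remain readable after the cache and the source are cleared. A scatter-style variant that stored View{from.data() + p.offset, p.size} instead would be left pointing at freed source storage, which is the dangling-view hazard described above.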