[ExecuTorch][WebGPU] Dynamic resize hooks for sigmoid and select_copy by JulianCloudNTH · Pull Request #20578 · pytorch/executorch

JulianCloudNTH · 2026-06-28T16:22:28Z

Stack from ghstack (oldest at bottom):

Make sigmoid and select_copy serve any live shape from one graph; fix select's last-token index under dynamic shapes.

Problem: Both ops baked their dispatch/params/output shape at build() for the max shape. select_copy was worse: a negative index (e.g. -1 for the last token) was normalized against the build-time MAX dim, so at a smaller live S it selected a stale/zero position past the live data — producing wrong (often zero) output.

Solution:

sigmoid (generic add_unary_op): a resize hook recomputes num_elements/dispatch and sets the output cur_dims (shape-preserving).
select_copy: KEEP the raw (possibly negative) index at build; a resize hook re-resolves it against the LIVE dim, recomputes the output dims (= input minus dim), rebuilds the out/in TensorMeta UBOs and the dispatch.
Both keep their uniform buffer(s) alive via own_uniform_buffer.

Implementation:

The select out/in meta is rebuilt from synthetic WebGPUTensor{dims} via fill_tensor_meta (reads only .dims).
Mirrors Vulkan per-op resize_*_node.

Constraints: Behavior-neutral on static graphs (hooks fire only when an input's live shape differs from the max). No kernel/WGSL/numerics change.

Co-authored-with: Claude Code.
@exported-using-ghexport

Differential Revision: D109906095

[ghstack-poisoned]

pytorch-bot · 2026-06-28T16:22:32Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20578

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit 46fcce5 with merge base db5f1ba ():

NEW FAILURE - The following job has failed:

pull / android / run-emulator (gh)
The process '/usr/bin/sh' failed with exit code 1

This comment was automatically generated by Dr. CI and updates every 15 minutes.

github-actions · 2026-06-28T16:23:09Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

JulianCloudNTH · 2026-06-29T20:22:13Z

@claude review and check for any areas or opportunities for modularization

claude · 2026-06-29T20:22:54Z

Claude finished @JulianCloudNTH's task in 2m 47s —— View job

Review: Dynamic resize hooks for sigmoid and select_copy

Read changed files & surrounding WebGPU runtime infrastructure
Analyze correctness of resize hooks
Identify modularization opportunities

The change is well-scoped and matches the established per-op resize-hook pattern (add, mul). The select_copy negative-index fix is correct and is a real bug fix: at build the index was normalized against the max dim, so -1 pointed past live data at smaller S; keeping raw_index and re-resolving against cur_dims(in_id)[dim] in the hook is the right fix. Both ops correctly switch to own_uniform_buffer so the hook can rewrite the UBOs.

Correctness — looks good, two minor notes

Select.cpp — guards not re-checked on resize (acceptable). The build path validates nbytes == numel*4 (fp32) and the bind-group buffer size is pinned to the max nbytes. The hook only rewrites meta/params/workgroup count, never buffer sizes — correct, since dtype and allocation are static and only live sizes shrink. Worth a one-line comment in the hook noting the bind-group sizes intentionally stay at max so it doesn't read as an omission.
UnaryOp.cpp — build vs. hook compute numel from different tensors. Build uses out_tensor.nbytes/sizeof(float); the hook uses numel_of(cur_dims(in_id)). They're guaranteed equal (the op asserts in.nbytes == out.nbytes), so this is fine, but reading in in one place and out in the other is a small inconsistency. Computing both from the input would read more uniformly.

No issues with the lambda captures (all captured values are used; handles copied before own_uniform_buffer transfers ownership).

Modularization — the more substantive feedback

This PR is behavior-correct, but it lands on top of a growing amount of near-duplicate boilerplate across ops/. Three concrete, increasingly valuable extractions:

1. A meta-UBO rewrite helper (highest value, lowest risk). The "synthesize a WebGPUTensor from dims → fill_tensor_meta → wgpuQueueWriteBuffer" sequence now appears in select (Select.cpp:194-201) and mul (mul/BinaryOp.cpp:187-197), and is the kind of thing every future shape-changing op will copy. A small helper in TensorMeta.h would absorb it:

inline void write_meta_from_dims(WebGPUGraph& g, WGPUBuffer buf,
                                 const std::vector<int64_t>& dims) {
  WebGPUTensor t; t.dims = dims;
  TensorMeta m; fill_tensor_meta(t, &m);
  wgpuQueueWriteBuffer(g.queue(), buf, 0, &m, sizeof(m));
}

Select's hook then drops to write_meta_from_dims(g, out_meta_buf, od); write_meta_from_dims(g, in_meta_buf, ind);. (A _broadcast overload covers mul.)

2. A set_workgroup_1d helper on the resize path. Every resize hook ends with the same g.dispatch_at(idx).workgroup_count_x = utils::compute_1d_workgroup_count(g.device(), numel, wg_size, name); (Select.cpp:206, UnaryOp.cpp:152, add/mul). A thin utils::set_dispatch_1d(g, dispatch_idx, numel, wg_size, name) removes the repeated dispatch_at(...).workgroup_count_x = plumbing and keeps the device/queue access in one place.

3. The 5-binding gather scaffold shared by slice and select. Select.cpp and Slice.cpp are ~110 lines of identical bind-group-layout / pipeline-layout / pipeline / bind-group setup for the canonical [in, out(rw), out_meta, in_meta, params] kernel — only the *Params struct and WGSL differ. This is the biggest duplication and the one most likely to drift. Consider a builder that returns the created objects:

struct GatherPipeline { WGPUComputePipeline pipeline; WGPUBindGroup bind_group;
                        WGPUBuffer out_meta_buf, in_meta_buf, params_buf; };
GatherPipeline build_gather_op(graph, in_tensor, out_tensor, wgsl,
                               wg_size, const void* params, size_t params_size);

The get_or_create_bgl cache already exists in WebGPUGraph, so the layout could also be keyed/shared rather than rebuilt per op. This one is a larger refactor and out of scope for this PR — but it's the place where per-op boilerplate is accumulating fastest, and select/slice are the natural first consolidation. Start the refactor →

Given CLAUDE.md's "no trivial single-use helpers" rule, #1 and #2 clear that bar because they're now used by 2+ ops and will be reused by every shape-changing op; #3 is a real refactor worth a follow-up PR rather than expanding this one.

Recommendation

Ship the fix as-is (correct and minimal). Suggest doing #1 either here or as the immediate next stacked diff while the pattern is fresh, and tracking #3 as a follow-up.
· branch gh/JulianCloudNTH/70/head

[ghstack-poisoned]

Update

3d8d933

[ghstack-poisoned]

JulianCloudNTH temporarily deployed to cadence June 28, 2026 16:22 — with GitHub Actions Inactive

meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jun 28, 2026

Update

1e00c72

[ghstack-poisoned]

JulianCloudNTH temporarily deployed to cadence June 29, 2026 22:10 — with GitHub Actions Inactive

meta-codesync Bot added the meta-exported label Jun 29, 2026

Update

e2a1c90

[ghstack-poisoned]

JulianCloudNTH temporarily deployed to cadence June 30, 2026 02:46 — with GitHub Actions Inactive

Update

46fcce5

[ghstack-poisoned]

JulianCloudNTH temporarily deployed to cadence June 30, 2026 21:10 — with GitHub Actions Inactive

JulianCloudNTH mentioned this pull request Jun 30, 2026

[ExecuTorch][WebGPU] 2D-fold mul + permute dispatch (lift 65535 1D cap) #20651

Open

JulianCloudNTH mentioned this pull request Jun 30, 2026

[ExecuTorch][WebGPU] Use requiredFeatures instance API on native + emscripten Dawn #20652

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[ExecuTorch][WebGPU] Dynamic resize hooks for sigmoid and select_copy#20578

[ExecuTorch][WebGPU] Dynamic resize hooks for sigmoid and select_copy#20578
JulianCloudNTH wants to merge 4 commits into
gh/JulianCloudNTH/70/basefrom
gh/JulianCloudNTH/70/head

JulianCloudNTH commented Jun 28, 2026 •

edited

Loading

Uh oh!

pytorch-bot Bot commented Jun 28, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 28, 2026

Uh oh!

JulianCloudNTH commented Jun 29, 2026

Uh oh!

claude Bot commented Jun 29, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

JulianCloudNTH commented Jun 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pytorch-bot Bot commented Jun 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20578

❌ 1 New Failure

Uh oh!

github-actions Bot commented Jun 28, 2026

This PR needs a release notes: label

Uh oh!

JulianCloudNTH commented Jun 29, 2026

Uh oh!

claude Bot commented Jun 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review: Dynamic resize hooks for sigmoid and select_copy

Correctness — looks good, two minor notes

Modularization — the more substantive feedback

Recommendation

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

JulianCloudNTH commented Jun 28, 2026 •

edited

Loading

pytorch-bot Bot commented Jun 28, 2026 •

edited

Loading

This PR needs a `release notes:` label

claude Bot commented Jun 29, 2026 •

edited

Loading