Add Vision Transformer sample with attention visualization by lyonsno · Pull Request #569 · webgpu/webgpu-samples

lyonsno · 2026-06-23T23:53:40Z

Runs DeiT-Tiny (5.7M params) inference entirely in WebGPU compute shaders to classify images, and visualizes attention maps as interactive heatmap overlays showing which image patches the model focuses on.
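For reviewers who want a picture of the overlay step, here is a minimal sketch of how one row of attention weights could be turned into a per-patch heatmap. It is not the code in this PR. The function name `attentionRowToHeatmap`, the assumption that the first token is a class token followed by patches in row-major order, and the min-max normalization are all illustrative choices.

```typescript
// Hypothetical helper (not from the PR): convert one attention row into a
// normalized per-patch heatmap in [0, 1].
// Assumes `numPrefixTokens` non-patch tokens (e.g. a class token) come first,
// followed by gridSize * gridSize patch tokens in row-major order.
function attentionRowToHeatmap(
  attnRow: Float32Array,
  gridSize: number,
  numPrefixTokens: number = 1
): Float32Array {
  const numPatches = gridSize * gridSize;
  if (attnRow.length !== numPrefixTokens + numPatches) {
    throw new Error(
      `expected ${numPrefixTokens + numPatches} weights, got ${attnRow.length}`
    );
  }

  // Find the range of the patch weights only; prefix tokens are skipped.
  let min = Infinity;
  let max = -Infinity;
  for (let i = 0; i < numPatches; i++) {
    const w = attnRow[numPrefixTokens + i];
    if (w < min) min = w;
    if (w > max) max = w;
  }

  // Min-max normalize so the strongest patch maps to 1 and the weakest to 0.
  const range = max - min;
  const heatmap = new Float32Array(numPatches);
  for (let i = 0; i < numPatches; i++) {
    const w = attnRow[numPrefixTokens + i];
    heatmap[i] = range > 0 ? (w - min) / range : 0;
  }
  return heatmap;
}
```

The returned array can be read as a `gridSize` x `gridSize` image and upscaled over the input. Min-max scaling keeps the overlay readable even when raw attention weights are small, at the cost of hiding absolute magnitudes.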

The sample is organized around the transformer compute stages: patch embedding, layer normalization, multi-head attention (Q/K/V projections, scaled dot-product scores, softmax, weighted sum), MLP with GELU, and residual connections. Each compute shader is self-contained with at most 7 bindings per bind group.
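As a reference for the attention stage listed above, the following CPU sketch computes one head as softmax(QK^T / sqrt(d)) V. It is an illustrative TypeScript version, not the WGSL in this PR. The function names, the row-major layout, and the choice to return the weights alongside the output are assumptions. A reference like this could be used to spot-check shader output on small inputs.

```typescript
// Numerically stable softmax over one row, in place (hypothetical helper).
function softmaxInPlace(row: Float32Array): void {
  let max = -Infinity;
  for (let i = 0; i < row.length; i++) {
    if (row[i] > max) max = row[i];
  }
  let sum = 0;
  for (let i = 0; i < row.length; i++) {
    row[i] = Math.exp(row[i] - max); // subtract max to avoid overflow
    sum += row[i];
  }
  for (let i = 0; i < row.length; i++) {
    row[i] /= sum;
  }
}

// CPU reference for a single attention head (hypothetical, not the PR's code).
// q, k, v are row-major [numTokens x headDim] matrices.
function attentionHead(
  q: Float32Array,
  k: Float32Array,
  v: Float32Array,
  numTokens: number,
  headDim: number
): { weights: Float32Array; output: Float32Array } {
  const scale = 1 / Math.sqrt(headDim);
  const weights = new Float32Array(numTokens * numTokens);
  const output = new Float32Array(numTokens * headDim);

  for (let i = 0; i < numTokens; i++) {
    // Scaled dot-product scores of query i against every key.
    const row = weights.subarray(i * numTokens, (i + 1) * numTokens);
    for (let j = 0; j < numTokens; j++) {
      let dot = 0;
      for (let d = 0; d < headDim; d++) {
        dot += q[i * headDim + d] * k[j * headDim + d];
      }
      row[j] = dot * scale;
    }
    softmaxInPlace(row);

    // Weighted sum of value rows.
    for (let j = 0; j < numTokens; j++) {
      for (let d = 0; d < headDim; d++) {
        output[i * headDim + d] += row[j] * v[j * headDim + d];
      }
    }
  }
  return { weights, output };
}
```

Returning `weights` as well as `output` mirrors what an attention visualization needs: the softmax rows are the values a heatmap would display.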

Model weights are int8 quantized (5.8MB committed binary). The quantized weights are dequantized to fp32 during loading. An offline Python converter (tools/convert_deit_weights.py) generates the weight file from the HuggingFace model; it is not needed to run the sample. If maintainers prefer the weights hosted externally instead of committed, I can move them.
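For context on the load-time dequantization, here is a minimal sketch assuming symmetric per-tensor quantization with a single fp32 scale per tensor (w = q * scale). The PR description does not state the actual scheme or file layout, so `dequantizeInt8` and its `scale` parameter are hypothetical.

```typescript
// Hypothetical load-time dequantization (the PR's actual scheme may differ).
// Assumes symmetric per-tensor quantization: weight = int8Value * scale.
function dequantizeInt8(quantized: Int8Array, scale: number): Float32Array {
  const out = new Float32Array(quantized.length);
  for (let i = 0; i < quantized.length; i++) {
    out[i] = quantized[i] * scale;
  }
  return out;
}
```

The size arithmetic is consistent with the figures above: 5.7M parameters at one byte each is about 5.7MB, while the same weights stored as fp32 would be roughly four times that, about 22.8MB.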

Third-party attribution is in sample/visionTransformer/THIRD_PARTY_NOTICES.md (model: Meta DeiT Apache-2.0, images: Unsplash).

Related to #350. This demonstrates transformer building blocks in WebGPU compute but does not specifically exercise DP4A, shader-f16, or subgroups; those would make good follow-up primitive-focused samples.

I'm happy to make any changes needed. Please let me know if the scope or asset size is a concern.

Runs DeiT-Tiny (5.7M params) inference entirely in WebGPU compute shaders to classify images, and visualizes attention maps as interactive heatmap overlays showing which image patches the model focuses on.

The sample is organized around the transformer compute stages: patch embedding, layer normalization, multi-head attention, MLP with GELU, and residual connections. Each compute shader is self-contained with at most 7 bindings per bind group.

Model weights are int8 quantized (5.8MB). Third-party attribution is in sample/visionTransformer/THIRD_PARTY_NOTICES.md.

Related to webgpu#350.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

lyonsno force-pushed the vit-attention-visualization branch from 234350e to e1b2fd3 on June 24, 2026 00:41

lyonsno wants to merge 1 commit into webgpu:main from lyonsno:vit-attention-visualization
