
Sync Python-side quantized_softmax schema with C++ kernel (add mask_type and pos args) (#18495)

Merged
meta-codesync[bot] merged 1 commit into pytorch:main from mvartani-meta:export-D98145095
Mar 27, 2026
Conversation

mvartani-meta (Contributor) commented Mar 25, 2026

Summary:

D88997196 added mask_type (int) and pos (Tensor) parameters to the C++ cadence::quantized_softmax kernels and custom_ops.yaml, but missed updating the Python-side op registrations, reference implementations, quantizer fusion pass, and tests. This caused an argument count mismatch at runtime (Expected 11 args received 9) when running quantized softmax on the Xtensa ISS.

This diff completes the schema sync by updating:

  • ops_registrations.py — Updated all 4 lib.define() schemas and both register_fake meta functions to include int mask_type, Tensor pos after dim.
  • ref_implementations.py — Added mask_type and pos params to quantized_softmax_per_tensor_common, quantized_softmax_per_tensor, and quantized_softmax. Added assert mask_type == 0 guard consistent with existing assert mask is None.
  • quantizer/fusion_pass.py — Updated get_args_and_kwargs_softmax to emit mask_type=0 (no masking) and a dummy pos tensor (full([1], 0, dtype=int64)), matching the default behavior for standard softmax quantization.
  • tests/test_ref_implementations.py — Updated test_quantized_softmax_per_tensor and test_quantized_softmax call sites with the new args.
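The argument-count mismatch can be illustrated with a before/after schema string and a trivial counter. This is a hypothetical sketch: the argument names below are illustrative placeholders, not copied from the real custom_ops.yaml or lib.define() schemas.

```python
# Hypothetical sketch of the schema sync. Argument names are illustrative;
# only the shape of the change (two new args inserted after dim) follows the diff.
OLD_SCHEMA = (
    "quantized_softmax(Tensor input, Tensor? mask, int dim, "
    "Tensor in_scale, Tensor in_zero_point, "
    "Tensor out_scale, Tensor out_zero_point, "
    "float scale, int zero_point) -> Tensor"
)
NEW_SCHEMA = (
    "quantized_softmax(Tensor input, Tensor? mask, int dim, "
    "int mask_type, Tensor pos, "  # the two new args, after dim
    "Tensor in_scale, Tensor in_zero_point, "
    "Tensor out_scale, Tensor out_zero_point, "
    "float scale, int zero_point) -> Tensor"
)

def arg_count(schema: str) -> int:
    """Count top-level arguments in a flat schema string (no nested commas)."""
    args = schema[schema.index("(") + 1 : schema.rindex(") ->")]
    return len(args.split(","))

print(arg_count(OLD_SCHEMA), arg_count(NEW_SCHEMA))  # 9 11
```

With the Python side still registering the 9-argument form while the C++ kernel expected 11, the runtime check fails exactly as reported ("Expected 11 args received 9").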

Reviewed By: hsharma35

Differential Revision: D98145095
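The guard added to the reference implementations, and the defaults the fusion pass now emits, can be sketched in plain Python. Everything here is a hedged stand-in: the float softmax replaces the real quantized math, and the function name and parameters mirror but are not the actual ref_implementations.py code.

```python
import math

def softmax(xs):
    """Plain float softmax, standing in for the quantized reference math."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def quantized_softmax_ref(values, mask=None, dim=-1, mask_type=0, pos=None):
    """Sketch of the widened signature: mask_type and pos are accepted but,
    mirroring the diff, only the unmasked path (mask_type == 0) is supported."""
    assert mask is None       # pre-existing guard
    assert mask_type == 0     # new guard, consistent with the one above
    return softmax(values)

# The fusion pass passes mask_type=0 (no masking) and a dummy one-element pos,
# analogous to full([1], 0, dtype=int64) in the real pass.
out = quantized_softmax_ref([1.0, 2.0, 3.0], mask=None, mask_type=0, pos=[0])
```

The design point is that the new arguments are plumbed through for schema compatibility only; any nonzero mask_type trips the assertion rather than silently producing unmasked results.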

pytorch-bot bot commented Mar 25, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18495

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 2 Unrelated Failures

As of commit 4127576 with merge base 59838fc:

NEW FAILURES - The following jobs have failed:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-cla bot added the CLA Signed label (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) Mar 25, 2026
meta-codesync bot (Contributor) commented Mar 25, 2026

@mvartani-meta has exported this pull request. If you are a Meta employee, you can view the originating Diff in D98145095.

github-actions bot commented

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

meta-codesync bot changed the title from "Sync Python-side quantized_softmax schema with C++ kernel (add mask_type and pos args)" to "Sync Python-side quantized_softmax schema with C++ kernel (add mask_type and pos args) (#18495)" Mar 25, 2026
mvartani-meta added a commit to mvartani-meta/executorch that referenced this pull request Mar 25, 2026 ("Sync Python-side quantized_softmax schema with C++ kernel (add mask_type and pos args)" (pytorch#18495); same summary as above)

mvartani-meta added a commit to mvartani-meta/executorch that referenced this pull request Mar 25, 2026 (same title and summary)

mvartani-meta force-pushed the export-D98145095 branch 2 times, most recently from 4509b49 to 3e3a1ed on March 26, 2026 14:33

mvartani-meta added a commit to mvartani-meta/executorch that referenced this pull request Mar 26, 2026 (same title and summary)

mvartani-meta added a commit to mvartani-meta/executorch that referenced this pull request Mar 26, 2026 (same title and summary)

mvartani-meta added a commit to mvartani-meta/executorch that referenced this pull request Mar 26, 2026 (same title and summary)
@meta-codesync meta-codesync bot merged commit d31d4be into pytorch:main Mar 27, 2026
158 of 163 checks passed

Labels

CLA Signed (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed), fb-exported, meta-exported

2 participants