
Sync Python-side quantized_softmax schema with C++ kernel (add mask_type and pos args) (#18495)

Merged
meta-codesync[bot] merged 1 commit into pytorch:main from mvartani-meta:export-D98145095
Mar 27, 2026
Conversation

mvartani-meta (Contributor) commented Mar 25, 2026

Summary:

D88997196 added mask_type (int) and pos (Tensor) parameters to the C++ cadence::quantized_softmax kernels and custom_ops.yaml, but missed updating the Python-side op registrations, reference implementations, quantizer fusion pass, and tests. This caused an argument count mismatch at runtime (Expected 11 args received 9) when running quantized softmax on the Xtensa ISS.

This diff completes the schema sync by updating:

  • ops_registrations.py — Updated all 4 lib.define() schemas and both register_fake meta functions to include int mask_type, Tensor pos after dim.
  • ref_implementations.py — Added mask_type and pos params to quantized_softmax_per_tensor_common, quantized_softmax_per_tensor, and quantized_softmax. Added assert mask_type == 0 guard consistent with existing assert mask is None.
  • quantizer/fusion_pass.py — Updated get_args_and_kwargs_softmax to emit mask_type=0 (no masking) and a dummy pos tensor (full([1], 0, dtype=int64)), matching the default behavior for standard softmax quantization.
  • tests/test_ref_implementations.py — Updated test_quantized_softmax_per_tensor and test_quantized_softmax call sites with the new args.
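The argument-count mismatch can be illustrated with a before/after schema string and a trivial counter. This is a hypothetical sketch: the argument names below are illustrative placeholders, not copied from the real custom_ops.yaml or lib.define() schemas.

```python
# Hypothetical sketch of the schema sync. Argument names are illustrative;
# only the shape of the change (two new args inserted after dim) follows the diff.
OLD_SCHEMA = (
    "quantized_softmax(Tensor input, Tensor? mask, int dim, "
    "Tensor in_scale, Tensor in_zero_point, "
    "Tensor out_scale, Tensor out_zero_point, "
    "float scale, int zero_point) -> Tensor"
)
NEW_SCHEMA = (
    "quantized_softmax(Tensor input, Tensor? mask, int dim, "
    "int mask_type, Tensor pos, "  # the two new args, after dim
    "Tensor in_scale, Tensor in_zero_point, "
    "Tensor out_scale, Tensor out_zero_point, "
    "float scale, int zero_point) -> Tensor"
)

def arg_count(schema: str) -> int:
    """Count top-level arguments in a flat schema string (no nested commas)."""
    args = schema[schema.index("(") + 1 : schema.rindex(") ->")]
    return len(args.split(","))

print(arg_count(OLD_SCHEMA), arg_count(NEW_SCHEMA))  # 9 11
```

With the Python side still registering the 9-argument form while the C++ kernel expected 11, the runtime check fails exactly as reported ("Expected 11 args received 9").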

Reviewed By: hsharma35

Differential Revision: D98145095
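The guard added to the reference implementations, and the defaults the fusion pass now emits, can be sketched in plain Python. Everything here is a hedged stand-in: the float softmax replaces the real quantized math, and the function name and parameters mirror but are not the actual ref_implementations.py code.

```python
import math

def softmax(xs):
    """Plain float softmax, standing in for the quantized reference math."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def quantized_softmax_ref(values, mask=None, dim=-1, mask_type=0, pos=None):
    """Sketch of the widened signature: mask_type and pos are accepted but,
    mirroring the diff, only the unmasked path (mask_type == 0) is supported."""
    assert mask is None       # pre-existing guard
    assert mask_type == 0     # new guard, consistent with the one above
    return softmax(values)

# The fusion pass passes mask_type=0 (no masking) and a dummy one-element pos,
# analogous to full([1], 0, dtype=int64) in the real pass.
out = quantized_softmax_ref([1.0, 2.0, 3.0], mask=None, mask_type=0, pos=[0])
```

The design point is that the new arguments are plumbed through for schema compatibility only; any nonzero mask_type trips the assertion rather than silently producing unmasked results.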

pytorch-bot bot commented Mar 25, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18495

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 2 Unrelated Failures

As of commit 4127576 with merge base 59838fc:

NEW FAILURES - The following jobs have failed:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-cla bot added the CLA Signed label (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) Mar 25, 2026
meta-codesync bot (Contributor) commented Mar 25, 2026

@mvartani-meta has exported this pull request. If you are a Meta employee, you can view the originating Diff in D98145095.

github-actions bot commented

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

meta-codesync bot changed the title from "Sync Python-side quantized_softmax schema with C++ kernel (add mask_type and pos args)" to "Sync Python-side quantized_softmax schema with C++ kernel (add mask_type and pos args) (#18495)" Mar 25, 2026
mvartani-meta added a commit to mvartani-meta/executorch that referenced this pull request Mar 25, 2026 ("Sync Python-side quantized_softmax schema with C++ kernel (add mask_type and pos args)" (pytorch#18495); same summary as above)

mvartani-meta added a commit to mvartani-meta/executorch that referenced this pull request Mar 25, 2026 (same title and summary)

mvartani-meta force-pushed the export-D98145095 branch 2 times, most recently from 4509b49 to 3e3a1ed on March 26, 2026 14:33

mvartani-meta added a commit to mvartani-meta/executorch that referenced this pull request Mar 26, 2026 (same title and summary)

mvartani-meta added a commit to mvartani-meta/executorch that referenced this pull request Mar 26, 2026 (same title and summary)

mvartani-meta added a commit to mvartani-meta/executorch that referenced this pull request Mar 26, 2026 (same title and summary)
@meta-codesync meta-codesync bot merged commit d31d4be into pytorch:main Mar 27, 2026
158 of 163 checks passed

Labels

CLA Signed (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed), fb-exported, meta-exported

2 participants