Minor speedup for model lowering: Skip redundant run_decompositions when no ops match decomp table (#18496) #18496
Open
apullin wants to merge 2 commits into pytorch:main from
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18496
Note: Links to docs will display an error until the docs builds have been completed.
apullin pushed a commit to apullin/executorch that referenced this pull request on Mar 25, 2026.
apullin pushed a commit to apullin/executorch that referenced this pull request on Mar 30, 2026.
apullin pushed a commit to apullin/executorch that referenced this pull request on Mar 30, 2026.
…ions (pytorch#18497)

Summary:
Pull Request resolved: pytorch#18497

Adds infrastructure for skipping and fast-copying unchanged nodes during
ExportPass execution, then annotates ~60 ARM backend passes to use it.

## Changes

### 1. should_run() hook on ExportPass / ArmPass
Subclasses that declare a `targeted_ops` class attribute (a set of op
overloads) can be skipped entirely when the graph contains none of their
target ops. ArmPass provides a default implementation via inheritance.

### 2. Fast-copy for cold nodes
When a pass declares `targeted_ops`, nodes whose ops are NOT in the set
are copied into the new graph via `graph.node_copy()` instead of full
FakeTensor dispatch. Per-node cost drops from ~0.4 ms to ~0.02 ms (~20x).
Includes a safety guard: nodes without `val` metadata (e.g. nodes
inserted by `call()` overrides before `super().call()`) fall back to
full dispatch instead of propagating None.

### 3. FakeTensor cache extension
Context manager `_extend_faketensor_cache_builtins()` temporarily
extends the FakeTensor dispatch cache to cover ExecuTorch op namespaces
(quantized_decomposed, tosa, dim_order_ops, cortex_m). Avoids redundant
re-dispatches for non-builtin ops across 50+ passes.

### 4. __init_subclass__ auto-discovery on ArmPass
Subclasses with existing `_TARGET_OPS`, `_supported_ops`, or
`_EDGE_OPS`/`_ATEN_OPS` attributes get `targeted_ops` populated
automatically at class definition time, so no manual annotation is
needed.

### 5. targeted_ops annotations on ~60 ARM passes
Each annotation is a one-liner declaring the ops the pass checks in
`call_operator()`. Combined with should_run() and fast-copy, this
achieves the measured speedup below.

## Benchmark
Model: small CNN feature extractor (~50K params, 9 conv layers with
LayerNorm, targeting Ethos-U55 via the ARM/TOSA lowering pipeline).
Graph: ~1200 nodes, 146 ExportPass invocations.

lower() before: 186 s
lower() after: 100 s
Passes skipped: 53 of 146
Delta: -86 s (-46 %)

A second measurement on the SleepNet featurizer (U55 lowering):
lower(): 185 s -> 96 s = -89 s (-48 %)

Differential Revision: D97528110
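The should_run()/targeted_ops mechanism described above can be sketched as follows. This is an illustrative, simplified stand-in, not the real executorch `ExportPass`/`ArmPass` classes, and the op names and pass names are hypothetical:

```python
# Illustrative sketch of the should_run()/targeted_ops idea. The real
# ExportPass operates on torch.fx graphs; here graph_ops is just the set
# of op names appearing as call_function targets in the graph.
class ExportPass:
    targeted_ops = None  # subclasses declare the set of ops they rewrite

    def should_run(self, graph_ops):
        """Skip the pass entirely when the graph contains none of its
        target ops."""
        if self.targeted_ops is None:
            return True  # no declaration: conservatively always run
        return bool(self.targeted_ops & graph_ops)


class FuseLayerNormPass(ExportPass):
    targeted_ops = {"aten.layer_norm"}


class DecomposeGeluPass(ExportPass):
    targeted_ops = {"aten.gelu"}


def run_pipeline(passes, graph_ops):
    """Partition passes into those that would run and those skipped."""
    ran, skipped = [], []
    for p in passes:
        (ran if p.should_run(graph_ops) else skipped).append(type(p).__name__)
    return ran, skipped
```

A graph containing only conv and layer-norm ops would then skip the GELU pass while still running the layer-norm pass, which is the source of the "Passes skipped: 53 of 146" figure above.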
Summary:
Adds an early-exit check to _gen_edge_manager_for_partitioners: before
calling program.run_decompositions(table), scan the graph for ops that
appear in the decomposition table. If none are found, skip the call
entirely.
Each run_decompositions call performs a full re-export of the program
via make_fx(), re-tracing every node through FakeTensor dispatch.
On the EDGE_DO_NOT_DECOMP path this function is called up to 3 times;
the early-exit eliminates at least one redundant call where the previous
pass already decomposed all matching ops.
The check recursively walks control flow submodules (cond/map/scan) to
avoid incorrectly skipping when decomposable ops are nested.
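The early-exit check above can be sketched as follows. `Node` here is a minimal stand-in for a `torch.fx` graph node and the function names are illustrative, not the actual implementation inside `_gen_edge_manager_for_partitioners`:

```python
from dataclasses import dataclass, field

# Minimal stand-in for a torch.fx graph node -- illustrative only; the
# real check walks an ExportedProgram's fx graph and the graphs of its
# control-flow submodules (cond/map/scan).
@dataclass
class Node:
    op: str                 # "call_function", "placeholder", ...
    target: object = None   # op overload for call_function nodes
    submodules: list = field(default_factory=list)  # nested cond/map/scan bodies


def has_decomposable_ops(nodes, decomp_table):
    """Return True if any call_function node's target appears in the
    decomposition table, recursing into control-flow submodules so
    nested decomposable ops are not missed."""
    for node in nodes:
        if node.op == "call_function" and node.target in decomp_table:
            return True
        for sub in node.submodules:
            if has_decomposable_ops(sub, decomp_table):
                return True
    return False


def maybe_run_decompositions(nodes, decomp_table, run_decompositions):
    # Early exit: skip the expensive make_fx() re-export when no op in
    # the graph has an entry in the decomposition table.
    if not has_decomposable_ops(nodes, decomp_table):
        return None
    return run_decompositions(decomp_table)
```

Because the scan is a plain graph walk with no FakeTensor dispatch, it is cheap relative to the re-export it avoids; the recursion into submodules is what keeps the skip safe for graphs with nested control flow.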
## Benchmark
Model: small CNN feature extractor (~50K params, 9 conv layers with
LayerNorm, targeting Ethos-U55 via the ARM/TOSA lowering pipeline).
Graph: ~1200 nodes.
lower() before: 82 s
lower() after: 71 s
Delta: -11 s (-13 %)
Differential Revision: D96489903