Cortex-M: build for any Cortex-M variant against Corstone-300 #19520

Open
rascani wants to merge 1 commit into pytorch:main from rascani:cortex-m-non-mve-corstone

@rascani rascani commented May 12, 2026

Summary

Extend the Cortex-M test pipeline so the cortex-m<variant> target
strings registered in the AOT compile-config plumbing actually produce
runnable, ISA-faithful binaries. The binary is built end-to-end with
-mcpu=cortex-m<variant> — runner and core libraries alike — so
CMSIS-NN's compile-time __ARM_FEATURE_DSP / __ARM_FEATURE_MVE
selector exercises the matching kernel implementation. The Corstone-300
M55 simulator is an ISA superset of every earlier Cortex-M, so it
executes binaries compiled for older cores without modification — the
CI gate becomes "did the right CMSIS-NN code path execute correctly"
rather than "did per-CPU silicon behave as expected".
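
To make the compile-time selection concrete, here is a minimal sketch of how the ACLE feature macros set by `-mcpu` could steer a build toward one kernel family. The enum and both functions are hypothetical names used only for illustration; this is not CMSIS-NN's actual header logic, just the same kind of preprocessor dispatch.

```cpp
#include <cassert>

// Hypothetical classification of the kernel family a build would pick.
enum class KernelPath { Scalar, Dsp, Mve };

// The compiler predefines __ARM_FEATURE_MVE / __ARM_FEATURE_DSP according to
// -mcpu, so the choice is fixed when each translation unit is compiled,
// not when the binary runs.
constexpr KernelPath selected_kernel_path() {
#if defined(__ARM_FEATURE_MVE) && (__ARM_FEATURE_MVE > 0)
  return KernelPath::Mve;     // e.g. a cortex-m55 build
#elif defined(__ARM_FEATURE_DSP) && (__ARM_FEATURE_DSP == 1)
  return KernelPath::Dsp;     // e.g. a cortex-m4 or cortex-m7 build
#else
  return KernelPath::Scalar;  // no DSP or MVE extension available
#endif
}

// Human-readable name, handy for a log line in a test runner.
constexpr const char* kernel_path_name(KernelPath path) {
  return path == KernelPath::Mve   ? "mve"
         : path == KernelPath::Dsp ? "dsp"
                                   : "scalar";
}
```

Because the result depends only on predefined macros, every library linked into the binary has to be compiled with the same `-mcpu` for the selection to be consistent.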

The build pipeline learns the target CPU end-to-end:

  • build_executorch.sh accepts --target_cpu and passes -DTARGET_CPU
    to the toolchain CMake.
  • build_test_runner.sh derives target_cpu from --target and
    forwards it. The regex matches both bare cortex-m<X> (the canonical
    form after the Phase 1 AOT API drop of +int8) and the legacy
    cortex-m<X>+int8 shape for any callers still on it.
  • build_executor_runner.sh derives the matching target_cpu and
    supplies a dummy ETHOSU_TARGET_NPU_CONFIG=ethos-u55-128 so
    core_platform's ethosu_get_architecture() parser stays happy.
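
The target-string handling above lives in shell scripts; the sketch below restates the idea in C++ purely to make the accepted shapes concrete. `derive_target_cpu` and the regex are assumptions for illustration and are not copied from `build_test_runner.sh`.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: maps "cortex-m<X>" or legacy "cortex-m<X>+int8" to
// "cortex-m<X>". Returns an empty string for anything else.
std::string derive_target_cpu(const std::string& target) {
  // Group 1 captures the CPU name; the "+int8" suffix is optional.
  static const std::regex pattern(R"((cortex-m[0-9a-z]+)(\+int8)?)");
  std::smatch match;
  if (std::regex_match(target, match, pattern)) {
    return match[1].str();
  }
  return std::string();
}
```

Under these assumptions, `derive_target_cpu("cortex-m4+int8")` and `derive_target_cpu("cortex-m4")` both yield `cortex-m4`, which is the value that would be forwarded as `--target_cpu`.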

One transient patch is layered into the externally-fetched
ethos-u/core_platform repo via the existing patch_repo mechanism:
an #if defined(__ARM_ARCH_8M_MAIN__) || defined(__ARM_ARCH_8_1M_MAIN__)
guard around the MPU init block in corstone-300/target.cpp. Without
it, the Armv8-M-only ARM_MPU_RBAR / ARM_MPU_RLAR API breaks the
build for older cores. The FVP doesn't enforce protection regions
without an explicit setup, so simulation correctness is unaffected.
The patch is a bridge (see the TODO at corstone_utils.cmake:52)
pending upstream merge of the equivalent guard.
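
A rough sketch of the guard's shape, assuming a hypothetical `init_mpu_if_supported()` wrapper rather than the real contents of `target.cpp`. The region programming itself is elided because it needs the CMSIS device headers.

```cpp
#include <cassert>

// Hypothetical stand-in for the MPU init block. Returns true when the
// Armv8-M MPU setup was compiled in, false when it was skipped.
bool init_mpu_if_supported() {
#if defined(__ARM_ARCH_8M_MAIN__) || defined(__ARM_ARCH_8_1M_MAIN__)
  // Armv8-M Mainline and Armv8.1-M Mainline: the real code would build its
  // region table with ARM_MPU_RBAR / ARM_MPU_RLAR here (omitted in this
  // sketch; it requires the CMSIS core headers).
  return true;
#else
  // Older architectures: those macros are not available, so the whole block
  // is compiled out instead of failing the build.
  return false;
#endif
}
```

On a Cortex-M4 or Cortex-M7 build neither architecture macro is defined, so the function reduces to the `false` branch and no MPU code is emitted.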

Inside our own runner, the optional Armv8.1-M PMU intrinsics
(ARM_PMU_*) in arm_executor_runner.cpp and arm_perf_monitor.cpp
are guarded on __ARM_ARCH_8_1M_MAIN__. Earlier cores get a zero
cycle count rather than a compile error; functional correctness is
unaffected. run_fvp.sh routes all cortex-m* targets except
cortex-m85* to the Corstone-300 FVP.
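
The PMU guard follows the same pattern. Below is a minimal sketch assuming a hypothetical `read_cycle_count()` wrapper; `ARM_PMU_Get_CCNTR()` is the CMSIS-Core cycle counter accessor, and the sketch assumes the device's CMSIS core header has already been included on Armv8.1-M builds.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical wrapper around the cycle counter read.
std::uint32_t read_cycle_count() {
#if defined(__ARM_ARCH_8_1M_MAIN__)
  // Assumption: the CMSIS core header for the device is included before this
  // point, which is what provides ARM_PMU_Get_CCNTR().
  return ARM_PMU_Get_CCNTR();
#else
  // No Armv8.1-M PMU: report zero rather than failing to compile.
  return 0u;
#endif
}

// Elapsed cycles between two samples. Unsigned subtraction keeps the result
// correct across a single counter wrap.
std::uint32_t elapsed_cycles(std::uint32_t start, std::uint32_t end) {
  return end - start;
}
```

With this shape, a pre-Armv8.1-M build reports zero cycles for every measurement, so profiling output is meaningless on those targets while pass/fail results are unchanged.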

Test Plan

Locally validated end-to-end on Corstone-300 with the qadd model:

  • cortex-m55 — baseline, PASS; op_quantize_per_tensor.cpp.obj
    contains MVE intrinsics (vdup.16, vmax.s16).
  • cortex-m4 — PASS; same object has no MVE — only single-precision
    FP (vmul.f32, vcvt.s32.f32). CMSIS-NN selects the DSP path (1275 DSP
    opcodes in libcmsis-nn.a).
  • cortex-m7 — PASS; same shape as M4.

Scalar-class variants (cortex-m{0,0plus,3,23}) still need a
follow-up: an Armv6-M HardFault_Handler guard in target.cpp and a
core_software/cmsis.cmake ARMCM0plus directory-case fix. The
target_cpu plumbing here already accommodates soft-float ABI builds —
the follow-up only adds those two additional __ARM_ARCH_* guards.

Authored with Claude.

cc @digantdesai @freddan80 @per @zingo @oscarandersson8218 @mansnils @Sebastian-Larsson @robell

@pytorch-bot

pytorch-bot Bot commented May 12, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19520

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 1 New Failure, 1 Cancelled Job

As of commit 8b8ce9e with merge base 8e8e957 (image):

NEW FAILURE - The following job has failed:

CANCELLED JOB - The following job was cancelled. Please retry:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla Bot added the CLA Signed label May 12, 2026
@github-actions github-actions Bot added the ciflow/trunk and module: arm labels May 12, 2026
@github-actions

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@mansnils
Collaborator

LGTM. I like the direction this draft PR is going.

Extend the Cortex-M test pipeline so the `cortex-m<variant>` target
strings registered in the AOT compile-config plumbing actually produce
runnable, ISA-faithful binaries. The binary is built end-to-end with
`-mcpu=cortex-m<variant>` — runner and core libraries alike — so
CMSIS-NN's compile-time `__ARM_FEATURE_DSP` / `__ARM_FEATURE_MVE`
selector exercises the matching kernel implementation. The Corstone-300
M55 simulator is an ISA superset of every earlier Cortex-M, so it
executes binaries compiled for older cores without modification — the
CI gate becomes "did the right CMSIS-NN code path execute correctly"
rather than "did per-CPU silicon behave as expected".

The build pipeline learns the target CPU end-to-end:

* `build_executorch.sh` accepts `--target_cpu` and passes `-DTARGET_CPU`
  to the toolchain CMake.
* `build_test_runner.sh` derives `target_cpu` from `--target` and
  forwards it. The regex matches both bare `cortex-m<X>` (the canonical
  form after the Phase 1 AOT API drop of `+int8`) and the legacy
  `cortex-m<X>+int8` shape for any callers still on it.
* `build_executor_runner.sh` derives the matching `target_cpu` and
  supplies a dummy `ETHOSU_TARGET_NPU_CONFIG=ethos-u55-128` so
  core_platform's `ethosu_get_architecture()` parser stays happy.

A single `arm_test/cmake-out` continues to stage the core libraries —
when switching `target_cpu` locally, clear `arm_test/cmake-out` first
to avoid linking stale per-CPU artifacts. Without the `--target_cpu`
plumbing, `build_executorch.sh` defaulted to `-mcpu=cortex-m55`, so
the core libraries (libexecutorch.a, libcortex_m_kernels.a, the
bundled CMSIS-NN) baked in M55+MVE code paths. A runner built with
`-mcpu=cortex-m4` would link those libraries and execute MVE
instructions on Corstone-300's M55 — passing bundled-IO checks while
testing the wrong code path.

One transient patch is layered into the externally-fetched
`ethos-u/core_platform` repo via the existing `patch_repo` mechanism:
an `#if defined(__ARM_ARCH_8M_MAIN__) || defined(__ARM_ARCH_8_1M_MAIN__)`
guard around the MPU init block in `corstone-300/target.cpp`. Without
it, the Armv8-M-only `ARM_MPU_RBAR` / `ARM_MPU_RLAR` API breaks the
build for older cores. The FVP doesn't enforce protection regions
without an explicit setup, so simulation correctness is unaffected.
The patch is a bridge — see TODO at `corstone_utils.cmake:52` —
pending upstream merge of the equivalent guard.

Inside our own runner, the optional Armv8.1-M PMU intrinsics
(`ARM_PMU_*`) in `arm_executor_runner.cpp` and `arm_perf_monitor.cpp`
are guarded on `__ARM_ARCH_8_1M_MAIN__`. Earlier cores get a zero
cycle count rather than a compile error; functional correctness is
unaffected. `run_fvp.sh` routes all `cortex-m*` targets except
`cortex-m85*` to the Corstone-300 FVP.

Locally validated end-to-end on Corstone-300 with the `qadd` model:

* `cortex-m55` — baseline, PASS; op_quantize_per_tensor.cpp.obj
  contains MVE intrinsics (vdup.16, vmax.s16).
* `cortex-m4` — PASS; same object has no MVE — only single-precision
  FP (vmul.f32, vcvt.s32.f32). CMSIS-NN selects the DSP path (1275 DSP
  opcodes in libcmsis-nn.a).
* `cortex-m7` — PASS; same shape as M4.

Scalar-class variants (`cortex-m{0,0plus,3,23}`) still need a
follow-up: an Armv6-M `HardFault_Handler` guard in `target.cpp` and a
`core_software/cmsis.cmake` `ARMCM0plus` directory-case fix. The
target_cpu plumbing here already accommodates soft-float ABI builds —
the follow-up only adds those two additional `__ARM_ARCH_*` guards.

Authored with Claude.
@rascani rascani force-pushed the cortex-m-non-mve-corstone branch from 5d54200 to 8b8ce9e Compare May 13, 2026 18:16
@rascani rascani marked this pull request as ready for review May 13, 2026 18:19
@rascani rascani requested a review from digantdesai as a code owner May 13, 2026 18:19