Cortex-M: build for any Cortex-M variant against Corstone-300 #19520
rascani wants to merge 1 commit
LGTM. I like the direction this draft PR is going.
Extend the Cortex-M test pipeline so the `cortex-m<variant>` target
strings registered in the AOT compile-config plumbing actually produce
runnable, ISA-faithful binaries. The binary is built end-to-end with
`-mcpu=cortex-m<variant>` — runner and core libraries alike — so
CMSIS-NN's compile-time `__ARM_FEATURE_DSP` / `__ARM_FEATURE_MVE`
selector exercises the matching kernel implementation. The Corstone-300
M55 simulator is an ISA superset of every earlier Cortex-M, so it
executes binaries compiled for older cores without modification — the
CI gate becomes "did the right CMSIS-NN code path execute correctly"
rather than "did per-CPU silicon behave as expected".
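To see which of the two feature macros a given `-mcpu` value turns on, the compiler's predefined macros can be dumped. This is a minimal sketch, assuming an `arm-none-eabi-gcc` toolchain is on `PATH`; `show_arm_features` is a hypothetical helper name and not part of this PR:

```shell
# Hypothetical helper: list the ACLE feature macros predefined for one -mcpu value.
# Assumes arm-none-eabi-gcc is installed; returns 127 if it is not.
show_arm_features() {
    cpu="$1"
    if ! command -v arm-none-eabi-gcc >/dev/null 2>&1; then
        echo "arm-none-eabi-gcc not found on PATH" >&2
        return 127
    fi
    # -dM -E dumps every predefined macro; keep only the DSP and MVE feature macros.
    # "|| true" keeps the exit status at 0 when neither macro is defined.
    arm-none-eabi-gcc -mcpu="$cpu" -dM -E - </dev/null \
        | grep -E '__ARM_FEATURE_(DSP|MVE)' || true
}
```

Running `show_arm_features cortex-m4` and `show_arm_features cortex-m55` side by side is one way to check that the two builds should land on different CMSIS-NN paths.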
The build pipeline learns the target CPU end-to-end:
* `build_executorch.sh` accepts `--target_cpu` and passes `-DTARGET_CPU`
to the toolchain CMake.
* `build_test_runner.sh` derives `target_cpu` from `--target` and
forwards it. The regex matches both bare `cortex-m<X>` (the canonical
form after the Phase 1 AOT API drop of `+int8`) and the legacy
`cortex-m<X>+int8` shape for any callers still on it.
* `build_executor_runner.sh` derives the matching `target_cpu` and
supplies a dummy `ETHOSU_TARGET_NPU_CONFIG=ethos-u55-128` so
core_platform's `ethosu_get_architecture()` parser stays happy.
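The `--target` to `target_cpu` derivation described above could look roughly like the following. This is a sketch only: `derive_target_cpu` is a hypothetical name, and the exact regex in `build_test_runner.sh` may differ.

```shell
# Hypothetical helper: map a --target string to the value used as target_cpu.
derive_target_cpu() {
    # Accept bare cortex-m<X> and the legacy cortex-m<X>+int8 shape.
    # Anything else prints nothing, so callers can test for an empty result.
    printf '%s\n' "$1" | sed -nE 's/^(cortex-m[0-9a-z]+)(\+int8)?$/\1/p'
}

derive_target_cpu "cortex-m55"       # prints cortex-m55
derive_target_cpu "cortex-m4+int8"   # prints cortex-m4
derive_target_cpu "ethos-u55-128"    # prints nothing
```

Anchoring the pattern at both ends keeps the `+int8` suffix out of the captured CPU name and rejects non-Cortex-M targets outright.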
A single `arm_test/cmake-out` continues to stage the core libraries —
when switching `target_cpu` locally, clear `arm_test/cmake-out` first
to avoid linking stale per-CPU artifacts. Without the `--target_cpu`
plumbing, `build_executorch.sh` defaulted to `-mcpu=cortex-m55`, so
the core libraries (libexecutorch.a, libcortex_m_kernels.a, the
bundled CMSIS-NN) baked in M55+MVE code paths. A runner built with
`-mcpu=cortex-m4` would link those libraries and execute MVE
instructions on Corstone-300's M55 — passing bundled-IO checks while
testing the wrong code path.
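For local work, the "clear `arm_test/cmake-out` first" step can be automated with a small stamp file. The helper below is a hypothetical sketch and not part of this PR; the `.target_cpu` stamp name is an assumption chosen for illustration.

```shell
# Hypothetical helper: wipe the staged core libraries when target_cpu changes.
# The stamp file name (.target_cpu) is an assumption, not something the scripts write.
reset_build_dir_if_cpu_changed() {
    build_dir="$1"
    new_cpu="$2"
    stamp="$build_dir/.target_cpu"
    if [ -f "$stamp" ] && [ "$(cat "$stamp")" != "$new_cpu" ]; then
        # Everything staged here was compiled for a different -mcpu; drop it.
        rm -rf "$build_dir"
    fi
    mkdir -p "$build_dir"
    printf '%s\n' "$new_cpu" > "$stamp"
}
```

A call such as `reset_build_dir_if_cpu_changed arm_test/cmake-out cortex-m4` would go before the build scripts. A directory that predates the stamp file is left alone by this sketch, so the first switch still needs a manual `rm -rf arm_test/cmake-out`.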
One transient patch is layered into the externally-fetched
`ethos-u/core_platform` repo via the existing `patch_repo` mechanism:
an `#if defined(__ARM_ARCH_8M_MAIN__) || defined(__ARM_ARCH_8_1M_MAIN__)`
guard around the MPU init block in `corstone-300/target.cpp`. Without
it, the Armv8-M-only `ARM_MPU_RBAR` / `ARM_MPU_RLAR` API breaks the
build for older cores. The FVP doesn't enforce protection regions
without an explicit setup, so simulation correctness is unaffected.
The patch is a bridge — see TODO at `corstone_utils.cmake:52` —
pending upstream merge of the equivalent guard.
Inside our own runner, the optional Armv8.1-M PMU intrinsics
(`ARM_PMU_*`) in `arm_executor_runner.cpp` and `arm_perf_monitor.cpp`
are guarded on `__ARM_ARCH_8_1M_MAIN__`. Earlier cores get a zero
cycle count rather than a compile error; functional correctness is
unaffected. `run_fvp.sh` routes all `cortex-m*` targets except
`cortex-m85*` to the Corstone-300 FVP.
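The routing rule can be pictured as a `case` statement in which the `cortex-m85*` pattern is tested before the general `cortex-m*` one. This is an illustrative sketch: `select_fvp` and the labels it prints are hypothetical, and where `cortex-m85*` targets actually go is not covered here.

```shell
# Hypothetical helper mirroring the routing rule; the printed labels are illustrative.
select_fvp() {
    case "$1" in
        # Must come first: cortex-m85* would otherwise match cortex-m* below.
        cortex-m85*) echo "not-corstone-300" ;;
        cortex-m*)   echo "corstone-300" ;;
        *)           echo "unhandled" ;;
    esac
}
```

Pattern order matters because `case` takes the first matching branch.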
Locally validated end-to-end on Corstone-300 with the `qadd` model:
* `cortex-m55` — baseline, PASS; op_quantize_per_tensor.cpp.obj
contains MVE instructions (vdup.16, vmax.s16).
* `cortex-m4` — PASS; same object has no MVE — only single-precision
FP (vmul.f32, vcvt.s32.f32). CMSIS-NN selects the DSP path (1275 DSP
opcodes in libcmsis-nn.a).
* `cortex-m7` — PASS; same shape as M4.
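The object-file checks above can be repeated by counting opcodes in a disassembly. This is a rough sketch, assuming `arm-none-eabi-objdump` is available for producing the input; `count_mve_opcodes` is a hypothetical helper that only looks for the two MVE mnemonics named in the list.

```shell
# Hypothetical helper: count disassembly lines containing the MVE opcodes
# mentioned above. Reads `objdump -d` style text on stdin.
count_mve_opcodes() {
    # grep -c prints the count; "|| true" keeps the exit status at 0 when the count is 0.
    grep -cE '(vdup\.16|vmax\.s16)' || true
}
```

Typical use would be `arm-none-eabi-objdump -d op_quantize_per_tensor.cpp.obj | count_mve_opcodes`, with a non-zero count expected for the `cortex-m55` build and zero for `cortex-m4`.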
Scalar-class variants (`cortex-m{0,0plus,3,23}`) still need a
follow-up: an Armv6-M `HardFault_Handler` guard in `target.cpp` and a
`core_software/cmsis.cmake` `ARMCM0plus` directory-case fix. The
target_cpu plumbing here already accommodates soft-float ABI builds, so
the follow-up only needs those two fixes.
Authored with Claude.
cc @digantdesai @freddan80 @per @zingo @oscarandersson8218 @mansnils @Sebastian-Larsson @robell