[cDAC] Enable arm32 GC stress + fix code/data pointer type confusion on ARM32#129672

Open
max-charlamb wants to merge 8 commits into dotnet:main from max-charlamb:ci/arm32-gcstress

max-charlamb (Member) commented Jun 21, 2026

Enables the cDAC GC stress verification leg on linux_arm (arm32) in runtime-diagnostics.yml. Bringing arm32 online surfaced a long-standing type confusion on the cDAC managed side, which this PR also fixes.

Pipeline change

Adds linux_arm to the cdacStressPlatforms default in eng/pipelines/runtime-diagnostics.yml. The Helix queue mapping already existed in prepare-cdac-stress-helix-steps.yml (helix_linux_arm32_oldest), so this is a one-line enablement.

arm32 failure surfaced by enabling the leg

First run on arm32 failed every verification with the same shape - e.g. DynamicMethods: 1694 verifications (18 pass / 1676 fail / 0 known-issue), with every frame appearing duplicated at consecutive IPs differing by 1:

Frame #0 <0xea89cc4e> MISMATCH cDAC=0 RT=2  ONLY(RT)
Frame #1 <0xea89cc4f> MISMATCH cDAC=2 RT=0  ONLY(cDAC)   <-- same refs, IP|1

Root cause: ARM32 control PCs carry the Thumb bit (LSB) to indicate execution mode. The native runtime applies PCODEToPINSTR (utilcode.h) before reporting an IP as StackRefData.Source:

  • Legacy DAC: src/coreclr/debug/daccess/daccess.cpp:7558 -- dsc->pc = PCODEToPINSTR(GetControlPC(pRD))
  • In-process stress oracle: src/coreclr/vm/cdacstress.cpp:781-782 -- same masking

The cDAC stored raw PCODE in GcScanContext.InstructionPointer and emitted it as the Source, so every cDAC ref got keyed at IP|1 while the runtime reported at IP.
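
The mismatch can be reproduced in miniature. The following is a minimal sketch, not the runtime's code: `StripThumbBit`, `kThumbBit`, and `RefCounts` are hypothetical names, and the masking only approximates the effect of PCODEToPINSTR on ARM32. It shows why refs keyed by a raw PCODE never line up with refs keyed by the masked instruction address.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical stand-ins for illustration; not the runtime's definitions.
constexpr std::uint32_t kThumbBit = 1u;

// Approximates PCODEToPINSTR on ARM32: clear the Thumb bit so the value
// names an instruction address instead of a branch target.
constexpr std::uint32_t StripThumbBit(std::uint32_t pcode) {
    return pcode & ~kThumbBit;
}

// GC ref counts keyed by source IP, the way a verifier might bucket them.
using RefCounts = std::map<std::uint32_t, int>;

// The IP pair from the failing log differs only in the low bit.
static_assert(StripThumbBit(0xea89cc4fu) == 0xea89cc4eu,
              "masking the Thumb bit yields the IP the runtime reports");
```

With the runtime side keyed at `StripThumbBit(ip)` and the other side keyed at the raw `ip`, the two `RefCounts` maps have disjoint keys, which matches the ONLY(RT) / ONLY(cDAC) pairs in the log above.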

Fix: type the IP/return-address surface as TargetCodePointer

Rather than masking the Thumb bit ad-hoc at one consumer site, the proper fix is to type these values correctly throughout the stack so the compiler stops the next person from mixing code pointers and data pointers:

Managed contract surface promoted from TargetPointer to TargetCodePointer:

  • IPlatformContext.InstructionPointer and every per-arch impl (X86Context, AMD64Context, ARMContext, ARM64Context, LoongArch64Context, RISCV64Context)
  • IPlatformAgnosticContext.InstructionPointer and ContextHolder<T>.InstructionPointer
  • IStackWalk.GetInstructionPointer, FrameIterator.GetCurrentReturnAddress, FrameHelpers.GetReturnAddress
  • [Field] properties on Data.TransitionBlock.ReturnAddress, Data.HijackFrame.ReturnAddress, Data.SoftwareExceptionFrame.ReturnAddress, Data.TailCallFrame.ReturnAddress, Data.InlinedCallFrame.CallerReturnAddress
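
The idea behind the promotion in the list above can be illustrated with two distinct wrapper types. This is a hedged C++ analogue, not the cDAC's C# definitions: `DataPointer`, `CodePointer`, `AsDataPointer`, and `IsThumb` are hypothetical names chosen for the sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical analogues of TargetPointer and TargetCodePointer.
struct DataPointer {
    std::uint64_t value;
};

struct CodePointer {
    std::uint64_t value;

    // Explicit, named escape hatch, in the spirit of .AsTargetPointer.
    constexpr DataPointer AsDataPointer() const { return DataPointer{value}; }
};

// An operation that only makes sense for a code pointer.
constexpr bool IsThumb(CodePointer ip) {
    return (ip.value & 1u) != 0;
}

// Neither type converts implicitly to the other, so passing the wrong
// kind of pointer is a compile error rather than a silent off-by-one.
static_assert(!std::is_convertible<DataPointer, CodePointer>::value,
              "data pointers do not silently become code pointers");
static_assert(!std::is_convertible<CodePointer, DataPointer>::value,
              "code pointers do not silently become data pointers");
```

A call such as `IsThumb(DataPointer{...})` would fail to compile in this sketch, which is the property the type change is after.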

Native data-descriptor changes (src/coreclr/vm/datadescriptor/datadescriptor.inc) - the corresponding 5 CDAC_TYPE_FIELD declarations switched from T_POINTER to TYPE(CodePointer) so the descriptor's advertised type matches what the field actually holds. Each was verified to carry the Thumb bit on ARM32:

| Field | Evidence |
| --- | --- |
| TransitionBlock::m_ReturnAddress | callingconvention.h:140-148 - explicitly aliased to {r4..r11, lr} (saved LR = PC\|1) |
| InlinedCallFrame::m_pCallerReturnAddress | ARM asm str lr, [...] (pinvokestubs.S:96, 182); read back as PCODE (stubs.cpp:1348-1349) |
| HijackFrame::m_ReturnAddress | ctor sourced from on-stack saved LR (threadsuspend.cpp:4546) |
| SoftwareExceptionFrame::m_ReturnAddress | copied straight into ARM Pc register field (excep.cpp:10474-10475) |
| TailCallFrame::m_ReturnAddress | x86-only (descriptor guarded by TARGET_X86); CodePointer still semantically correct |

Conversion lives in one place: GcScanContext.SetSource calls CodePointerUtils.AddressFromCodePointer (the existing single source of truth for PCODE → PINSTR on a target) when populating StackRefData.Source. Other consumers that want code pointers (IsManaged, IsInterpreterCode, _eman.GetCodeBlockHandle) now receive TargetCodePointer directly. The few places that need raw data addresses (AMD64 unwinder's controlPC arithmetic with imageBase, x64-only SOSDacImpl.GetJumpThunkTarget) call .AsTargetPointer explicitly.
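
A rough sketch of what "conversion lives in one place" might look like, written in C++ for illustration only. Every name here (`TargetInfo`, `AddressFromCodePointer`, `MakeStackRef`) is a hypothetical stand-in; the per-architecture behavior is an assumption based on the description above (mask on ARM32, pass through elsewhere, refuse pointer-authenticated targets), not a copy of CodePointerUtils.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

enum class Arch { X64, Arm32, Arm64 };

struct TargetInfo {
    Arch arch;
    bool hasPtrAuth;  // hypothetical flag, analogous to HasArm64PtrAuth
};

struct CodePointer { std::uint64_t value; };
struct DataPointer { std::uint64_t value; };

// Assumed single conversion point from code pointer to instruction address.
inline DataPointer AddressFromCodePointer(CodePointer code, const TargetInfo& target) {
    if (target.hasPtrAuth) {
        // The sketch refuses rather than guessing how to strip the signature.
        throw std::logic_error("pointer authentication is not handled in this sketch");
    }
    if (target.arch == Arch::Arm32) {
        // Clear the Thumb bit so the result is an instruction address.
        return DataPointer{code.value & ~std::uint64_t{1}};
    }
    return DataPointer{code.value};
}

struct StackRef { DataPointer source; };

// Assumed analogue of SetSource: the only caller that converts.
inline StackRef MakeStackRef(CodePointer ip, const TargetInfo& target) {
    return StackRef{AddressFromCodePointer(ip, target)};
}
```

Because only `MakeStackRef` converts, supporting a new code-pointer encoding would mean changing `AddressFromCodePointer` alone; callers that want the code pointer itself keep receiving `CodePointer` unchanged.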

Validation

  • ./build.cmd clr.runtime -c Release succeeded; data descriptor regenerated with TYPE(CodePointer) fields.
  • cDAC unit tests: Passed: 2571, Failed: 0, Skipped: 16.
  • CI: every CdacBuild, CdacDumpTest, and CdacStressTest leg green - including the new CdacStressTest linux-arm leg that surfaced the bug, plus existing linux-arm64, linux-x64, windows-arm64, windows-x64 legs.

Notes for reviewers

  • IStackWalk and IPlatformAgnosticContext live in Microsoft.Diagnostics.DataContractReader.Abstractions. Per the cDAC API Review guidance (cdac.instructions.md), implementations of IContract are not under formal API review on .NET 11 dev branches, and the contract assemblies are internal/unstable. Changing the IP/return-address types is intentional and not a breaking-change event.
  • CodePointerUtils.AddressFromCodePointer already throws NotImplementedException for HasArm64PtrAuth; when that's wired up, the single conversion in SetSource picks it up with no further changes.

Note

This PR was authored with assistance from GitHub Copilot.

Adds linux_arm to the cdacStressPlatforms default list in
runtime-diagnostics.yml so the CdacStressTests legs run against arm32
alongside x64/arm64. The Helix queue mapping for linux_arm already
exists in prepare-cdac-stress-helix-steps.yml.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment

Pull request overview

Enables cDAC GC stress coverage on Linux ARM32 by adding linux_arm to the default cdacStressPlatforms parameter list in the diagnostics pipeline, so the existing CdacStressTests stage runs on arm32 in addition to the current x64/arm64 legs.

Changes:

  • Added linux_arm to the default cdacStressPlatforms matrix in eng/pipelines/runtime-diagnostics.yml.

On ARM32 the control PC carries the Thumb bit (LSB) to indicate
execution mode. The native runtime applies PCODEToPINSTR (utilcode.h)
before reporting the IP as StackRefData.Source (DAC's daccess.cpp uses
`PCODEToPINSTR(GetControlPC(pRD))`; the in-process stress runtime
oracle does the same in src/coreclr/vm/cdacstress.cpp). The cDAC was
storing the raw PCODE, so every cDAC ref got keyed at IP|1 while the
runtime reported at IP - producing universal mismatches in GC root
verification.

Fix: cache TargetArchitecture in GcScanContext at construction and mask
the LSB in UpdateScanContext on ARM32 only. No-op on other architectures.

Surfaced by the cDAC stress legs added by this PR on linux_arm
(BasicCdacStressTests.GCStress_AllVerificationsPass: e.g.
DynamicMethods reported 1676 fail / 18 pass / 0 known-issue, with the
characteristic pair pattern Frame #N=IP and Frame #N+1=IP|1).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Replaces the ad-hoc IP.Value & ~1 mask in GcScanContext with a proper
type-driven fix: IP is conceptually a code pointer (PCODE on ARM32 with
the Thumb bit, ARM64 PtrAuth-bearing on supported targets), not a data
address. Promote IPlatformContext / IPlatformAgnosticContext /
ContextHolder.InstructionPointer and the Frame ReturnAddress fields to
TargetCodePointer so callers can no longer accidentally treat them as
data pointers.

Conversion to a data address now lives in one place: GcScanContext.SetSource
calls CodePointerUtils.AddressFromCodePointer when populating
StackRefData.Source, matching native PCODEToPINSTR in
src/coreclr/inc/utilcode.h. Other consumers (IsManaged, IsInterpreterCode,
GetCodeBlockHandle) want TargetCodePointer directly and get it without
manual masking; the few places that need the raw data address (AMD64
controlPC arithmetic, x64-only SOS GetJumpThunkTarget) convert explicitly
via .AsTargetPointer.

cDAC unit tests: Passed: 2571, Failed: 0, Skipped: 16.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 21, 2026 20:08

Copilot AI left a comment

Pull request overview

Copilot reviewed 30 out of 30 changed files in this pull request and generated 4 comments.

Comment thread eng/pipelines/runtime-diagnostics.yml
Max Charlamb and others added 2 commits June 21, 2026 16:24
These fields all hold PCODE on ARM32 (carry the Thumb bit), so the
native data descriptor should advertise them as TYPE(CodePointer)
rather than T_POINTER:

- TransitionBlock::m_ReturnAddress -- aliased to the saved link
  register in callingconvention.h (line 143-147 ARM block: comment
  `alias saved link register as m_ReturnAddress`). LR on ARM32 in
  Thumb mode = PC | 1.
- InlinedCallFrame::m_pCallerReturnAddress -- ARM asm
  `str lr, [r4, #InlinedCallFrame__m_pCallerReturnAddress]`
  (pinvokestubs.S:96, 182); used as PCODE in stubs.cpp:1348-1349
  (`*pRD->pPC = m_pCallerReturnAddress; pRD->ControlPC = ...`).
- HijackFrame::m_ReturnAddress -- ctor takes `LPVOID returnAddress`
  (threadsuspend.cpp:4799) sourced from `m_pvHJRetAddr` which is
  read from the on-stack saved LR slot (line 4546:
  `m_pvHJRetAddr = *esb->m_ppvRetAddrPtr`).
- SoftwareExceptionFrame::m_ReturnAddress -- ARM block in
  excep.cpp:10474-10475 copies from
  `pTransitionBlock->m_ReturnAddress` directly into `m_Context.Pc`
  and `m_ReturnAddress`.
- TailCallFrame::m_ReturnAddress -- x86-only (descriptor entry guarded
  by `TARGET_X86 && !UNIX_X86_ABI`); CodePointer is still
  semantically correct (no transform applies on x86).

This unblocks the type-safe handling in the cDAC managed side where
these fields are now declared `[Field] TargetCodePointer ReturnAddress`,
and lets ReadCodePointerField's `"CodePointer"` type-name assertion
pass against real (non-mock) descriptors.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…nter type promotion

Three call sites in DumpTests still expected TargetPointer for
`IStackWalk.GetInstructionPointer` and
`IPlatformAgnosticContext.InstructionPointer`. The promotion to
TargetCodePointer (commit df50d08) didn't flag them locally because
the DumpTests project doesn't build in the cdac unit test run.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 21, 2026 21:22

Copilot AI left a comment

Pull request overview

Copilot reviewed 33 out of 33 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (1)

src/native/managed/cdac/tests/DumpTests/StackWalkDumpTests.cs:377

  • This call constructs ClrDataAddress directly from ip.Value. Since TargetCodePointer now has a ToClrDataAddress(Target) helper, using it here would handle 32-bit sign-extension consistently with other usages in this file (and avoid subtle address mismatches on x86).
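
For context on the sign-extension concern, here is a small sketch of widening a target address to a 64-bit value. It assumes the usual convention that 32-bit target addresses are sign-extended when widened; `ToClrDataAddressSketch` is a hypothetical helper, and the real ToClrDataAddress(Target) may differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: widen a target address to 64 bits, sign-extending
// when the target uses 4-byte pointers and leaving 8-byte pointers as-is.
constexpr std::uint64_t ToClrDataAddressSketch(std::uint64_t value, unsigned pointerSize) {
    return pointerSize == 4
        ? ((value & 0x80000000ull)
               ? ((value & 0xFFFFFFFFull) | 0xFFFFFFFF00000000ull)  // high bit set: fill upper half
               : (value & 0xFFFFFFFFull))                           // high bit clear: zero upper half
        : value;
}

// An address with the top bit set changes when sign-extended, so a
// zero-extended copy of the same IP would not compare equal to it.
static_assert(ToClrDataAddressSketch(0xea89cc4eull, 4) != 0xea89cc4eull,
              "sign-extended and zero-extended forms differ");
```

Addresses below 0x80000000 are unaffected, which is why a mismatch of this kind would only show up for high addresses on 32-bit targets.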

Comment thread eng/pipelines/runtime-diagnostics.yml
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@max-charlamb max-charlamb marked this pull request as ready for review June 22, 2026 00:30
Copilot AI review requested due to automatic review settings June 22, 2026 00:30

Copilot AI left a comment

Pull request overview

Copilot reviewed 33 out of 33 changed files in this pull request and generated 5 comments.

Comment thread eng/pipelines/runtime-diagnostics.yml
@dotnet-policy-service

Tagging subscribers to this area: @steveisok, @tommcdon, @dotnet/dotnet-diag
See info in area-owners.md if you want to be subscribed.

- InlinedCallFrameHasActiveCall: use TargetCodePointer.Null to match
  the now-TargetCodePointer CallerReturnAddress field.
- GcScanContext.SetSource: narrow the comment about ARM64 PtrAuth -
  CodePointerUtils.AddressFromCodePointer currently throws for that
  case, so don't claim it's already handled.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
max-charlamb changed the title from "[cDAC] Enable cDAC GC stress tests on linux_arm (arm32)" to "[cDAC] Enable arm32 GC stress + fix code/data pointer type confusion on ARM32" on Jun 22, 2026
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 22, 2026 01:36

Copilot AI left a comment

Pull request overview

Copilot reviewed 33 out of 33 changed files in this pull request and generated 1 comment.

@max-charlamb max-charlamb requested a review from rcj1 June 22, 2026 02:12