[cDAC] Enable arm32 GC stress + fix code/data pointer type confusion on ARM32#129672
Open
max-charlamb wants to merge 8 commits into
Open
[cDAC] Enable arm32 GC stress + fix code/data pointer type confusion on ARM32#129672max-charlamb wants to merge 8 commits into
max-charlamb wants to merge 8 commits into
Conversation
Adds linux_arm to the cdacStressPlatforms default list in runtime-diagnostics.yml so the CdacStressTests legs run against arm32 alongside x64/arm64. The Helix queue mapping for linux_arm already exists in prepare-cdac-stress-helix-steps.yml. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Contributor
There was a problem hiding this comment.
Pull request overview
Enables cDAC GC stress coverage on Linux ARM32 by adding linux_arm to the default cdacStressPlatforms parameter list in the diagnostics pipeline, so the existing CdacStressTests stage runs on arm32 in addition to the current x64/arm64 legs.
Changes:
- Added
linux_armto the defaultcdacStressPlatformsmatrix ineng/pipelines/runtime-diagnostics.yml.
On ARM32 the control PC carries the Thumb bit (LSB) to indicate execution mode. The native runtime applies PCODEToPINSTR (utilcode.h) before reporting the IP as StackRefData.Source (DAC's daccess.cpp uses `PCODEToPINSTR(GetControlPC(pRD))`; the in-process stress runtime oracle does the same in src/coreclr/vm/cdacstress.cpp). The cDAC was storing the raw PCODE, so every cDAC ref got keyed at IP|1 while the runtime reported at IP - producing universal mismatches in GC root verification. Fix: cache TargetArchitecture in GcScanContext at construction and mask the LSB in UpdateScanContext on ARM32 only. No-op on other architectures. Surfaced by the cDAC stress legs added by this PR on linux_arm (BasicCdacStressTests.GCStress_AllVerificationsPass: e.g. DynamicMethods reported 1676 fail / 18 pass / 0 known-issue, with the characteristic pair pattern Frame #N=IP and Frame #N+1=IP|1). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This comment was marked as outdated.
This comment was marked as outdated.
Replaces the ad-hoc IP.Value & ~1 mask in GcScanContext with a proper type-driven fix: IP is conceptually a code pointer (PCODE on ARM32 with the Thumb bit, ARM64 PtrAuth-bearing on supported targets), not a data address. Promote IPlatformContext / IPlatformAgnosticContext / ContextHolder.InstructionPointer and the Frame ReturnAddress fields to TargetCodePointer so callers can no longer accidentally treat them as data pointers. Conversion to a data address now lives in one place: GcScanContext.SetSource calls CodePointerUtils.AddressFromCodePointer when populating StackRefData.Source, matching native PCODEToPINSTR in src/coreclr/inc/utilcode.h. Other consumers (IsManaged, IsInterpreterCode, GetCodeBlockHandle) want TargetCodePointer directly and get it without manual masking; the few places that need the raw data address (AMD64 controlPC arithmetic, x64-only SOS GetJumpThunkTarget) convert explicitly via .AsTargetPointer. cDAC unit tests: Passed: 2571, Failed: 0, Skipped: 16. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
These fields all hold PCODE on ARM32 (carry the Thumb bit), so the native data descriptor should advertise them as TYPE(CodePointer) rather than T_POINTER: - TransitionBlock::m_ReturnAddress -- aliased to the saved link register in callingconvention.h (line 143-147 ARM block: comment `alias saved link register as m_ReturnAddress`). LR on ARM32 in Thumb mode = PC | 1. - InlinedCallFrame::m_pCallerReturnAddress -- ARM asm `str lr, [r4, #InlinedCallFrame__m_pCallerReturnAddress]` (pinvokestubs.S:96, 182); used as PCODE in stubs.cpp:1348-1349 (`*pRD->pPC = m_pCallerReturnAddress; pRD->ControlPC = ...`). - HijackFrame::m_ReturnAddress -- ctor takes `LPVOID returnAddress` (threadsuspend.cpp:4799) sourced from `m_pvHJRetAddr` which is read from the on-stack saved LR slot (line 4546: `m_pvHJRetAddr = *esb->m_ppvRetAddrPtr`). - SoftwareExceptionFrame::m_ReturnAddress -- ARM block in excep.cpp:10474-10475 copies from `pTransitionBlock->m_ReturnAddress` directly into `m_Context.Pc` and `m_ReturnAddress`. - TailCallFrame::m_ReturnAddress -- x86-only (descriptor entry guarded by `TARGET_X86 && !UNIX_X86_ABI`); CodePointer is still semantically correct (no transform applies on x86). This unblocks the type-safe handling in the cDAC managed side where these fields are now declared `[Field] TargetCodePointer ReturnAddress`, and lets ReadCodePointerField's `"CodePointer"` type-name assertion pass against real (non-mock) descriptors. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…nter type promotion Three call sites in DumpTests still expected TargetPointer for `IStackWalk.GetInstructionPointer` and `IPlatformAgnosticContext.InstructionPointer`. The promotion to TargetCodePointer (commit df50d08) didn't flag them locally because the DumpTests project doesn't build in the cdac unit test run. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Contributor
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 33 out of 33 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (1)
src/native/managed/cdac/tests/DumpTests/StackWalkDumpTests.cs:377
- This call constructs ClrDataAddress directly from ip.Value. Since TargetCodePointer now has a ToClrDataAddress(Target) helper, using it here would handle 32-bit sign-extension consistently with other usages in this file (and avoid subtle address mismatches on x86).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Contributor
|
Tagging subscribers to this area: @steveisok, @tommcdon, @dotnet/dotnet-diag |
- InlinedCallFrameHasActiveCall: use TargetCodePointer.Null to match the now-TargetCodePointer CallerReturnAddress field. - GcScanContext.SetSource: narrow the comment about ARM64 PtrAuth - CodePointerUtils.AddressFromCodePointer currently throws for that case, so don't claim it's already handled. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Enables the cDAC GC stress verification leg on
linux_arm(arm32) inruntime-diagnostics.yml. While bringing arm32 online surfaced a long-standing type confusion in the cDAC managed side that this PR also fixes.Pipeline change
Adds
linux_armto thecdacStressPlatformsdefault ineng/pipelines/runtime-diagnostics.yml. The Helix queue mapping already existed inprepare-cdac-stress-helix-steps.yml(helix_linux_arm32_oldest), so this is a one-line enablement.arm32 failure surfaced by enabling the leg
First run on arm32 failed every verification with the same shape - e.g.
DynamicMethods:1694 verifications (18 pass / 1676 fail / 0 known-issue), with every frame appearing duplicated at consecutive IPs differing by 1:Root cause: ARM32 control PCs carry the Thumb bit (LSB) to indicate execution mode. The native runtime applies
PCODEToPINSTR(utilcode.h) before reporting an IP asStackRefData.Source:src/coreclr/debug/daccess/daccess.cpp:7558--dsc->pc = PCODEToPINSTR(GetControlPC(pRD))src/coreclr/vm/cdacstress.cpp:781-782-- same maskingThe cDAC stored raw PCODE in
GcScanContext.InstructionPointerand emitted it as the Source, so every cDAC ref got keyed atIP|1while the runtime reported atIP.Fix: type the IP/return-address surface as
TargetCodePointerRather than masking the Thumb bit ad-hoc at one consumer site, the proper fix is to type these values correctly throughout the stack so the compiler stops the next person from mixing code pointers and data pointers:
Managed contract surface promoted
TargetPointer→TargetCodePointer:IPlatformContext.InstructionPointerand every per-arch impl (X86Context,AMD64Context,ARMContext,ARM64Context,LoongArch64Context,RISCV64Context)IPlatformAgnosticContext.InstructionPointerandContextHolder<T>.InstructionPointerIStackWalk.GetInstructionPointer,FrameIterator.GetCurrentReturnAddress,FrameHelpers.GetReturnAddress[Field]properties onData.TransitionBlock.ReturnAddress,Data.HijackFrame.ReturnAddress,Data.SoftwareExceptionFrame.ReturnAddress,Data.TailCallFrame.ReturnAddress,Data.InlinedCallFrame.CallerReturnAddressNative data-descriptor changes (
src/coreclr/vm/datadescriptor/datadescriptor.inc) - the corresponding 5CDAC_TYPE_FIELDdeclarations switched fromT_POINTERtoTYPE(CodePointer)so the descriptor's advertised type matches what the field actually holds. Each was verified to carry the Thumb bit on ARM32:TransitionBlock::m_ReturnAddresscallingconvention.h:140-148- explicitly aliased to{r4..r11, lr}(saved LR = PC|1)InlinedCallFrame::m_pCallerReturnAddressstr lr, [...](pinvokestubs.S:96, 182); read back as PCODE (stubs.cpp:1348-1349)HijackFrame::m_ReturnAddressthreadsuspend.cpp:4546)SoftwareExceptionFrame::m_ReturnAddressPcregister field (excep.cpp:10474-10475)TailCallFrame::m_ReturnAddressTARGET_X86);CodePointerstill semantically correctConversion lives in one place:
GcScanContext.SetSourcecallsCodePointerUtils.AddressFromCodePointer(the existing single source of truth for PCODE → PINSTR on a target) when populatingStackRefData.Source. Other consumers that want code pointers (IsManaged,IsInterpreterCode,_eman.GetCodeBlockHandle) now receiveTargetCodePointerdirectly. The few places that need raw data addresses (AMD64 unwinder'scontrolPCarithmetic withimageBase, x64-onlySOSDacImpl.GetJumpThunkTarget) call.AsTargetPointerexplicitly.Validation
./build.cmd clr.runtime -c Releasesucceeded; data descriptor regenerated withTYPE(CodePointer)fields.Passed: 2571, Failed: 0, Skipped: 16.CdacBuild,CdacDumpTest, andCdacStressTestleg green - including the newCdacStressTest linux-armleg that surfaced the bug, plus existinglinux-arm64,linux-x64,windows-arm64,windows-x64legs.Notes for reviewers
IStackWalkandIPlatformAgnosticContextlive inMicrosoft.Diagnostics.DataContractReader.Abstractions. Per the cDACAPI Reviewguidance (cdac.instructions.md), implementations ofIContractare not under formal API review on .NET 11 dev branches, and the contract assemblies are internal/unstable. Changing the IP/return-address types is intentional and not a breaking-change event.CodePointerUtils.AddressFromCodePointeralready throwsNotImplementedExceptionforHasArm64PtrAuth; when that's wired up, the single conversion inSetSourcepicks it up with no further changes.Note
This PR was authored with assistance from GitHub Copilot.