Add ISOSDacInterface17 for StressLog enumeration via cDAC#129272
Conversation
1c1c89a to
714265b
Compare
Add a new ISOSDacInterface17 COM interface that exposes the cDAC's IStressLog contract to SOS and clrmd consumers. This enables reading stress logs from both CoreCLR and NativeAOT without hardcoded struct offsets. Changes: - Add StartTime (wall-clock FILETIME) field to cDAC data descriptors, Data/StressLog.cs, StressLogData record, and contract implementation - Add StressLog to ContractRegistry - Add Address field to ThreadStressLogData for thread identification - Define ISOSDacInterface17 with IsStressLogAvailable, GetStressLogData, GetStressLogThreadEnumerator, GetStressLogMessageEnumerator, and GetStressLogMemoryRanges - Define ISOSStressLogThreadEnum, ISOSStressLogMsgEnum, and ISOSStressLogMemoryEnum enumerator interfaces - Implement SOSDacImpl bridge delegating to _target.Contracts.StressLog - Add IDL definitions in sospriv.idl (native DAC does not implement this interface; QueryInterface for the IID will fail on the legacy DAC) - Add StressLog debuggee and StressLogDumpTests for dump-based validation - Add design document explaining the architecture and migration path Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
714265b to
b31ec46
Compare
There was a problem hiding this comment.
Pull request overview
This PR extends the cDAC stress log support by (1) enriching the StressLog contract data model (notably adding StartTime and thread-log address identity) and (2) introducing a new ISOSDacInterface17 COM surface (IDL + managed declarations + SOSDacImpl bridge) intended to let SOS/clrmd enumerate stress logs without hardcoded layout offsets. It also adds a new dump-test debuggee plus contract-level dump tests and a detailed design document.
Changes:
- Add
StartTimeto StressLog data descriptors (CoreCLR + NativeAOT) and flow it through the managed contract types. - Add
ISOSDacInterface17+ related StressLog enumerator interfaces/structs and implement the bridge inSOSDacImpl. - Add StressLog dump debuggee and dump-based integration tests for the StressLog contract, plus a design doc.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| src/native/managed/cdac/tests/DumpTests/StressLogDumpTests.cs | Adds dump-based integration tests for the StressLog contract (availability, header data, basic enumeration). |
| src/native/managed/cdac/tests/DumpTests/Debuggees/StressLog/StressLog.csproj | New debuggee project enabling stress logging via environment variables. |
| src/native/managed/cdac/tests/DumpTests/Debuggees/StressLog/Program.cs | New debuggee that generates allocations/GC activity then FailFasts to produce a dump. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Legacy/SOSDacImpl.cs | Implements ISOSDacInterface17 and new StressLog enumerator implementations bridging to the cDAC IStressLog contract. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Legacy/ISOSDacInterface.cs | Adds public COM interface/struct declarations for ISOSDacInterface17 and StressLog enumerators. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Data/StressLog.cs | Adds [Field] StartTime to the StressLog data model. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/StressLog.cs | Plumbs StartTime into StressLogData and includes thread-log address in ThreadStressLogData. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Abstractions/Contracts/IStressLog.cs | Extends StressLogData with StartTime and ThreadStressLogData with Address. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Abstractions/ContractRegistry.cs | Exposes StressLog via Target.Contracts.StressLog. |
| src/coreclr/vm/datadescriptor/datadescriptor.inc | Exports StressLog.StartTime for CoreCLR via data descriptors. |
| src/coreclr/nativeaot/Runtime/datadescriptor/datadescriptor.inc | Exports StressLog.StartTime for NativeAOT via data descriptors. |
| src/coreclr/inc/sospriv.idl | Adds IDL definitions for ISOSDacInterface17 and associated StressLog structs/enumerator interfaces. |
| docs/design/datacontracts/stresslog-isosinterface-design.md | Design rationale and planned consumer migration strategy for exposing StressLog via SOS DAC interface. |
|
Is there a specific reason that regular CoreCLR and NativeAOT stress logs must be different? |
- Add [GeneratedComClass] to all three enumerator classes - Use ToTargetPointer for ClrDataAddress comparison - Shrink _lastBatchRaw to actual fetched size - Return ex.HResult instead of E_FAIL in IsStressLogAvailable Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
We should be able to delete https://github.com/max-charlamb/runtime/blob/bd3de83c6195bdbda612e67672fb1248a1014d28/src/coreclr/inc/stresslog.h#L225 once this is in place. (It does not have to be done in this PR.) |
- Remove DacpStressLogMemoryRange, ISOSStressLogMemoryEnum, and GetStressLogMemoryRanges from IDL, managed interface, and impl since no SOS consumer uses them. - Refactor SOSStressLogMsgEnum to eagerly materialize messages into an array, enabling Reset and GetCount support consistent with other ISOSEnum implementations. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Rename DacpStressLogData -> SOSStressLogData, DacpThreadStressLogData -> SOSThreadStressLogData, DacpStressMsgData -> SOSStressMsgData - Define structs inline in sospriv.idl with typedef struct and cpp_quote guards, following the SOSMethodData/SOSMemoryRegion convention used by newer interfaces - Remove Dacp forward declarations and dacprivate.h definitions - Put ISOSDacInterface16, ISOSDacInterface17 on same line in class decl Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Use 'written < count' pattern for S_FALSE (standard COM IEnumXxx convention matching SOSMethodEnum/SOSMemoryEnum) - Return E_POINTER directly for null output in GetStressLogData Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
StressMsgHeaderSize, ChunkSize, MaxMessageSize, and PointerSize are implementation details for raw memory walking that the message enumerator now abstracts away. Consumers no longer need them. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Callers can just call GetStressLogData/GetStressLogThreadEnumerator directly -- they return S_FALSE when stress log is not enabled. No need for a separate availability check method. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Messages with Timestamp==0 can appear at chunk boundaries on x86 with small stress log sizes. These represent unwritten/partial message slots and are valid contract output per the spec. Update tests to find the first message with a non-zero timestamp rather than assuming the first message is valid. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Matches the convention used by other SOSDAC sized-buffer APIs where S_FALSE indicates fewer items returned than requested. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
On x86 with small stress logs, partially-written messages can have non-zero timestamps but null format strings due to chunk-boundary wrapping. Look for messages with both valid timestamp and format string. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Temporary diagnostics to understand why x86 can't find valid messages: prints thread count, pointer size, per-thread metadata, and per-message timestamp/format/facility/argcount. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
StressLog.modules is an inline array (ModuleDesc modules[MAX_MODULES]), not a pointer. Using [Field] reads the first bytes of the array as a pointer value, which on x86 happens to be the coreclr base address (from modules[0].baseAddress), causing GetFormatPointer's module table lookup to iterate from the PE header instead of the module descriptors. Using [FieldAddress] returns the address of the inline array itself, which is the correct base for iterating the ModuleDesc entries. Verified fix locally against CI x86 dump that previously failed. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
bded221 to
cb02ef6
Compare
|
/ba-g known test infra issue |
> [!NOTE] > This PR was created with assistance from GitHub Copilot. ## Summary Update `!DumpLog` to prefer the cDAC-based `ISOSDacInterface17` for stress log reading when available, falling back to the existing raw-memory path for older runtimes. This enables reading stress logs from both CoreCLR and NativeAOT processes. Companion to dotnet/runtime#129272 which defines `ISOSDacInterface17` and implements the cDAC bridge. ## Changes ### SOS `!DumpLog` update - Added `StressLog::DumpViaInterface17()` in `stressLogDump.cpp` -- a new code path that: - `QueryInterface` for `ISOSDacInterface17` - If available: uses `GetStressLogData`, `GetStressLogThreadEnumerator`, and `GetStressLogMessageEnumerator` to read stress log data through the cDAC contract - Produces identical output format to the legacy path (header block, merged timestamp-ordered messages, `formatOutput` for special format specifiers like `%pM`/`%pT`) - Returns `E_NOINTERFACE` if the interface is not available - Updated `!DumpLog` in `strike.cpp` to try the new path first, falling back to `StressLog::Dump()` if `E_NOINTERFACE` - Added `-legacy` flag to force the legacy raw-read path ### Interface definitions - Added `ISOSDacInterface17`, `ISOSStressLogThreadEnum`, `ISOSStressLogMsgEnum` interfaces and SOS* data structs to `sospriv.h` - Added IID GUIDs to `sospriv_i.cpp` ### Bug fix in legacy `StressLog::Dump` Fixed a pre-existing bug where the legacy `StressLog::Dump` could return a failure HRESULT even after successfully processing all stress log entries. The issue was that `hr` was being overwritten by `ReadVirtual` calls for format strings inside the message loop. If the last format string read failed (e.g., unmapped memory page in the dump on macOS), the error propagated as the function's return value despite all messages being processed correctly. The fix avoids clobbering `hr` in the loop and explicitly sets it based on whether messages were processed. ### Test coverage - Added `!DumpLog` to the `OtherCommands.script` test with `VERIFY:SUCCESS` - Gated stress log enablement (`DOTNET_StressLog=1`) to only the OtherCommands test via `EnableStressLog` property on `TestInformation` ## Backward Compatibility - When running against a runtime **with** `ISOSDacInterface17` (dotnet/runtime#129272): uses the new cDAC-based path, which supports both CoreCLR and NativeAOT - When running against a runtime **without** `ISOSDacInterface17`: `QueryInterface` returns `E_NOINTERFACE`, falls back to the existing `StressLog::Dump()` raw-memory path -- zero behavior change ## Related PRs - **Runtime**: dotnet/runtime#129272 -- Defines `ISOSDacInterface17` and implements the cDAC bridge Co-authored-by: Max Charlamb <maxcharlamb@microsoft.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Note
This PR was created with assistance from GitHub Copilot.
Summary
Add a new
ISOSDacInterface17COM interface that exposes the cDAC'sIStressLogcontract to SOS and clrmd consumers. This enables reading stress logs from both CoreCLR and NativeAOT processes without hardcoded struct offsets.Motivation
SOS (
!DumpLog) and clrmd currently read stress logs via raw memory reads with hardcoded struct offsets matching the CoreCLR layout. NativeAOT'sStressLogstruct has a different layout (different lock type, no module table, no padding sentinel), so these tools cannot read NativeAOT stress logs.The cDAC already has a working
IStressLogcontract (StressLog_1/StressLog_2) that reads stress logs from both runtimes usingdatadescriptor.incfield offsets. This PR bridges that contract to the COM interface layer so SOS and clrmd can consume it.Changes
cDAC contract updates
StartTime(wall-clock FILETIME) to data descriptors (CoreCLR + NativeAOT),Data/StressLog.cs,StressLogDatarecord, and contract implementationAddressfield toThreadStressLogDatafor thread identification across the COM boundaryStressLogtoContractRegistryISOSDacInterface17 definition
SOSStressLogData,SOSThreadStressLogData,SOSStressMsgData(defined inline in IDL following modern convention)ISOSStressLogThreadEnum,ISOSStressLogMsgEnumISOSDacInterface17:GetStressLogData,GetStressLogThreadEnumerator,GetStressLogMessageEnumeratorS_FALSEwhen stress log is not enabled (no separate availability check needed)SOSDacImpl bridge
Reset/GetCountsupport andGetArgumentsfor per-batch arg retrievalQueryInterfacefor the IID will fail on the legacy DAC, and only the cDAC handles itIDL + testing
sospriv.idlTest Results
All dump tests pass:
StressLogIsAvailable-- PASSEDStressLogDataIsValid-- PASSEDCanEnumerateThreadsAndMessages-- PASSEDRelated PRs
!DumpLogupdated to useISOSDacInterface17with fallback