Skip to content

Replace Delegate MethodInfo cache with the MethodDesc#99200

Open
MichalPetryka wants to merge 65 commits into
dotnet:mainfrom
MichalPetryka:immutable-delegates
Open

Replace Delegate MethodInfo cache with the MethodDesc#99200
MichalPetryka wants to merge 65 commits into
dotnet:mainfrom
MichalPetryka:immutable-delegates

Conversation

@MichalPetryka

@MichalPetryka MichalPetryka commented Mar 3, 2024

Copy link
Copy Markdown
Contributor

First attempt at making delegate GC fields immutable in CoreCLR so that they can be allocated on the NonGC heap.

I've checked it with a simple app and corerun locally with a delegate from an unloadable ALC and it seemed to not crash, assert nor unload the ALC from under the delegate, however I couldn't actually find any runtime tests that would verify delegates from unloadable ALCs work so the CI coverage might be missing.

One small point of concern is that this might make delegate equality checks slower since they rely on checking the methods in the last "slow path" check, which is however always hit for different delegates AFAIR.

Contributes to #85014.

cc @jkotas

@ghost ghost added the area-VM-coreclr label Mar 3, 2024
@MichalPetryka

MichalPetryka commented Mar 4, 2024

Copy link
Copy Markdown
Contributor Author

I'm not sure what's up with the failures here, tests that are failing on the CI seem to pass on my machine.
EDIT: I was testing with R2R disabled locally since VS kept complaining about being unable to load PDBs for it.

Comment thread src/coreclr/vm/object.cpp Outdated
MichalPetryka and others added 3 commits March 4, 2024 16:40
Co-authored-by: Jan Kotas <jkotas@microsoft.com>
@MichalPetryka MichalPetryka marked this pull request as ready for review March 4, 2024 23:28
@AndyAyersMS

Copy link
Copy Markdown
Member

/azp run runtime-coreclr gcstress0x3-gcstress0xc

@azure-pipelines

Copy link
Copy Markdown
Azure Pipelines successfully started running 1 pipeline(s).

@jkotas

jkotas commented Mar 5, 2024

Copy link
Copy Markdown
Member

Could you please collect some perf numbers to give us an idea about the improvements and regressions in the affected areas? We may want to do some optimizations to mitigate the regressions.

@jkotas

jkotas commented Mar 5, 2024

Copy link
Copy Markdown
Member

One small point of concern is that this might make delegate equality checks slower since they rely on checking the methods in the last "slow path" check, which is however always hit for different delegates AFAIR.

The existing code tries to compare MethodInfos as a cheap fast path. Most delegates do not have cached MethodInfo, so this fast path is hit rarely - but it is very cheap, so it is still worth it.

This cheap fast path is not cheap anymore with this change. It may be best to delete the fast path that is trying to compare the MethodInfos and potentially optimize Delegate_InternalEqualMethodHandles instead.

@MichalPetryka

MichalPetryka commented Mar 5, 2024

Copy link
Copy Markdown
Contributor Author

Could you please collect some perf numbers to give us an idea about the improvements and regressions in the affected areas? We may want to do some optimizations to mitigate the regressions.

I think the thing that'd need benchmarking here are equality checks and maybe the impact of collectible delegates being stored in the CWT on the GC, the rest of things shouldn't be performance sensitive enough to matter I think? I'm not fully sure what'd be the proper way for benchmarking the latter.

This cheap fast path is not cheap anymore with this change. It may be best to delete the fast path that is trying to compare the MethodInfos and potentially optimize Delegate_InternalEqualMethodHandles instead.

I am going to benchmark the impact of the equality change tomorrow, I feel like if the impact won't be big, potential optimizations to it can be done later.

@jkotas

jkotas commented Mar 5, 2024

Copy link
Copy Markdown
Member

I'm not fully sure what'd be the proper way for benchmarking the latter.

Write a small program that loads an assembly as collectible and calls a method in it. The method in collectible assembly can create delegates in a loop. (If you would like to do it under benchmarknet, it works too - but it is probably more work.)

@MichalStrehovsky

Copy link
Copy Markdown
Member

Side question: doesn't that mean it's disabling all those trimming optimizations Michal did to not make delegate targets be marked as reflection visible on AOT?

The optimization has several degrees: never calling the API is the best - delegate targets are not considered reflection targets. Calling the API on something typed as Delegate or MulticastDelegate disables the optimization completely - all delegate targets are reflection targets.

But calling the API on something typed as a concrete delegate type (e.g. Func<int>) disables the optimization for delegates of that type only (or... canonically compatible with that type), hitting a middle ground. IIRC the default webapiaot template is in the middle ground. Not great, not terrible.

@MichalPetryka

Copy link
Copy Markdown
Contributor Author

@MihuBot

@MichalPetryka

MichalPetryka commented Jun 17, 2026

Copy link
Copy Markdown
Contributor Author

Results after latest cleanup:

BenchmarkDotNet v0.15.8, Windows 10 (10.0.19045.6466/22H2/2022Update)
AMD Ryzen 9 7900X 4.70GHz, 1 CPU, 24 logical and 12 physical cores
.NET SDK 11.0.100-preview.5.26302.115
  [Host]     : .NET 10.0.8 (10.0.8, 10.0.826.23019), X64 RyuJIT x86-64-v4
  PR         : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), X64 RyuJIT x86-64-v4
  Main       : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), X64 RyuJIT x86-64-v4
  .NET 10.0  : .NET 10.0.9 (10.0.9, 10.0.926.27113), X64 RyuJIT x86-64-v4

Runtime=.NET 10.0  
Method Job Toolchain Mean Error StdDev Gen0 Gen1 Allocated
Method PR CoreRun 0.7613 ns 0.0073 ns 0.0064 ns - - -
Method Main CoreRun 1.5165 ns 0.0063 ns 0.0055 ns - - -
Method .NET 10.0 Default 1.0946 ns 0.0069 ns 0.0061 ns - - -
MethodMulticast PR CoreRun 5.2355 ns 0.0207 ns 0.0184 ns - - -
MethodMulticast Main CoreRun 8.8053 ns 0.0599 ns 0.0561 ns - - -
MethodMulticast .NET 10.0 Default 3.5791 ns 0.0110 ns 0.0103 ns - - -
MethodDynamic PR CoreRun 0.7566 ns 0.0048 ns 0.0045 ns - - -
MethodDynamic Main CoreRun 1.5146 ns 0.0082 ns 0.0077 ns - - -
MethodDynamic .NET 10.0 Default 1.0698 ns 0.0048 ns 0.0043 ns - - -
Target PR CoreRun 0.7493 ns 0.0056 ns 0.0052 ns - - -
Target Main CoreRun 0.5723 ns 0.0033 ns 0.0031 ns - - -
Target .NET 10.0 Default 0.3906 ns 0.0040 ns 0.0037 ns - - -
TargetMulticast PR CoreRun 2.7477 ns 0.0269 ns 0.0239 ns - - -
TargetMulticast Main CoreRun 4.1021 ns 0.0119 ns 0.0099 ns - - -
TargetMulticast .NET 10.0 Default 2.6559 ns 0.0083 ns 0.0073 ns - - -
TargetDynamic PR CoreRun 0.5533 ns 0.0051 ns 0.0048 ns - - -
TargetDynamic Main CoreRun 0.3750 ns 0.0049 ns 0.0043 ns - - -
TargetDynamic .NET 10.0 Default 0.1975 ns 0.0075 ns 0.0066 ns - - -
EqualsTrue PR CoreRun 2.2325 ns 0.0670 ns 0.0627 ns - - -
EqualsTrue Main CoreRun 1.9491 ns 0.0162 ns 0.0151 ns - - -
EqualsTrue .NET 10.0 Default 1.1463 ns 0.0032 ns 0.0029 ns - - -
EqualsFalse PR CoreRun 2.5707 ns 0.0714 ns 0.0793 ns - - -
EqualsFalse Main CoreRun 22.7661 ns 0.1456 ns 0.1362 ns - - -
EqualsFalse .NET 10.0 Default 14.5304 ns 0.0445 ns 0.0372 ns - - -
EqualsMulticast PR CoreRun 1.3042 ns 0.0075 ns 0.0070 ns - - -
EqualsMulticast Main CoreRun 1.2997 ns 0.0071 ns 0.0063 ns - - -
EqualsMulticast .NET 10.0 Default 0.9088 ns 0.0068 ns 0.0064 ns - - -
GetHashCode PR CoreRun 2.0687 ns 0.0109 ns 0.0097 ns - - -
GetHashCode Main CoreRun 5.5471 ns 0.0089 ns 0.0070 ns - - -
GetHashCode .NET 10.0 Default 3.7339 ns 0.0184 ns 0.0172 ns - - -
DynamicCreate PR CoreRun 1,348.8091 ns 9.2080 ns 8.6132 ns 0.0744 0.0725 1256 B
DynamicCreate Main CoreRun 1,351.0421 ns 9.1618 ns 8.1217 ns 0.0744 0.0725 1256 B
DynamicCreate .NET 10.0 Default 1,186.3003 ns 17.1962 ns 15.2440 ns 0.0725 0.0706 1215 B

@jkotas Seems like now it improves basically everything and has no CWT. Can you review again then?

Sidenote: I have no idea why everything on main seems to have regressed from .NET 10. Did something cause BDN to measure differently now?

{
[ClassInterface(ClassInterfaceType.None)]
[ComVisible(true)]
[NonVersionable]

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you expect this attribute to do?

The runtime has tight coupling with number of core types. We do not use NonVersionable attribute for that.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since R2R already hardcodes parts of the layout, my thought was that it wouldn't hurt to do it fully and make it more apparent here.

Comment thread src/coreclr/System.Private.CoreLib/src/System/Delegate.CoreCLR.cs Outdated
Comment thread src/coreclr/System.Private.CoreLib/src/System/MulticastDelegate.CoreCLR.cs Outdated
Comment thread src/coreclr/System.Private.CoreLib/src/System/MulticastDelegate.CoreCLR.cs Outdated
Comment thread src/coreclr/System.Private.CoreLib/src/System/MulticastDelegate.CoreCLR.cs Outdated
return invocationList;
}

internal ReadOnlySpan<MulticastDelegate> GetInvocationsUnchecked()

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mix of object, Delegate and MulticastDelegate does not look good. Can we make it look more like NativeAOT?

I think there is an opportunity to get rid of one of the fields by making the implementation to be more like NAOT we have discussed some time back.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Submitted #129635 as a first step towards that.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mix of object, Delegate and MulticastDelegate does not look good. Can we make it look more like NativeAOT?

I'd prefer to do that in a followup but I agree.

I think there is an opportunity to get rid of one of the fields by making the implementation to be more like NAOT we have discussed some time back.

From what I remember, NativeAOT relied on function pointer equality a bit here which made it problematic, I'd leave such investigation for followup too.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the current version of change from a distance:

  • It folds MethodInfo cache and invocation list fields. It makes the delegate instance smaller. I think smaller delegates are goodness.
  • But then it uses the saved space to cache MethodDesc*. I get that having MethodDesc* cached in the delegate makes some operations faster. I am not sure whether burning the space for it is worth it.
  • Plus there is some refactoring sprinkled through that is not strictly related.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the current version of change from a distance:

  • It folds MethodInfo cache and invocation list fields. It makes the delegate instance smaller. I think smaller delegates are goodness.
  • But then it uses the saved space to cache MethodDesc*. I get that having MethodDesc* cached in the delegate makes some operations faster. I am not sure whether burning the space for it is worth it.
  • Plus there is some refactoring sprinkled through that is not strictly related.

Yeah I guess the timeline of the changes was:

  • I started with moving the MethodInfo for FOH
  • I had to introduce the MethodDesc to mitigate Equals regressions
  • I had to move the fields around to keep the R2R layout, which I combined with general cleanup.

For the MethodDesc part, I've realised that it's going to be necessary to be like this if we want to use the invocation count to store a GC handle/frozen indicator since otherwise we'll conflict with open virtual methods there.

If you want I can split the PR in 2 or 3 parts, either methodinfo + methoddesc and general cleanup or do the methoddesc separately too.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we'll conflict with open virtual methods

The open virtual case is rare (and generally a lot of pain to deal with). If we need to keep more information around for these rare cases, I think it is fine to lazily allocate an object with the extra information and stash it in _helperObject or _target. We can use the type of the object with the extra information to disambiguate the different rare cases if there is more than one.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can split the PR in 2 or 3 parts

If you can break things down into smaller changes that do just one coherent thing that is an obvious improvement, they will move fast. For example, it is fine to submit a PR with the cleanup of wrapper delegate cruft that I have missed.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've realised that we can remove methodDesc by reusing incovationList so I did it here. We can thus make instances smaller while keeping most benefits.

I also thought out how we can do FOH support still later while doing this.

Doing this requires touching most of the cleanup places so that complicates the split up here a bit. I'll still try to split off some parts though.

@MichalPetryka MichalPetryka requested a review from jkotas June 20, 2026 01:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants