Optimize HashBytes for 64-bit and remove dead code #129640

Open
AaronRobinsonMSFT wants to merge 4 commits into dotnet:main from AaronRobinsonMSFT:perf1

AaronRobinsonMSFT commented Jun 19, 2026

Optimize HashBytes (src/coreclr/inc/utilcode.h) to use xxHash32 primitives (XXHash32_QueueRound/XXHash32_MixFinal) processing 4 bytes at a time, replacing the original byte-at-a-time DJB2 hash.

HashBytes is used by the string literal interning path (EEUnicodeStringLiteralHashTableHelper::Hash), which hashes every string literal resolved during JIT compilation (ldstr token resolution). This makes it part of the steady-state cost for all managed applications.

The xxHash32 primitives are extracted from typehashingalgorithms.h into a new shared header src/coreclr/inc/dn_xxhash.h so they can be reused by both the type hashing and HashBytes paths.

Also removes unused dead code: EEUnicodeHashTableHelper, EEUnicodeStringHashTable, EEStringData::IsOnlyLowChars, HashStringN, HashiStringA, HashiStringN, and CaseInsensitiveStringCompareHash.

Measurement

A microbenchmark calling string.IsInterned() 200M times against a pool of 1000 non-interned strings was used to isolate the hashing cost. Profiled on macOS ARM64 (Apple M5) with a full Release build (clr+libs -c Release). Each row is the median of 5 runs.

| Version | Median | vs Baseline |
|---|---|---|
| Upstream baseline (byte-at-a-time DJB2) | 8.53s | |
| xxHash32 primitives (QueueRound loop) | 7.65s | -10.3% |
| xxHash32 class (uint32_t) | 8.30s | -2.7% |
| xxHash32 class (uint64_t) | 9.06s | +6.2% |

The direct primitives approach was chosen because the xxHash class queue machinery (position tracking, branching, accumulator state) adds overhead that erases the gain when used for bulk byte-stream hashing. The primitives inline cleanly into a tight loop.
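As a rough illustration of what "primitives in a tight loop" can look like, here is a minimal, self-contained sketch. It is not the code from this PR: `QueueRound`, `MixFinal`, and `HashBytesSketch` are hypothetical stand-ins for `XXHash32_QueueRound`, `XXHash32_MixFinal`, and `HashBytes`. The seeding (length added to the empty state) and the zero-padded tail lane are assumptions made for the example. Only the prime constants and the round/avalanche formulas follow the published xxHash32 algorithm.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Standard xxHash32 prime constants.
static constexpr uint32_t kPrime2 = 2246822519u;
static constexpr uint32_t kPrime3 = 3266489917u;
static constexpr uint32_t kPrime4 = 668265263u;
static constexpr uint32_t kPrime5 = 374761393u;

static inline uint32_t RotL32(uint32_t value, int bits)
{
    return (value << bits) | (value >> (32 - bits));
}

// Hypothetical stand-in for XXHash32_QueueRound: fold one 32-bit lane into the hash.
static inline uint32_t QueueRound(uint32_t hash, uint32_t lane)
{
    return RotL32(hash + lane * kPrime3, 17) * kPrime4;
}

// Hypothetical stand-in for XXHash32_MixFinal: the xxHash32 avalanche step.
static inline uint32_t MixFinal(uint32_t hash)
{
    hash ^= hash >> 15;
    hash *= kPrime2;
    hash ^= hash >> 13;
    hash *= kPrime3;
    hash ^= hash >> 16;
    return hash;
}

// Sketch of a 4-bytes-at-a-time byte hash built only from the two primitives.
uint32_t HashBytesSketch(const uint8_t* data, size_t size)
{
    // Assumption: seed of 0, with the length mixed into the initial state.
    uint32_t hash = kPrime5 + static_cast<uint32_t>(size);

    size_t i = 0;
    for (; i + 4 <= size; i += 4)
    {
        uint32_t lane;
        std::memcpy(&lane, data + i, sizeof(lane)); // safe for unaligned input
        hash = QueueRound(hash, lane);
    }

    // Assumption: the 0-3 tail bytes are zero-padded into one final lane.
    if (i < size)
    {
        uint32_t lane = 0;
        std::memcpy(&lane, data + i, size - i);
        hash = QueueRound(hash, lane);
    }

    return MixFinal(hash);
}
```

The loop carries a single 32-bit accumulator and has no position tracking or queue state, which is the property the measurement above attributes the gain to. A caller would pass the raw bytes of a string literal, for example `HashBytesSketch(reinterpret_cast<const uint8_t*>(chars), byteCount)`.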

Note

This PR was generated with the assistance of GitHub Copilot.

On 64-bit hosts, process 8 bytes at a time using a multiply-xorshift
mixing step instead of byte-at-a-time DJB2. The tail bytes (0-7) fall
through to the original byte loop. 32-bit hosts are unchanged.

Remove unused EEUnicodeHashTableHelper and EEUnicodeStringHashTable
along with their method implementations. Fix stale comment in
dynamicmethod.cpp that referenced the removed type.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
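The approach in the commit message above (an 8-byte fast path on 64-bit hosts, with the 0-7 tail bytes handled by the original byte loop) might be sketched as follows. This is an illustration under stated assumptions, not the committed code: the function names are hypothetical, the DJB2 form (seed 5381, multiply by 33, XOR the byte) is assumed, and `kPlaceholderMul` is a placeholder constant because the thread does not show the actual multiply-xorshift constants. The real code would presumably key off a build macro such as HOST_64BIT rather than `sizeof(void*)`.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Placeholder odd multiplier (assumption; the PR's constant is not shown).
static constexpr uint64_t kPlaceholderMul = 0x9E3779B97F4A7C15ull;

// One byte-at-a-time DJB2 step (XOR variant): hash = hash * 33 ^ byte.
static inline uint32_t Djb2Step(uint32_t hash, uint8_t b)
{
    return ((hash << 5) + hash) ^ b;
}

// Baseline: plain byte-at-a-time DJB2.
uint32_t HashBytesDjb2(const uint8_t* data, size_t size)
{
    uint32_t hash = 5381;
    for (size_t i = 0; i < size; i++)
        hash = Djb2Step(hash, data[i]);
    return hash;
}

// Sketch: consume 8-byte chunks on 64-bit hosts, then finish with the byte loop.
uint32_t HashBytesChunked(const uint8_t* data, size_t size)
{
    uint32_t hash = 5381;
    size_t i = 0;

    if (sizeof(void*) == 8) // stand-in for a 64-bit host check
    {
        uint64_t acc = hash;
        for (; i + 8 <= size; i += 8)
        {
            uint64_t chunk;
            std::memcpy(&chunk, data + i, sizeof(chunk)); // unaligned-safe load
            acc = (acc ^ chunk) * kPlaceholderMul;        // multiply
            acc ^= acc >> 32;                             // xorshift
        }
        hash = static_cast<uint32_t>(acc);
    }

    // Tail bytes (0-7 on 64-bit hosts, everything on 32-bit hosts).
    for (; i < size; i++)
        hash = Djb2Step(hash, data[i]);

    return hash;
}
```

In this sketch, inputs shorter than 8 bytes never enter the chunk loop, so they hash exactly as the baseline does. Longer inputs produce different values from the baseline on 64-bit hosts, which is acceptable only if nothing depends on the hash values being stable across builds.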
@dotnet-policy-service

Tagging subscribers to this area: @agocke
See info in area-owners.md if you want to be subscribed.

Copilot AI left a comment

Pull request overview

This PR updates CoreCLR’s internal hashing helpers by optimizing HashBytes on 64-bit hosts and removing unused Unicode/string hashing helper types, primarily impacting hash-table usage in the VM and shared hash utilities.

Changes:

  • Add a 64-bit HashBytes fast-path in utilcode.h that consumes data in 8-byte chunks before hashing the remaining tail bytes.
  • Remove the unused EEUnicodeHashTableHelper / EEUnicodeStringHashTable implementation and related dead code.
  • Remove unused string hashing helpers / traits and update a comment that referenced removed code.
Summary per file:

| File | Description |
|---|---|
| src/coreclr/vm/eehash.h | Removes unused Unicode hash table helper declaration/typedef. |
| src/coreclr/vm/eehash.cpp | Removes unused Unicode hash table helper implementation; keeps string literal helper. |
| src/coreclr/vm/dynamicmethod.cpp | Updates an in-code comment to no longer reference removed helper. |
| src/coreclr/inc/utilcode.h | Adds 64-bit chunking fast-path to HashBytes; removes unused string-hash helpers. |
| src/coreclr/inc/shash.h | Removes unused case-insensitive string compare/hash traits wrapper. |

Copilot's findings

  • Files reviewed: 5/5 changed files
  • Comments generated: 1

Comment thread src/coreclr/inc/utilcode.h Outdated
Comment thread src/coreclr/vm/eehash.cpp
AaronRobinsonMSFT and others added 2 commits June 19, 2026 15:21
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Rewrite HashBytes to use xxHash32 QueueRound/MixFinal primitives instead
of byte-at-a-time DJB2. Extract xxHash32 code from typehashingalgorithms.h
into a shared inc/dn_xxhash.h header.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 19, 2026 23:45

Copilot AI left a comment

Copilot's findings

  • Files reviewed: 9/9 changed files
  • Comments generated: 3

Comment thread src/coreclr/inc/dn_xxhash.h
Comment thread src/coreclr/inc/dn_xxhash.h
Comment thread src/coreclr/inc/dn_xxhash.h
- Include clrtypes.h for UINT32/DWORD/UINT_PTR/_rotl/HOST_64BIT
- Use UINT_PTR cast in MixPointerIntoHash for consistency
- Fix typo: mixin -> mix in

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

inline static UINT32 XXHash32_MixEmptyState()
{
// Unlike System.HashCode, these hash values are required to be stable, so don't

It is not clear why these hash values are required to be stable now that this is standalone and no longer part of typehashingalgorithms.h.
