Overhaul set benchmarks: split Immutable / SingleThreaded, add Set.copyOf#11721
Overhaul set benchmarks: split Immutable / SingleThreaded, add Set.copyOf#11721dougqh wants to merge 4 commits into
Conversation
…pyOf Mirror the map-benchmark overhaul for sets. Replace the single SetBenchmark (shared mutable counter under @threads(8); contains_treeSet bug that queried HASH_SET) with two classes that each pick the right threading model: - ImmutableSetBenchmark: fixed read-only membership shared across threads (@State(Scope.Benchmark)); array / sortedArray / HashSet / TreeSet / Set.copyOf (the JDK compact SetN the agent actually uses for config sets, via CollectionUtils.tryMakeImmutableSet). hit/miss split, per-thread cursor. - SingleThreadedSetBenchmark: per-thread mutable lifecycle (@State(Scope.Thread)); create/clone + contains/iterate, plus a Collections.synchronizedSet case for the uncontended synchronization tax (per-thread => bias never revoked; biased-locking story across JVMs). StringIndex rows fold in later. Result blocks empty pending a fresh multi-JVM run. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…edMapBenchmark StringIndex's benchmark integration is moving to the dedicated benchmark PRs (set overhaul #11721, map overhaul #11679) and will be folded in there later. Revert both benchmark files to master so this PR is purely the StringIndex data structure + tests. Avoids the #11679/#11721 deletions-vs-edits conflicts too. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
🟢 Java Benchmark SLOs — All performance SLOs passed
PR vs. master results
Commit: Load and DaCapo benchmarks can be triggered manually in the GitLab pipeline. Results will appear in the Benchmarking Platform UI after completion. |
ImmutableSetBenchmark: HashSet fastest; Set.copyOf (SetN) ~10% behind on hit, the compact form the agent uses for fixed config sets. SingleThreadedSetBenchmark: uncontended synchronizedSet tax ~37% on contains (biased locking off, Java 17), near-zero on iterate. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Hi! 👋 Thanks for your pull request! 🎉 To help us review it, please make sure to:
If you need help, please check our contributing guidelines. |
| Arrays.sort(sortedArray); | ||
| hashSet = new HashSet<>(Arrays.asList(STRINGS)); | ||
| treeSet = new TreeSet<>(Arrays.asList(STRINGS)); | ||
| copyOfSet = CollectionUtils.tryMakeImmutableSet(Arrays.asList(STRINGS)); |
There was a problem hiding this comment.
suggestion: Maybe a better name instead of copyOf, e.g. tracerImmutableSet
There was a problem hiding this comment.
Yeah, I debated that. It was so named because of the underlying mechanism, but tracerImmutableSet might be better.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5a14e62e10
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| public Set<String> clone_synchronizedSet() { | ||
| return Collections.synchronizedSet(new HashSet<>(hashSet)); |
There was a problem hiding this comment.
Clone the synchronized set under test
In the clone benchmark for the synchronized-set variant, this copies the plain hashSet instead of the prebuilt synchronizedSet, unlike the other clone benchmarks that copy their corresponding structure. This means the reported clone_synchronizedSet result only measures a HashSet copy plus wrapper allocation, not the cost of cloning the synchronized-set variant that the benchmark name and setup imply.
Useful? React with 👍 / 👎.
Per bric3 review: copyOf named the mechanism; tracerImmutableSet names the role (the agent's fixed-config-set representation, i.e. Set.copyOf / SetN). Prose keeps the Set#copyOf reference for the mechanism. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…n one) Per Codex review: it copied `hashSet`, unlike the other clone_* methods which copy their own structure. Copy `synchronizedSet` so it faithfully measures cloning the synchronized variant. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
What This Does
Overhauls the
internal-apiset membership benchmarks, mirroring the map-benchmark overhaul (#11679). Replaces the singleSetBenchmarkwith two classes that each pick the correct threading model for their use case (@Statescope can't vary by@Param, so one class can't host both):ImmutableSetBenchmark— fixed, read-only membership shared across threads (@State(Scope.Benchmark); sharing is realistic and contention-free since nothing mutates). Comparesarray/sortedArray/HashSet/TreeSet/Set.copyOf(the JDK's compact, array-backedImmutableCollections.SetN, viaCollectionUtils.tryMakeImmutableSet— what the agent actually uses for fixed config sets). hit/miss split, per-thread lookup cursor. Sets in the tracer skew strongly toward this shape.SingleThreadedSetBenchmark— per-thread mutable lifecycle (@State(Scope.Thread)):create/clone+contains/iterate, plus aCollections.synchronizedSetcase for the uncontended synchronization tax (each thread owns its set → monitor only ever locked by one thread → the biased-locking story, read across JVM versions). UnsynchronizedHashSetis the in-harness baseline.Motivation
The old
SetBenchmarkused a shared mutable rotation counter under@Threads(8)(turning fast structures into a contention measurement) and had acontains_treeSetthat actually queriedHASH_SET. The split fixes both, and theSet.copyOfcase answers a real question: for our ~10 fixedstatic final HashSetconfig sites, is the JDK's compact immutable set better thanHashSeton speed/footprint?Additional Notes
Set.copyOfonly materializes the compactSetNon Java 10+ (falls back toHashSetpre-10); the synchronizedSet biased-locking delta shows across Java 11 → 17. Result blocks are intentionally empty pending a fresh multi-JVM run.StringIndex(Add StringIndex: a generic open-addressed string set #11660) rows fold into these later — kept out so this lands independent of that data structure.🤖 Generated with Claude Code