Add Roaring Bitmap support (issue #1270) #1741
crprashant wants to merge 1 commit into microsoft:dev from
Pull request overview
This PR adds a new host-level Roaring Bitmap custom object extension to Garnet, including a compressed bitmap data structure, a RoaringBitmapObject wrapper, and four new R.* RESP commands, along with docs and tests.
Changes:
- Introduces a pure C# Roaring Bitmap implementation with array/bitmap containers and versioned serialization.
- Adds a Garnet custom object and RESP command implementations for `R.SETBIT`, `R.GETBIT`, `R.BITCOUNT`, and `R.BITPOS`, and registers them in the default server host.
- Adds end-to-end RESP tests and data-structure unit tests, plus documentation for the new commands.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| website/docs/commands/roaring-bitmap.md | Adds user-facing documentation for the new Roaring Bitmap object and R.* commands. |
| test/Garnet.test/RoaringBitmapDataTests.cs | Adds unit tests for the standalone RoaringBitmap data structure (promotion/demotion, bitpos, serialization). |
| test/Garnet.test/RespRoaringBitmapTests.cs | Adds RESP-level integration tests for the new commands via StackExchange.Redis. |
| main/GarnetServer/Program.cs | Registers the Roaring Bitmap custom type and R.* commands in the default host. |
| main/GarnetServer/Extensions/RoaringBitmap/RoaringBitmapObject.cs | Implements the Garnet custom-object wrapper (clone/serialize/size tracking). |
| main/GarnetServer/Extensions/RoaringBitmap/RoaringBitmapCommands.cs | Implements argument parsing and the four RESP commands. |
| main/GarnetServer/Extensions/RoaringBitmap/RoaringBitmap.cs | Implements the roaring bitmap core, bit operations, enumeration, and serialization format. |
| main/GarnetServer/Extensions/RoaringBitmap/Containers/IContainer.cs | Defines the internal container abstraction and serialization kind enum. |
| main/GarnetServer/Extensions/RoaringBitmap/Containers/BitmapContainer.cs | Implements dense bitmap container behavior, popcount, and serialization. |
| main/GarnetServer/Extensions/RoaringBitmap/Containers/ArrayContainer.cs | Implements sparse sorted-array container behavior, promotion logic, and serialization. |
```csharp
// uint32 universe: past the set range any value qualifies, so
// accept any value >= from that's not in the bits array.
ClassicAssert.GreaterOrEqual(actual0, from);
ClassicAssert.IsTrue(actual0 == 0 || (actual0 < bits.LongLength ? !bits[actual0] : true));
```
bits[actual0] will not compile because array indices must be int but actual0 is a long. Cast to int after checking bounds (or keep actual0 as an int for this bounded-universe test).
Suggested change:

```diff
-ClassicAssert.IsTrue(actual0 == 0 || (actual0 < bits.LongLength ? !bits[actual0] : true));
+ClassicAssert.IsTrue(actual0 == 0 || (actual0 >= 0 && actual0 < bits.LongLength ? !bits[(int)actual0] : true));
```
```csharp
/// <summary>Estimated heap byte cost. Excludes the .NET object header overhead and the SortedDictionary node overhead.</summary>
public long ByteSize
{
    get
    {
        long sum = 24; // base object overhead estimate
        foreach (var kv in chunks)
        {
            // Per-entry: key (2B), reference (8B), red-black tree node overhead (~40B), and container.
```
The ByteSize comment says it excludes SortedDictionary node overhead, but the implementation adds an estimate per entry (sum += 50 + kv.Value.ByteSize, including an RB-tree node estimate). Update the comment (or the calculation) so documentation matches what is reported/used for size accounting.
Suggested change:

```diff
-/// <summary>Estimated heap byte cost. Excludes the .NET object header overhead and the SortedDictionary node overhead.</summary>
+/// <summary>Estimated heap byte cost, including an approximate base-object cost and approximate per-entry <see cref="SortedDictionary{TKey, TValue}"/> node overhead.</summary>
 public long ByteSize
 {
     get
     {
-        long sum = 24; // base object overhead estimate
+        long sum = 24; // Approximate RoaringBitmap instance/base-object cost.
         foreach (var kv in chunks)
         {
-            // Per-entry: key (2B), reference (8B), red-black tree node overhead (~40B), and container.
+            // Per-entry estimate: key (2B), reference (8B), red-black tree node overhead (~40B), and container.
```
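To make the accounting concrete, here is a minimal sketch of the estimate as described in this thread: a 24-byte base cost plus, per chunk, about 50 bytes of entry overhead (2 + 8 + ~40) and the container's own size. The class name and the container sizes (2 bytes per value for an array container, a flat 8 KiB for a bitmap container, payload only) are assumptions for illustration, not the PR's actual `ByteSize` implementations.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of the size estimate discussed in this review thread.
// Assumption: container sizes are payload only (not the PR's container ByteSize values).
public static class ByteSizeEstimateSketch
{
    const long BaseCost = 24;     // approximate base-object cost (from the PR)
    const long PerEntryCost = 50; // key (2B) + reference (8B) + RB-tree node (~40B)

    // Assumed payload of an array container: one ushort per set bit.
    public static long ArrayContainerBytes(int cardinality) => 2L * cardinality;

    // Assumed payload of a bitmap container: ulong[1024] = 8 KiB.
    public const long BitmapContainerBytes = 1024 * sizeof(ulong);

    // Base cost, then per-entry dictionary overhead plus each container's size.
    public static long Estimate(IEnumerable<long> containerBytes)
    {
        long sum = BaseCost;
        foreach (long c in containerBytes)
            sum += PerEntryCost + c;
        return sum;
    }
}
```

With one 100-value array chunk and one bitmap chunk, the estimate is 24 + (50 + 200) + (50 + 8192) = 8516 bytes; the two 50-byte terms are exactly the node overhead that the original summary claimed to exclude.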
```csharp
public override bool NeedInitialUpdate(ReadOnlyMemory<byte> key, ref ObjectInput input, ref RespMemoryWriter writer) => true;
```
RSetBit.NeedInitialUpdate always returns true. For a missing key, this means the object is created before Updater runs; if parsing fails, AbortWithErrorMessage still returns true, so the command errors but the empty key can remain created/size-tracked. Validate args in NeedInitialUpdate and return false (after writer.WriteError(...)) on bad input to avoid creating keys on invalid commands.
Suggested change:

```diff
-public override bool NeedInitialUpdate(ReadOnlyMemory<byte> key, ref ObjectInput input, ref RespMemoryWriter writer) => true;
+public override bool NeedInitialUpdate(ReadOnlyMemory<byte> key, ref ObjectInput input, ref RespMemoryWriter writer)
+{
+    var validationInput = input;
+    int offset = 0;
+    var offsetArg = GetNextArg(ref validationInput, ref offset);
+    var bitArg = GetNextArg(ref validationInput, ref offset);
+    if (!RoaringBitmapArgs.TryParseUInt32(offsetArg, out _))
+    {
+        writer.WriteError(ErrOffset);
+        return false;
+    }
+    if (!RoaringBitmapArgs.TryParseBit(bitArg, out _))
+    {
+        writer.WriteError(ErrValue);
+        return false;
+    }
+    return true;
+}
```
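The suggestion relies on `RoaringBitmapArgs.TryParseUInt32` and `RoaringBitmapArgs.TryParseBit`, whose bodies are not shown in this thread. A minimal sketch of what such helpers could look like, assuming arguments arrive as raw ASCII bytes; `ArgParsingSketch` is a hypothetical name, and `Utf8Parser` is the BCL's span-based number parser:

```csharp
using System;
using System.Buffers.Text;

// Hypothetical stand-ins for RoaringBitmapArgs.TryParseUInt32 / TryParseBit;
// the PR's real implementations are not shown in this thread.
public static class ArgParsingSketch
{
    // Accepts a decimal uint32 (0..4294967295) that spans the whole argument.
    public static bool TryParseUInt32(ReadOnlySpan<byte> arg, out uint value)
    {
        // Utf8Parser parses from the start of the span and reports how many bytes
        // it consumed, so trailing garbage such as "12a" must be rejected explicitly.
        if (Utf8Parser.TryParse(arg, out value, out int consumed) && consumed == arg.Length)
            return true;
        value = 0;
        return false;
    }

    // Accepts exactly "0" or "1".
    public static bool TryParseBit(ReadOnlySpan<byte> arg, out bool bit)
    {
        bit = arg.Length == 1 && arg[0] == (byte)'1';
        return arg.Length == 1 && (arg[0] == (byte)'0' || arg[0] == (byte)'1');
    }
}
```

Because both helpers are pure span checks, running them in `NeedInitialUpdate` before any object exists is cheap, which is what lets the suggested code reject bad input without creating the key.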
Thanks for the thorough review! Pushed
All 27 data-structure tests + 14 RESP integration tests still pass:
@microsoft-github-policy-service agree company="Microsoft"
Thanks for your contribution! This extension is interesting, but instead of putting it in main, we should place it in https://github.com/microsoft/garnet/tree/main/modules (where e.g., GarnetJSON is kept) so that it is not bundled by default in the server.
Also, main is closed to new features, so we would request that you retarget your PR to the dev (v2) branch.
Force-pushed `02743e5` to `66af335`.
Thanks for the review! Both points addressed in the latest force-push:
Local validation: all 43 RoaringBitmap tests (29 data + 14 RESP) pass on net8.0,
Addresses PR review feedback from @badrishc:

- Move the extension from `main/GarnetServer/Extensions/RoaringBitmap` to `modules/RoaringBitmap` so it isn't bundled by default (mirrors GarnetJSON).
- Retarget the PR to `dev` (companion change).

Implementation changes for the move:

- New `modules/RoaringBitmap/GarnetRoaringBitmap.csproj` (mirrors `GarnetJSON.csproj`, signs the assembly, exposes `InternalsVisibleTo` to `Garnet.test`).
- New `RoaringBitmapModule : ModuleBase` entry point that registers the factory and the four `R.SETBIT`/`R.GETBIT`/`R.BITCOUNT`/`R.BITPOS` commands.
- Renamed namespace `Garnet.Extensions.RoaringBitmap` -> `GarnetRoaringBitmap` to avoid the namespace/class collision with class `RoaringBitmap`.
- Updated `CustomObjectFunctions` overrides to the dev-branch `scoped ReadOnlySpan<byte>` signatures for `NeedInitialUpdate` / `Updater`.
- Updated `RoaringBitmapObject` to the dev-branch `CustomObjectBase` ctor and `HeapMemorySize` accounting.
- Wired the module into `Garnet.slnx` and `Garnet.test.csproj`.
- Tests still register via `server.Register.NewCommand` in `[SetUp]` (in-process), matching the existing custom-object test pattern.
- Updated `StringKeyAndCustomObjectKey_AreSeparate` to expect `WRONGTYPE` on the unified store on dev.
Force-pushed `66af335` to `dd3e2da`.
```xml
<ItemGroup>
  <ProjectReference Include="..\..\libs\server\Garnet.server.csproj" />
</ItemGroup>
```
Let's also include our threading analyzers for consistency:
```xml
<PackageReference Include="Microsoft.VisualStudio.Threading.Analyzers" PrivateAssets="all" IncludeAssets="analyzers"/>
```
```csharp
/// ascending order by high-key for deterministic serialization and efficient
/// scans (e.g., bit-position queries).
///
/// This class is NOT thread-safe; the parent RoaringBitmapObject (added in a
```
```csharp
/// <summary>
/// Sets bit <paramref name="value"/> to 1. Returns the previous value (0 or 1).
/// </summary>
public int Add(uint value)
```
Strange to use an int for this, I'd switch to bool.
```csharp
/// <summary>
/// Clears bit <paramref name="value"/>. Returns the previous value (0 or 1).
/// </summary>
public int Remove(uint value)
```
Same note, prefer bool.
```csharp
}

/// <summary>Convenience wrapper used by RESP SETBIT: dispatches to <see cref="Add"/> or <see cref="Remove"/>.</summary>
public int SetBit(uint offset, bool set) => set ? Add(offset) : Remove(offset);
```
Same note, prefer bool.
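A minimal sketch of the `bool`-returning shape the reviewer suggests, shown on a toy sorted-array container rather than the PR's `ArrayContainer` (`SortedArraySketch` is a hypothetical name). Here `true` means the bit was previously set, so the RESP layer can still reply with `prev ? 1 : 0`:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical toy container illustrating bool "previous value" returns;
// not the PR's ArrayContainer.
public sealed class SortedArraySketch
{
    private readonly List<ushort> values = new(); // kept in ascending order

    public int Count => values.Count;

    // Sets the bit. Returns true if it was already set (the previous value).
    public bool Add(ushort value)
    {
        int idx = values.BinarySearch(value);
        if (idx >= 0) return true;  // already present
        values.Insert(~idx, value); // ~idx is the insertion point that keeps order
        return false;
    }

    // Clears the bit. Returns true if it was previously set.
    public bool Remove(ushort value)
    {
        int idx = values.BinarySearch(value);
        if (idx < 0) return false;
        values.RemoveAt(idx);
        return true;
    }

    // Mirrors the SetBit wrapper: dispatch to Add or Remove.
    public bool SetBit(ushort offset, bool set) => set ? Add(offset) : Remove(offset);
}
```

One naming caveat: `HashSet<T>.Add` returns `true` when the element was newly added, the opposite polarity, so documenting the result in a `<returns>` tag (or naming it `wasSet` at call sites) may avoid confusion.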
```diff
@@ -0,0 +1,313 @@
+// Copyright (c) Microsoft Corporation.
```
I would like to see a test that does concurrent reads while writes are in progress; I see a test for concurrent writes, which is not quite enough.
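One possible shape for such a test, as a hedged sketch: a single writer only sets new bits while reader tasks repeatedly query the count, and each reader asserts that its observations never decrease. The `setBit`/`bitCount`/`getBit` delegates and `ConcurrentReadWriteSketch` are hypothetical; in a RESP test the delegates would wrap the client's `R.SETBIT`/`R.BITCOUNT`/`R.GETBIT` calls.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical harness for "reads while writes are in progress".
// The delegates stand in for client calls; this is not the PR's test code.
public static class ConcurrentReadWriteSketch
{
    // Each write lands in its own chunk; 65_535 * 65_537 == uint.MaxValue, so offsets stay distinct.
    const uint Stride = 65_537;

    public static void Run(Action<uint> setBit, Func<long> bitCount, Func<uint, bool> getBit,
                           int totalBits, int readers)
    {
        if (totalBits < 1 || totalBits > 65_536)
            throw new ArgumentOutOfRangeException(nameof(totalBits));

        // The writer only ever sets new bits, so BITCOUNT can never go down.
        Task writer = Task.Run(() =>
        {
            for (uint i = 0; i < (uint)totalBits; i++)
                setBit(i * Stride);
        });

        var tasks = new List<Task> { writer };
        for (int r = 0; r < readers; r++)
        {
            tasks.Add(Task.Run(() =>
            {
                long last = 0;
                while (!writer.IsCompleted)
                {
                    long seen = bitCount();
                    // A single reader's observations must be monotonic and bounded.
                    if (seen < last || seen > totalBits)
                        throw new InvalidOperationException($"BITCOUNT went from {last} to {seen}");
                    last = seen;
                }
            }));
        }

        // Rethrows any reader or writer failure as an AggregateException.
        Task.WaitAll(tasks.ToArray());

        if (bitCount() != totalBits)
            throw new InvalidOperationException("final BITCOUNT mismatch");
        for (uint i = 0; i < (uint)totalBits; i++)
            if (!getBit(i * Stride))
                throw new InvalidOperationException($"bit {i * Stride} missing");
    }
}
```

Because the writer never clears bits, any decrease seen by a single reader points at a stale or torn read; the final pass then checks that every written offset is visible.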
```csharp
[Test]
public void EmptyBitmap_HasZeroCardinalityAndIsEmpty()
{
    var rb = new global::GarnetRoaringBitmap.RoaringBitmap();
```
nit: remove all this global:: stuff.
```csharp
{
    private static ReadOnlySpan<byte> ErrOffset => "ERR bit offset is not an unsigned 32-bit integer"u8;

    public override bool NeedInitialUpdate(scoped ReadOnlySpan<byte> key, ref ObjectInput input, ref RespMemoryWriter writer)
```
This is more a question for @badrishc, as I haven't played around with custom objects much: is validation expected to happen in NeedInitialUpdate like this?
It's unfortunate as it forces this read command to act like a write which will hurt throughput.
```csharp
/// </summary>
public sealed class RBitCount : CustomObjectFunctions
{
    public override bool NeedInitialUpdate(scoped ReadOnlySpan<byte> key, ref ObjectInput input, ref RespMemoryWriter writer)
```
Similar Q for @badrishc (and again in RBitPos) - using NeedInitialUpdate for "missing" is messy; can this be phrased as a Reader op instead?
```markdown
remaining fast for membership and population-count queries.

The extension lives in `main/GarnetServer/Extensions/RoaringBitmap/` and is wired
into the default `GarnetServer` host. It introduces a new object type and four
```
This is incorrect: RoaringBitmaps must be loaded manually (which is the intended behavior).
Resolves #1270.
Summary
Adds a Roaring Bitmaps extension to Garnet that introduces a new compressed bitmap object type plus four `R.*` RESP commands. Implemented entirely as a host extension in `main/GarnetServer/Extensions/RoaringBitmap/` (zero changes to `libs/server/`), so this is a clean, reviewable foundation that can be deepened in follow-up PRs.
Why Roaring?
A naive `uint32` bitmap is 512 MiB (2³² bits ÷ 8). Roaring partitions the universe into 65 536 chunks of 65 536 bits and represents each chunk as either:

- an array container: a sorted `ushort[]`, used while a chunk holds ≤ 4 096 set bits (~`2·count` bytes), or
- a bitmap container: `ulong[1024]` (exactly 8 KiB), used once a chunk exceeds the threshold.

Empty chunks consume zero memory. Chunks promote (array → bitmap) and demote (bitmap → array) automatically as cardinality changes.
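The layout arithmetic can be sketched as follows; `RoaringLayoutSketch` and its members are hypothetical names, and the byte counts are payload only (no object headers or dictionary overhead):

```csharp
using System;

// Hypothetical helpers illustrating the chunked layout described above; not the PR's API.
public static class RoaringLayoutSketch
{
    // A flat bitmap over the full uint32 universe: 2^32 bits / 8 bits per byte.
    public const long NaiveBitmapBytes = (1L << 32) / 8; // 536,870,912 bytes = 512 MiB

    public const int ArrayMaxCardinality = 4096; // array container limit (from the PR text)
    public const int BitmapWords = 1024;         // bitmap container is ulong[1024]

    // High 16 bits select the chunk; low 16 bits are the position inside it.
    public static (ushort High, ushort Low) Split(uint value) =>
        ((ushort)(value >> 16), (ushort)(value & 0xFFFF));

    public static uint Combine(ushort high, ushort low) => ((uint)high << 16) | low;

    // Inside a bitmap container, a low value maps to one bit of one ulong word.
    public static (int Word, ulong Mask) BitmapSlot(ushort low) =>
        (low >> 6, 1UL << (low & 63));

    // Approximate payload bytes for one non-empty chunk.
    public static long ChunkPayloadBytes(int cardinality) =>
        cardinality <= ArrayMaxCardinality
            ? 2L * cardinality                   // sorted ushort[]: 2 bytes per set bit
            : BitmapWords * (long)sizeof(ulong); // fixed 8 KiB
}
```

At the threshold, 4 096 values × 2 bytes = 8 192 bytes, exactly the size of a bitmap container, which is why 4 096 is where an array stops being the smaller representation.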
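Commands

| Command | Description |
|---|---|
| `R.SETBIT key offset value` | Sets the bit at `offset` ∈ `[0, 2³²-1]` to `0`/`1`. Returns the previous bit. |
| `R.GETBIT key offset` | Returns the bit at `offset`. `0` for missing keys. |
| `R.BITCOUNT key` | Returns the number of set bits. `0` for missing keys. |
| `R.BITPOS key bit [from]` | Returns the first position of `bit` (`0`/`1`) at or after `from`. `-1` if none. |

Docs: `website/docs/commands/roaring-bitmap.md`.

Design notes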
- The data structure (`RoaringBitmap.cs`, `Containers/*`) is a pure C# library with no Garnet dependencies, so it is independently unit-testable.
- `RoaringBitmapObject` (`CustomObjectBase`) wraps the structure and tracks size deltas via `bitmap.ByteSize` for the per-object size accounting.
- All four commands are registered as `CommandType.ReadModifyWrite`. The reads (`R.GETBIT`/`R.BITCOUNT`/`R.BITPOS`) do not mutate state, but the RMW path is required so that `NeedInitialUpdate` is invoked on missing keys; otherwise the framework's `Read` path simply returns nil. Missing-key responses (`0`/`-1`) are written from `NeedInitialUpdate`, which then returns `false` to decline key creation.
- `NeedInitialUpdate` error paths use `writer.WriteError(...)` + `return false` rather than `AbortWithErrorMessage`, which returns `true` and would cause the framework to proceed into `InitialUpdater`/`Updater`, double-writing the response and corrupting the protocol stream.
Tests

- `RoaringBitmapDataTests` (pure data structure)
- `RespRoaringBitmapTests` (RESP integration via SE.Redis)

Coverage highlights:

- Container promotion (4096 ↔ 4097) and demotion, in both directions.
- Cross-checks against a `HashSet<uint>` oracle.
- Boundary offsets `0`, `65535`, `65536`, `2³¹`, `2³²-1`.
- `R.SETBIT`/`R.GETBIT` parity with the oracle, `R.BITCOUNT`, `R.BITPOS` (set / unset / `from` offset), large offsets, persistence across restart, concurrent setbits from multiple clients, error paths (bad offset, bad bit, bad value, wrong arity), and the two-store key separation property.
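For the `R.BITPOS` cases, an oracle might look like the following. This sketch swaps in a `SortedSet<uint>` (rather than the `HashSet<uint>` the tests use) so the set-bit query becomes a range lookup; `BitPosOracle` is a hypothetical name. It follows the Commands section: the first position of `bit` at or after `from` in the `uint32` universe, else `-1`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical oracle for R.BITPOS over the uint32 universe (not the PR's test code).
public static class BitPosOracle
{
    // Returns the first position >= from whose bit equals `bit`, or -1 if none exists.
    public static long BitPos(SortedSet<uint> setBits, bool bit, uint from = 0)
    {
        if (bit)
        {
            // Smallest set position in [from, uint.MaxValue].
            var view = setBits.GetViewBetween(from, uint.MaxValue);
            return view.Count > 0 ? (long)view.Min : -1;
        }

        // Smallest unset position >= from: skip past any run of set bits.
        ulong candidate = from;
        while (candidate <= uint.MaxValue && setBits.Contains((uint)candidate))
            candidate++;
        return candidate <= uint.MaxValue ? (long)candidate : -1;
    }
}
```

For `bit = 0` the answer is `-1` only when every position from `from` through 2³²-1 is set; otherwise some unset position past the populated range always qualifies.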
Known limitations (intentional v1 scope)

- No run containers for long contiguous ranges; the array/bitmap pair captures the bulk of real-world savings.
- `R.BITOP AND/OR/XOR/NOT` is not exposed. The data structure supports these natively; only the command surface is needed.
- Clearing the last set bit leaves an empty bitmap object in place rather than removing the key. This is a property of the custom-object framework's tombstone path (`output.HasRemoveKey` is honoured only on the built-in path) and is best fixed in `libs/server/Storage/Functions/ObjectStore` in a separate PR.
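Since the PR states the data structure supports these operations natively, a future `R.BITOP` could build on per-container kernels like the two below. This is a hedged sketch over raw arrays; `ContainerOpsSketch` is hypothetical and the PR's container types are not shown here. A full implementation would also need mixed array/bitmap pairs, XOR/NOT, and a demotion check on each result's cardinality.

```csharp
using System;
using System.Collections.Generic;
using System.Numerics;

// Hypothetical per-container kernels a future R.BITOP could build on.
// Containers are modeled as raw arrays; these are not the PR's container types.
public static class ContainerOpsSketch
{
    // AND of two array containers: linear merge of two ascending ushort arrays.
    public static ushort[] IntersectArrays(ushort[] a, ushort[] b)
    {
        var result = new List<ushort>(Math.Min(a.Length, b.Length));
        int i = 0, j = 0;
        while (i < a.Length && j < b.Length)
        {
            if (a[i] < b[j]) i++;
            else if (a[i] > b[j]) j++;
            else { result.Add(a[i]); i++; j++; }
        }
        return result.ToArray();
    }

    // OR of two bitmap containers: word-wise OR, counting the result's cardinality
    // so the caller can decide whether to demote it to an array container.
    public static (ulong[] Words, int Cardinality) UnionBitmaps(ulong[] a, ulong[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("bitmap containers must have the same word count");
        var words = new ulong[a.Length];
        int cardinality = 0;
        for (int k = 0; k < a.Length; k++)
        {
            words[k] = a[k] | b[k];
            cardinality += BitOperations.PopCount(words[k]);
        }
        return (words, cardinality);
    }
}
```

Operating chunk by chunk keeps each kernel independent of the 16-bit high key, so an outer loop only has to match chunks with equal keys before dispatching on the container pair.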
Files