Skip to content

CSHARP-6037: Replace SharpCompress with DeflateStream for zlib compression#1997

Draft
damieng wants to merge 10 commits into
mongodb:mainfrom
damieng:CSHARP-6037
Draft

CSHARP-6037: Replace SharpCompress with DeflateStream for zlib compression#1997
damieng wants to merge 10 commits into
mongodb:mainfrom
damieng:CSHARP-6037

Conversation

@damieng
Copy link
Copy Markdown
Contributor

@damieng damieng commented May 11, 2026

Summary

Removes the SharpCompress 0.30.1 dependency from MongoDB.Driver. This package has a known moderate severity vulnerability (GHSA-6c8g-7p36-r338) and the fixed version drops .NET targets we support.

Zlib compression support is replaced with a thin ZlibStream wrapper around System.IO.Compression.DeflateStream, implementing the same RFC 1950 framing (2-byte header + deflate payload + Adler-32 checksum) that SharpCompress provided.

Header validation and Adler-32 checksum verification on decompress are preserved.

Compression level mapping

System.IO.Compression does not expose zlib's full 0–9 integer level range. The mapping is:

User-configured level Behaviour
0 (no compression) Exact
-1 / 6 (default / optimal) Exact
1 (fastest) Exact
2–3 Mapped to level 1 (marginally less compression than requested)
4–5, 7–8 Mapped to level 6 (within one or two levels of requested)
9 (maximum compression) Mapped to level 6 — users who explicitly set level 9 to maximise compression ratio will see a reduction in compression effectiveness

Levels 2–8 are rarely configured explicitly in practice. Level 9 is the most notable change. All output remains valid RFC 1950 zlib and interoperates correctly with the server.

Approach

This is preferrable to updating SharpCompress as we use a tiny fraction of its functionality but are flagged when a vulnerability appears even in functionality we don't use such as this latest one in archive reading. The fix is in a version that has dropped support for .NET targets we still support in this 3.x release.

Given how trivial the actual functionality we require is it makes no sense to continue with this dependency for the sake of adding a 2-byte header and Adler32 checksum over a DeflateStream.

Superceeds #1996

…ib compression

Removes the SharpCompress 0.30.1 dependency (which has a known moderate
vulnerability and dropped .NET platform targets we support) by implementing
the RFC 1950 zlib framing using System.IO.Compression.DeflateStream via a
new ZlibStream wrapper class. DeflateStream uses the same native zlib backend
as SharpCompress with no performance regression.

Header validation and Adler-32 verification on decompress are preserved.
@damieng damieng added bug Fixes issues or unintended behavior. improvement Optimizations or refactoring (no new features or fixes). labels May 11, 2026
@damieng damieng changed the title CSHARP-6037: Replace SharpCompress with built-in DeflateStream for zlib compression CSHARP-6037: Replace SharpCompress with DeflateStream for zlib compression May 11, 2026
@PhilCoultard
Copy link
Copy Markdown

Thanks for addressing this. I was about to report the vulnerability with SharpCompress but great to see you have it in hand already. Noted about the backwards compatibility requirement.

@tebeco
Copy link
Copy Markdown

tebeco commented May 12, 2026

hello @damieng

just to be sure which PR to follow.
I have the feeling that this PR would be more beneficial in the long term than just bumping SharpCompress

i also see that it's still in draft.
Is there any idea which of the two or both is currently concidered ?

or if Mongodb.Driver can confirm or not what's being written in the other PR ?
(someone mentionned the CVE does affect portion of used code and so could be dismissed until it's addressed)

Comment thread src/MongoDB.Driver/Core/Compression/ZlibStream.cs Outdated
return decompressed;
}

private static uint UpdateAdler32(uint adler, byte[] buf, int offset, int count)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need unit tests for this method.

Copy link
Copy Markdown
Contributor Author

@damieng damieng May 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Adler values are covered in the zlib tests. If we want to indepdentely validate the Adelr32 we'll need to make this internal or public.

private static MemoryStream ReadAndDecompress(Stream input)
{
byte[] compressed;
using (var ms = new MemoryStream())
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need this copy of steam? It seems like we can combine this one with MemoryStream instantiated on line 116.

Copy link
Copy Markdown

@tebeco tebeco May 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all of this needs to be tested on a real aspnetcore app such as a WebApplicationFactory where aspnetcore will throw at any attempt to use SyncIO for stream
this method isn't async i suspect it won't work

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The limitation with WebApplicationFactory and async specifically relates to ASP.NET Request and Response. If we were shipping a general-purpose public ZlibStream that would definitely be a scenario we'd want to support.

In our internal case we only run this on ByteBufferStream, an in-memory buffer that gets populated by NetworkStream (which supports sync and async) on the wire side, and is then consumed by
CompressedMessageBinaryEncoder (which is already only sync today the same as all our IMessageEncoder implementations).

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thx for clarifying

Comment thread src/MongoDB.Driver/Core/Compression/ZlibStream.cs Outdated
Comment thread src/MongoDB.Driver/Core/Compression/ZlibStream.cs Outdated
_stream.WriteByte((byte)(_adler32 >> 24));
_stream.WriteByte((byte)(_adler32 >> 16));
_stream.WriteByte((byte)(_adler32 >> 8));
_stream.WriteByte((byte)_adler32);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say it's surprising to write something to the stream in dispose. I would prefer to have this code in Close method.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you probably meant CloseAsync or even FlushAsync if this needs to write stuff ?

Copy link
Copy Markdown
Member

@sanych-sun sanych-sun May 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Close for sync code-path and CloseAsync for async code-path. Flush/FlushAsync might be used to clean the buffer for some reasons, but more data could be expected.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Writing the trailer on Dispose matches the BCL pattern (DeflateStream, GZipStream, CryptoStream all do the same). DeflateStream's Dispose calls PurgeBuffers.

Overriding Close() goes against Microsoft's guidance which recommends instead putting it in dispose.

// Replacement for SharpCompress.Compressors.Deflate.ZlibStream.
// Implements RFC 1950 (zlib framing) using System.IO.Compression.DeflateStream for the
// inner RFC 1951 (deflate) payload, with manual Adler-32 checksum handling.
internal sealed class ZlibStream : Stream
Copy link
Copy Markdown
Member

@sanych-sun sanych-sun May 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For modern targets we have built-in ZLibStream. Should we use it and keep this implementation for net472 only?

Copy link
Copy Markdown
Contributor Author

@damieng damieng May 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should do that in a subsequent update otherwise we have double the surface area changing with two implementations potentially in a patch release. If/when we can drop 4.7.2 I'd like it if we can go full-native with the .NET 6 one and drop this work here entirely.

Copy link
Copy Markdown

@tebeco tebeco May 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If/when we can drop 4.7.2 I'd like it if we can go full-native with the .NET 6 one and drop this work here entirely.

you can already go native for anything above net6.0

https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/preprocessor-directives#conditional-compilation

#if !NET6_0_OR_GREATER

// ENTIRE FILE

#endif

I feel like this must 100% what should be done
you would also

#if NET6_0_OR_GREATER
using System.IO.Compression
#else
using The.Other.Namespace
#endif

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like it's becomming urgent to also stop targetting net6.0 and start targetting at least 10.0 as mentionned previously

as for support for netfx, that would be a breaking change

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if mongo starts targeting net8.0, net9.0, net10.0 explicitly
then you can release a nuget package for these TFM by removing SharpCompress there
and I wonder if consumer on net10.0 would still have the CVE warning.

In the meanwhile another PR could deal with supporting ZLib for Netfx

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pretty sure it's probably the best alternative to rely on native when native is supported, and use #if for legacy

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll have to leave the targetting conversations for a different place - that's not something that can be decided in a security patch release.

There is some benefit to providing .NET 6 or greater conditionals and using the built-in. I didn't initially want double the surface area but have come around enough to try it in another commit.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool to see the native code being used for existing modern TFM, thx a lot

(I wonder if net10.0 resolve to ns2.0 or net6.0)

@tebeco
Copy link
Copy Markdown

tebeco commented May 13, 2026

@damieng / @sanych-sun

can we please concider targeting explicitly net10.0 ?

and start looking at net11.0 explicit tfm ?

https://github.com/dotnet/core/blob/main/release-notes/11.0/preview/preview4/libraries.md#span-based-deflate-zlib-and-gzip-encoderdecoder-apis

there's many changes that adds more support for Zlib so it would be cool to know how to anticipate changes ahead of time to avoid possible breaking changes that api would force/offer

its 6 months ahead

@tebeco
Copy link
Copy Markdown

tebeco commented May 13, 2026

in a more personnal comment
i could only suggest you drop all tfm except
netstandard2.0
netstandard2.1 (only if you already explicitly support it)
net8.0 (until this november)
net10.0

netstandard2.1 imply runtime that would both support it and that are not already "modern dotnet"

damieng added 7 commits May 19, 2026 09:56
ZlibStream now defers all I/O until the first Read/Write — the constructor
no longer touches the underlying stream. Decompression streams output
through DeflateStream incrementally instead of buffering the full
decompressed payload into a MemoryStream, lowering peak memory for large
messages. Adler-32 is accumulated as bytes are produced and validated when
DeflateStream signals EOF.

Also adds explicit rejection of zlib preset dictionaries (FDICT) with a
clear error rather than letting the DICTID bytes be misinterpreted as
deflate data, plus a test covering that case.
- ZlibCompressor: map zlibCompressionLevel 7-9 to CompressionLevel.SmallestSize
  on net6.0+ (zlib level 9) to preserve high-compression user intent. Older
  TFMs lack SmallestSize and continue to fall back to Optimal.
- ZlibStream.UpdateAdler32: switch to the standard zlib NMAX (5552) batched-
  modulo form. Roughly halves the CPU cost vs the per-byte modulo loop,
  restoring parity with SharpCompress's Adler32 on the hot path.
- ZlibStream.Dispose: null-guard _compressDeflate so a failed init (e.g.
  header write threw on a broken underlying stream) doesn't surface an NRE
  on top of the original exception.
- ZlibStream.EnsureDecompressInitialized: raise the minimum length floor
  from 7 to 8 — RFC 1951's shortest deflate payload (an empty stored block)
  is "03 00", two bytes.
- CompressorsTests: annotate Zlib_should_generate_expected_compressed_bytes
  explaining why the expected bytes differ from the pre-CSHARP-6037
  baseline (SharpCompress's mid-stream sync-flush vs DeflateStream's single
  final block) and pointing to the inbound-compat test.
The new UpdateAdler32 batches its deferred modulo every NMAX (5552) bytes,
but the existing __testMessage is only ~110 bytes long so no test exercised
the chunk-boundary path. Add a round-trip over a >16KB payload with varied
byte values that straddles three NMAX windows.
…gaps

Adds three tests that exercise paths the existing suite missed:

- Zlib_roundtrip_should_handle_empty_input: round-trip with zero bytes.
- Zlib_decompress_read_should_handle_nonzero_offset_and_partial_reads:
  drives ZlibStream.Read directly with offset>0 and small counts to verify
  the offset is propagated correctly into UpdateAdler32, and that calling
  Read past EOF returns 0 without re-validating the trailer.
- Zlib_compress_level_zero_should_emit_no_compression_header: pins the
  FLEVEL=0 header bytes (0x78 0x01) for level 0 so the header constant
  can't silently swap with the FLEVEL=2 default (both round-trip fine).

The empty-input test surfaced a real bug: System.IO.Compression.DeflateStream
emits zero bytes for empty input (SharpCompress emitted "03 00"), so our
compress path was producing a 6-byte header+Adler-only stream with no
deflate payload — malformed per RFC 1951 and rejected by our own decompress
with the length floor raised to 8. Fix in ZlibStream.Dispose: when no Write
ever happened, emit "03 00" (empty final deflate block) before the Adler-32
trailer to produce a well-formed RFC 1950 empty zlib stream. Split lazy
init into EnsureHeaderWritten (always needed at dispose) and the existing
EnsureCompressInitialized (only when there's actual data to deflate).
return n;
}

public override void Write(byte[] buffer, int offset, int count)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we intentionally support only sync code path, should we override async methods and throw there?

Copy link
Copy Markdown

@tebeco tebeco May 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this would be possible to have async path and would be used in IAsyncDisposable

assuming we want DisposeAsync, but I can't really tell as i'm not knowing in depth this repo.
It could possibly be a small performance boost

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing in the driver would be calling the async methods - we would need to change a lot more than just the zlibstream to light them up. Might be worth it at some point but well beyond the scope of this piece of work.

Given this is not a general-purpose Zlib implementation but rather something we only use internally in a very specific path throwing on the Async doesn't get us anything. We completely control how this is used and there is no async path into it.

{
// RFC 1950 headers: CMF=0x78 (deflate, 32K window) + FLG satisfying (CMF*256+FLG)%31==0.
// FLEVEL bits in FLG are informational only (RFC 1950 §2.2) and do not affect decompression.
private static readonly byte[] s_defaultHeader = { 0x78, 0x9C }; // FLEVEL=2
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: rename to __defaultHeader and __noCompressionHeader

using (var ms = new MemoryStream())
{
_stream.CopyTo(ms);
compressed = ms.ToArray();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I understood we need this copy only to limit size of the stream. Right? Taking into account that this stream could be relatively big, can we avoid this copy somehow or at least use some pool? Can we implement something like a size limited stream? The stream which wraps another stream but restrict it's size?

damieng added 2 commits May 19, 2026 19:40
Before this commit, EnsureDecompressInitialized read the entire compressed
payload into a MemoryStream and then copied it again via ToArray() before
DeflateStream produced a single output byte — peak allocation ~2-3N for an
N-byte compressed wire message. This was a real regression versus the pre-PR
SharpCompress-based code, which streamed the zlib stream end-to-end with
~constant working memory.

Introduce a private nested TrailerReservingStream that wraps the input and
always holds back the last 4 bytes (the Adler-32 trailer) while streaming
everything before that to DeflateStream. Lazy: 4-byte priming happens on the
first Read; the held-back trailer is exposed once source EOFs. EnsureDecompress
now reads only the 2-byte zlib header up front, validates it, then constructs
the TrailerReservingStream — no whole-input buffering.

The Read trailer-validation block pulls one extra byte from TrailerReserving
on EOF: DeflateStream can detect BFINAL=1 inside bytes it already buffered
internally and signal EOF without ever asking our source for another byte;
the forced pull makes the source observe EOF so the held-back 4 bytes become
the validated trailer. Any byte produced by that pull means trailing data
after the deflate payload, which we now reject.

Peak input-side allocation is now ~8KB reusable workspace + 4-byte trailer,
independent of message size — matching the pre-PR SharpCompress profile.

Tests:
- Zlib_decompress_should_handle_one_byte_at_a_time_input: round-trips through
  a source that returns one byte per Read, exercising the n<=4 rotation
  branch of TrailerReservingStream under pathological fragmentation.
- Zlib_decompress_should_stream_without_buffering_entire_input: 1MB round-trip
  to confirm streaming works at realistic wire-message scale.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Fixes issues or unintended behavior. improvement Optimizations or refactoring (no new features or fixes).

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants