I want to use zstd to compress all my log files in real time, as they are generated.
This saves disk space and potentially reduces I/O (also when reading or transferring the logs).
Traditional approaches write raw, uncompressed log files and only compress them hourly or daily.
I want to do the compression in real time instead:
[app] -> [log stream] -> zstd -> [disk]
Ideal Approach:
zstd would provide a way to tell how many uncompressed bytes have been flushed to the output
in buffered mode (where zstd manages its output automatically), i.e. how many bytes are guaranteed
to be decompressible even if they form only a partial frame/block. Maybe by adding a general-purpose
ZSTD_CCtx_stat(...) function:
struct stat {
    // uncompressed bytes passed in so far
    size_t srcInputBytes;
    // bytes that are guaranteed to be decompressible (already sent to the output buffer)
    size_t srcFlushedBytes;
    // other future stat fields...
    size_t dstFrames; // example
    size_t dstBlocks; // ...
};
size_t ZSTD_CCtx_stat(ZSTD_CCtx* cctx, int version, struct stat* stat);
Invariant: srcInputBytes >= srcFlushedBytes (the difference is the data still buffered inside zstd).
Approaches investigated (none of them suitable):
- Just do streaming compression. This leaves some data in zstd's internal buffer,
meaning that the app has written the log but some of the most recent log data is not yet persisted to disk.
- Call ZSTD_flushStream on every log line. This hurts the compression ratio, especially for short log lines.
One can of course buffer more data before calling ZSTD_flushStream, but this still forces flushes
and can interfere with zstd's internal buffering strategy, worsening the compression ratio.
- Buffer-less streaming compression (synchronous mode). This seems too complicated for what is presumably
a common use case, and there seems to be no guarantee that the data can be decompressed (recovered) if the system crashes:
synchronous mode doesn't write complete frames/blocks, so decompression may stop at a partial frame/block.