[feat] Realtime Log Archive Use Case ZSTD_CCtx_stat(...) #4660

@revintec

I want to use zstd to compress all my log files in real time, as they are generated.
This saves disk space and potentially reduces I/O (also when reading or transferring the logs).
Traditional approaches write raw, uncompressed log files and only compress them hourly or daily.
I want to do that in real time: [app] -> [log stream] -> zstd -> [disk]

Ideal Approach:

zstd would provide a way to tell how many uncompressed bytes have been flushed to the output
in buffered mode (where zstd manages output automatically), i.e. bytes that are guaranteed to be decompressible
even if they are only a partial frame/blocks. Maybe by adding a general-purpose ZSTD_CCtx_stat(...) function:
size_t ZSTD_CCtx_stat(ZSTD_CCtx* cctx, int version, struct stat* stat);

struct stat {
  size_t srcInputBytes;
  // bytes that are guaranteed to be decompressible (already sent to the output buffer)
  size_t srcFlushedBytes;
  // other future stat fields...
  size_t dstFrames; // example
  size_t dstBlocks; // ...
};

srcInputBytes >= srcFlushedBytes

Approaches investigated (none of them suitable):

  1. Just do streaming compression. This leaves some data in zstd's internal buffer,
    meaning that the app has written the log but some of the most recent log data is not yet persisted to disk.
  2. Call ZSTD_flushStream on every log line. This hurts the compression ratio, especially for short log lines.
    One can of course buffer more data before calling ZSTD_flushStream, but this still forces flushes
    and can interfere with zstd's internal buffering strategy, worsening the compression ratio.
  3. Buffer-less streaming compression (synchronous mode) seems too complicated for this presumably
    common use case, and there seems to be no guarantee that the data can be decompressed (recovered) if the system crashes:
    since synchronous mode doesn't write complete frames/blocks, decompression may stop on a partial frame/block.
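
To make the trade-off in approach 2 concrete, here is a minimal sketch of application-side bookkeeping that batches flushes by a byte threshold and tracks the same two counters the proposed stat would report. It is not part of zstd: `LogFlushTracker`, the `tracker_*` helpers and `flushThreshold` are hypothetical names for illustration. The sketch assumes the caller performs the actual flush itself (for example with ZSTD_flushStream, or ZSTD_compressStream2 with ZSTD_e_flush) and calls `tracker_on_flush_done` only after that flush has fully completed and the output has been written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical application-side tracker; field names mirror the proposed stat. */
typedef struct {
    size_t srcInputBytes;   /* uncompressed bytes handed to the compressor */
    size_t srcFlushedBytes; /* uncompressed bytes covered by a completed flush */
    size_t flushThreshold;  /* request a flush once this many bytes are pending */
} LogFlushTracker;

static void tracker_init(LogFlushTracker* t, size_t flushThreshold)
{
    t->srcInputBytes = 0;
    t->srcFlushedBytes = 0;
    t->flushThreshold = flushThreshold;
}

/* Uncompressed bytes that may still sit in the compressor's internal buffers. */
static size_t tracker_pending(const LogFlushTracker* t)
{
    return t->srcInputBytes - t->srcFlushedBytes;
}

/* Record that `len` bytes were fed to the compressor.
 * Returns true when the caller should issue a flush now. */
static bool tracker_on_write(LogFlushTracker* t, size_t len)
{
    t->srcInputBytes += len;
    return tracker_pending(t) >= t->flushThreshold;
}

/* Call only after the flush has fully completed and its output was written. */
static void tracker_on_flush_done(LogFlushTracker* t)
{
    t->srcFlushedBytes = t->srcInputBytes;
    assert(t->srcInputBytes >= t->srcFlushedBytes);
}
```

A larger `flushThreshold` means fewer forced flushes (better ratio) but more log data at risk on a crash, which is exactly the tension described above. This only works because the application forces every flush; it cannot observe flushes that zstd decides to perform on its own, which is what the requested ZSTD_CCtx_stat(...) would expose.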
