lovasoa/tinyzip

tinyzip

tinyzip is a no_std, low-level ZIP navigation library for Rust. It has no dependencies and does not allocate memory.

This crate does not decompress data: you iterate over files in a ZIP archive, and get access to raw bytes. You can decompress them with an external crate like miniz_oxide or flate2.

About the ZIP format

A ZIP archive has the following overall structure:

[local file header 1] [file data 1]
[local file header 2] [file data 2]
...
[central directory header 1]
[central directory header 2]
...
[end of central directory record]
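
Because the end of central directory record comes last, a reader usually starts by locating it from the end of the archive. The following is a minimal sketch of that search, not tinyzip's implementation: the function name `find_eocd` is hypothetical, and the signature bytes and the 22-byte minimum size are taken from the ZIP format itself. A production reader would also bound the search to the last 22 + 65535 bytes (the maximum comment length) and validate the fields of each candidate.

```rust
/// Signature that starts the end of central directory record ("PK\x05\x06").
const EOCD_SIGNATURE: [u8; 4] = [0x50, 0x4b, 0x05, 0x06];
/// Size of the record without its trailing variable-length comment.
const EOCD_MIN_SIZE: usize = 22;

/// Hypothetical helper: returns the byte offset of the last candidate
/// end of central directory record in `archive`, or `None` if there is none.
fn find_eocd(archive: &[u8]) -> Option<usize> {
    if archive.len() < EOCD_MIN_SIZE {
        return None;
    }
    // A record starting later than this would be truncated.
    let last_start = archive.len() - EOCD_MIN_SIZE;
    // Scan backwards so that the record closest to the end wins.
    (0..=last_start)
        .rev()
        .find(|&pos| archive[pos..pos + 4] == EOCD_SIGNATURE)
}
```

Scanning backwards rather than forwards matters because file data earlier in the archive may happen to contain the same four signature bytes.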

Central directory vs. local headers

Each file's metadata is stored twice: once in a local file header immediately before the file data, and once in the central directory near the end of the archive. The central directory is the authoritative source. It contains the full metadata and a pointer (byte offset) to each local header.

This crate reads the central directory. It uses local headers only to resolve the exact byte offset of file data (since the local header contains variable-length fields that can shift the data start). You should not rely on local header fields directly because some writers zero them out.
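
To illustrate why the local header is still needed, here is a rough sketch of how the data start can be derived from it. This is not tinyzip's code: `data_offset` is a hypothetical helper working on an in-memory archive, and the field positions (a 30-byte fixed part, with the name length at byte 26 and the extra field length at byte 28) come from the ZIP format.

```rust
/// Signature that starts every local file header ("PK\x03\x04").
const LOCAL_HEADER_SIGNATURE: [u8; 4] = [0x50, 0x4b, 0x03, 0x04];
/// Size of the fixed part of a local file header.
const LOCAL_HEADER_FIXED_SIZE: usize = 30;

/// Hypothetical helper: given the whole archive and the local header offset
/// taken from the central directory, returns the offset where file data starts.
fn data_offset(archive: &[u8], local_header_offset: usize) -> Option<usize> {
    let end = local_header_offset.checked_add(LOCAL_HEADER_FIXED_SIZE)?;
    let header = archive.get(local_header_offset..end)?;
    if header[..4] != LOCAL_HEADER_SIGNATURE {
        return None;
    }
    // Only the two length fields are read; the other local fields are ignored
    // because the central directory is the authoritative source.
    let name_len = u16::from_le_bytes([header[26], header[27]]) as usize;
    let extra_len = u16::from_le_bytes([header[28], header[29]]) as usize;
    Some(end + name_len + extra_len)
}
```

Note that the name and extra field lengths in the local header can differ from those in the central directory, which is why the offset cannot be computed from the central directory alone.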

File name and path encoding

File names are represented as raw bytes. The ZIP specification originally required IBM Code Page 437 encoding, but most archivers today write UTF-8 or whatever the local OS encoding is.

If general purpose bit 11 (the "Language Encoding Flag", EFS) is set, the file name and comment are guaranteed to be UTF-8. You can check this with Entry::path_is_utf8().

Path separators are always forward slashes (/). Directory entries are indicated by a trailing /. There is no leading slash and no drive letter.
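
Since names are raw bytes, callers typically write a few small helpers on top of them. The sketch below assumes you already hold the raw name bytes of an entry; the helper names (`is_directory`, `base_name`, `display_name`) are hypothetical and not part of tinyzip. It uses `std` for the lossy decoding step.

```rust
use std::borrow::Cow;

/// True when the raw entry name denotes a directory (trailing `/`).
fn is_directory(raw_name: &[u8]) -> bool {
    raw_name.last() == Some(&b'/')
}

/// Last path component, ignoring the trailing `/` of directory entries.
fn base_name(raw_name: &[u8]) -> &[u8] {
    let trimmed = match raw_name.split_last() {
        Some((&b'/', rest)) => rest,
        _ => raw_name,
    };
    match trimmed.iter().rposition(|&b| b == b'/') {
        Some(pos) => &trimmed[pos + 1..],
        None => trimmed,
    }
}

/// Decodes a name for display. Invalid UTF-8 sequences are replaced with
/// U+FFFD, so this is only suitable for names that are expected to be UTF-8.
fn display_name(raw_name: &[u8]) -> Cow<'_, str> {
    String::from_utf8_lossy(raw_name)
}
```

When the UTF-8 flag is not set, lossy decoding may mangle names written in CP437 or a local encoding, so treat the result as a display string rather than a file system path.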

Important notes

  • The compression method can be Stored (no compression) or Deflate (by far the most common). Other values are rare and not supported by this crate.
  • File order in the archive is arbitrary.
  • For files larger than ~4 GB, ZIP64 extensions are used. This crate handles ZIP64 transparently and exposes all integers as u64.
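
For readers curious about what "transparently" hides, here is a simplified sketch of the ZIP64 lookup for one field. It is an illustration under assumptions, not tinyzip's code: `uncompressed_size` is a hypothetical free function taking the 32-bit size from a central directory header and that header's raw extra field. In the format, a 32-bit field set to 0xFFFFFFFF means the real value is stored in the extra record with header ID 0x0001.

```rust
/// Header ID of the ZIP64 extended information extra field.
const ZIP64_EXTRA_ID: u16 = 0x0001;
/// Sentinel stored in a 32-bit field whose real value lives in the extra field.
const ZIP64_SENTINEL: u32 = 0xFFFF_FFFF;

/// Hypothetical helper: resolves the uncompressed size of an entry.
/// `size32` is the 32-bit header field, `extra` the raw extra field bytes.
fn uncompressed_size(size32: u32, extra: &[u8]) -> Option<u64> {
    if size32 != ZIP64_SENTINEL {
        return Some(u64::from(size32));
    }
    let mut rest = extra;
    // The extra field is a sequence of (id: u16, len: u16, data) records.
    while rest.len() >= 4 {
        let id = u16::from_le_bytes([rest[0], rest[1]]);
        let len = u16::from_le_bytes([rest[2], rest[3]]) as usize;
        let data = rest.get(4..4 + len)?;
        if id == ZIP64_EXTRA_ID {
            // When present, the uncompressed size is the first value stored.
            let mut bytes = [0u8; 8];
            bytes.copy_from_slice(data.get(..8)?);
            return Some(u64::from_le_bytes(bytes));
        }
        rest = &rest[4 + len..];
    }
    None
}
```

The compressed size and local header offset follow the same pattern, each appearing in the ZIP64 record only when its 32-bit counterpart holds the sentinel.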

The full format specification is APPNOTE.TXT, maintained by PKWARE.

Supported

  • Single-disk ZIP and ZIP64 archives
  • Leading prefix data and trailing junk
  • Central-directory iteration without buffering the directory
  • Lazy reading of variable-length metadata and local headers

Not Supported

  • Multi-disk ZIP archives
  • Decompression (use the deflate implementation of your choice)
  • Filename decoding: you get the raw bytes and a flag telling whether the file name is UTF-8 (it usually is).
  • Central-directory encryption or compressed central-directory structures
  • Automatic checksum verification (you get access to the checksum if you need it)
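
If you do want to verify checksums yourself, ZIP uses the standard CRC-32 over the uncompressed data. The following dependency-free sketch computes it bit by bit; `crc32_update` is a hypothetical helper, and the name of the accessor that returns the stored checksum is not shown here, so the expected value is assumed to be obtained separately from the entry. A table-driven implementation or an external crate would be faster.

```rust
/// Hypothetical helper: feeds `data` into a running CRC-32
/// (reflected polynomial 0xEDB88320). Start with `crc = 0` and pass the
/// previous result back in to process data chunk by chunk.
fn crc32_update(mut crc: u32, data: &[u8]) -> u32 {
    // Undo the final inversion of the previous call (or set the initial state).
    crc = !crc;
    for &byte in data {
        crc ^= u32::from(byte);
        for _ in 0..8 {
            // All ones when the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}
```

Because the function is incremental and keeps no heap state, it can be called on each decompressed chunk and compared with the stored checksum once the entry has been fully read.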

Core API

no_std

# fn main() {
#     let file_bytes: &[u8] = include_bytes!("tests/data/manual/go-archive-zip/test.zip");
#     run(file_bytes).unwrap();
# }
# fn run(file_bytes: &[u8]) -> Result<(), tinyzip::Error<tinyzip::SliceReaderError>> {
use tinyzip::{Archive, Compression};
use miniz_oxide::inflate::stream::{inflate, InflateState};
use miniz_oxide::{DataFormat, MZFlush};

let archive = Archive::open(file_bytes)?;
let entry = archive.find_file(b"test.txt")?;
let mut decompressed = [0u8; 1024];
let contents = match entry.compression()? {
    Compression::Deflated => {
        let mut chunks = entry.read_chunks::<512>()?;
        let mut state = InflateState::new(DataFormat::Raw);
        let mut out_pos = 0;
        while let Some(chunk) = chunks.next() {
            let result = inflate(&mut state, chunk?,
                &mut decompressed[out_pos..], MZFlush::None);
            out_pos += result.bytes_written;
        }
        &decompressed[..out_pos]
    }
    Compression::Stored => { entry.read_to_slice(&mut decompressed)? }
};
assert_eq!(contents, b"This is a test text file.\n");
# Ok(())
# }

std feature

When std is available, this crate unlocks features that require std traits or heap allocation. The core logic remains the same and does not allocate when opening a file or iterating through contents.

# fn main() -> Result<(), Box<dyn core::error::Error>> {
# #[cfg(feature = "std")] { // this test requires std
# let zip_path = "tests/data/manual/go-archive-zip/test.zip";
use std::fs::File;
use std::io::{self, Read};
use tinyzip::{Archive, Compression};
use flate2::read::DeflateDecoder; // switch decompressor lib with crate features

let zip_file = File::open(zip_path)?;
let archive = Archive::try_from(zip_file)?;
let entry = archive.find_file(b"test.txt")?;
let mut writer = Vec::new(); // This could be a `std::fs::File`
let size = entry.uncompressed_size();
assert!(size < 1024, "file too large"); // be careful with zip bombs
match entry.compression()? {
    Compression::Deflated => {
        let mut decoder = DeflateDecoder::new(entry.reader()?).take(size);
        io::copy(&mut decoder, &mut writer)?;
    }
    Compression::Stored => {
        io::copy(&mut entry.reader()?, &mut writer)?;
    }
}
# assert_eq!(writer, b"This is a test text file.\n");
# } Ok(()) }

API details

The API stays low-level on purpose:

  • Reader is a tiny random-access trait that can be implemented directly on top of immutable positioned reads.
  • Only small fixed-size archive metadata are loaded and stored in memory. Variable-length fields are read into caller-provided buffers.
  • Data location is resolved lazily from the local header only when needed.
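
To give a feel for what "immutable positioned reads" into caller-provided buffers look like, here is a sketch built around a hypothetical `ReadAt` trait. It is for illustration only: the trait, its method names and the `OutOfBounds` error type are assumptions, and tinyzip's actual Reader trait may have a different shape. Check the crate documentation before implementing it.

```rust
/// Hypothetical trait for illustration; NOT tinyzip's actual `Reader` trait.
trait ReadAt {
    type Error;
    /// Total size of the underlying data in bytes.
    fn size(&self) -> Result<u64, Self::Error>;
    /// Fills `buf` completely with the bytes starting at `offset`.
    /// Takes `&self`: a positioned read needs no shared cursor.
    fn read_exact_at(&self, offset: u64, buf: &mut [u8]) -> Result<(), Self::Error>;
}

/// Hypothetical error returned when a read falls outside the data.
#[derive(Debug, PartialEq)]
struct OutOfBounds;

impl ReadAt for [u8] {
    type Error = OutOfBounds;

    fn size(&self) -> Result<u64, OutOfBounds> {
        Ok(self.len() as u64)
    }

    fn read_exact_at(&self, offset: u64, buf: &mut [u8]) -> Result<(), OutOfBounds> {
        if offset > self.len() as u64 {
            return Err(OutOfBounds);
        }
        let start = offset as usize;
        let end = start.checked_add(buf.len()).ok_or(OutOfBounds)?;
        let src = self.get(start..end).ok_or(OutOfBounds)?;
        // The caller owns the buffer, so nothing is allocated here.
        buf.copy_from_slice(src);
        Ok(())
    }
}
```

A design along these lines lets the same parsing code run over a byte slice, a memory-mapped file, or a flash device driver, since each only has to answer "give me these bytes at this offset".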

Performance

The figures below compare tinyzip with the zip crate (v8) on equivalent operations, using an in-memory archive of ~2500 deflate-compressed files with realistic nested paths, including multi-MB binary files.

Operation              tinyzip   zip       Speedup
Find file by name      15 µs     402 µs    26x
Extract a small file   54 µs     443 µs    8.2x
Heap allocations       0         9,862
Peak heap usage        0 B       1.5 MB

tinyzip uses miniz_oxide for decompression in the extract benchmark. Reproduce with cargo bench.

Maintenance

Pull requests are welcome.
