This package extends the standard `zipfile` module with functionality for removing, repacking, and copying archive members.

- `ZipFile.remove(zinfo_or_arcname)`

  Removes a member entry from the archive's central directory. *zinfo_or_arcname* may be the full path of the member or a `ZipInfo` instance. If multiple members share the same path and a string is provided, only one of them is removed, and which one is unspecified. Pass a specific `ZipInfo` instance to guarantee which entry is removed.

  The archive must be opened with mode `'w'`, `'x'` or `'a'`.

  Returns the removed `ZipInfo` instance.

  Calling `remove` on a closed ZipFile will raise a `ValueError`.

  Note: This method only removes the member's entry from the central directory, making it inaccessible to most tools. The member's local file entry, including its content and metadata, remains in the archive and is still forensically recoverable. To completely delete the data and reclaim space, call `repack` afterwards (preferably passing the returned `ZipInfo` instance).
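The note above can be made concrete with a stdlib-only sketch (it does not use zipremove) that locates the byte range of a member's local file entry, starting from `ZipInfo.header_offset`. Those are the bytes that stay in the file after `remove` and that `repack` deletes. `local_entry_span` is a hypothetical helper written for illustration; it assumes non-ZIP64 entries and takes the compressed size from the central directory, because the local header may store zero when a data descriptor is used.

```python
import io
import struct
import zipfile

# ZIP local file header: 30 bytes, little-endian (signature, versions, flags,
# method, time, date, CRC, sizes, filename length, extra-field length).
LOCAL_HEADER = struct.Struct("<4s2B4HL2L2H")
LOCAL_SIG = b"PK\x03\x04"
DD_SIG = b"PK\x07\x08"  # optional signature that starts a data descriptor


def local_entry_span(fp, zinfo):
    """Return (start, end) offsets of zinfo's local file entry in fp.

    Hypothetical helper; assumes a non-ZIP64 entry.
    """
    start = zinfo.header_offset
    fp.seek(start)
    fields = LOCAL_HEADER.unpack(fp.read(LOCAL_HEADER.size))
    if fields[0] != LOCAL_SIG:
        raise ValueError(f"no local file header at offset {start}")
    flags, name_len, extra_len = fields[3], fields[10], fields[11]
    # Header + filename + extra field + compressed data. The compressed size
    # comes from the central directory, since the local copy may be zero.
    end = start + LOCAL_HEADER.size + name_len + extra_len + zinfo.compress_size
    if flags & 0x08:  # bit 3: a data descriptor follows the data
        fp.seek(end)
        end += 16 if fp.read(4) == DD_SIG else 12  # signed vs. unsigned
    return start, end


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    for i in range(1, 4):
        zf.writestr(f"file{i}", f"content{i}")

with zipfile.ZipFile(buf) as zf:
    spans = {zi.filename: local_entry_span(buf, zi) for zi in zf.infolist()}
print(spans)  # {'file1': (0, 43), 'file2': (43, 86), 'file3': (86, 129)}
```

In this three-member archive, removing `file2` with `remove` would leave its 43-byte local entry (offsets 43 to 86) in the file until `repack` is called.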
- `ZipFile.repack(removed=None, *, strict_descriptor=True[, chunk_size])`

  Rewrites the archive to remove unreferenced local file entries, shrinking its file size. The archive must be opened with mode `'a'`.

  If *removed* is provided, it must be a sequence of `ZipInfo` objects representing the recently removed members, and only their corresponding local file entries will be removed. This is the most efficient and reliable way to reclaim space. For example:

  ```python
  from zipremove import ZipFile

  with ZipFile('spam.zip', 'a') as myzip:
      removed = [myzip.remove(name) for name in ('ham.txt', 'eggs.txt')]
      myzip.repack(removed)
  ```

  If *removed* is omitted, the archive is scanned to locate and remove local file entries that are no longer referenced in the central directory.

  When scanning, *strict_descriptor* controls how entries written with an unsigned data descriptor are handled. A data descriptor is an optional record holding an entry's CRC and sizes, stored just after the entry's data; it is used when the archive is written to a non-seekable stream. A descriptor is *signed* when it begins with a marker signature and *unsigned* otherwise. Unsigned descriptors have been deprecated by the PKZIP Application Note since version 6.3.0 (released in 2006) and are written only by some legacy tools; signed descriptors, which Python and other modern tools write, are always detected. When *strict_descriptor* is true (the default), unsigned descriptors are not detected, and unreferenced entries that use them are not removed. Setting `strict_descriptor=False` additionally detects unsigned descriptors, at the cost of a significantly slower scan (around 100 to 1000 times slower in the worst case), which may be exploitable as a denial-of-service vector on untrusted input. This option does not affect entries without a data descriptor, and is not needed when *removed* is provided.

  *chunk_size* may be specified to control the buffer size when moving entry data (default is 1 MiB).

  Calling `repack` on a closed ZipFile will raise a `ValueError`.

  Note: The scanning algorithm is heuristic-based and assumes that the ZIP file is normally structured, for example with local file entries stored consecutively, without overlap or interleaved binary data. Prepended binary data, such as a self-extractor stub, is recognized and preserved unless it happens to contain bytes that coincidentally resemble a valid local file entry in multiple respects, which is extremely rare. Embedded ZIP payloads are also handled correctly, as long as they follow the normal structure. However, the algorithm does not guarantee correctness or safety on untrusted or intentionally crafted input. It is generally recommended to provide the *removed* argument for better reliability and performance.
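To illustrate the data descriptor discussion above, the sketch below uses only the standard `zipfile` module (not zipremove) to write an archive to a stream that cannot seek. In that case `zipfile` sets bit 3 of the general-purpose flag in the local header and writes a signed descriptor, starting with the signature `PK\x07\x08`, right after the member's data. `NonSeekableWriter` is a hypothetical wrapper that only exposes `write` and `flush`.

```python
import io
import struct
import zipfile
import zlib


class NonSeekableWriter:
    """Hypothetical write-only stream: it has no seek() or tell()."""

    def __init__(self, raw):
        self.raw = raw

    def write(self, data):
        return self.raw.write(data)

    def flush(self):
        self.raw.flush()


raw = io.BytesIO()
with zipfile.ZipFile(NonSeekableWriter(raw), "w") as zf:
    zf.writestr("file1", "content1")
data = raw.getvalue()

# General-purpose flag at offset 6 of the local header; bit 3 means the CRC
# and sizes are stored in a data descriptor after the member data.
flags = struct.unpack_from("<H", data, 6)[0]
print(bool(flags & 0x08))  # True

# The signed descriptor: 4-byte signature, then CRC-32, compressed size and
# uncompressed size (32 bits each for a non-ZIP64 entry).
pos = data.index(b"PK\x07\x08")
crc, csize, usize = struct.unpack_from("<3L", data, pos + 4)
print(pos, crc == zlib.crc32(b"content1"), csize, usize)  # 43 True 8 8
```

Because the local header of such an entry stores zero for the CRC and sizes, a scanner has to find the descriptor to know where the entry ends; an unsigned descriptor lacks the `PK\x07\x08` marker, which is what makes it harder to detect.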
- `ZipFile.copy(zinfo_or_arcname, new_arcname[, chunk_size])`

  Copies a member *zinfo_or_arcname* to *new_arcname* in the archive. *zinfo_or_arcname* may be the full path of the member or a `ZipInfo` instance. *chunk_size* may be specified to control the buffer size when copying entry data (default is 1 MiB).

  The archive must be opened with mode `'w'`, `'x'` or `'a'`, and the underlying stream must be seekable.

  Returns the original (source) `ZipInfo` instance that was copied.

  Calling `copy` on a closed ZipFile will raise a `ValueError`.

  Note: Renaming a member in a ZIP file requires rewriting its data, as the filename is stored within its local file entry. To rename a member and reclaim the space occupied by the old entry, combine `copy`, `remove`, and `repack` like this:

  ```python
  from zipremove import ZipFile

  with ZipFile('spam.zip', 'a') as myzip:
      myzip.repack([myzip.remove(myzip.copy('old.txt', 'new.txt'))])
  ```
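The claim that the filename lives in the local file entry can be checked with a short stdlib-only sketch (it does not use zipfile internals or zipremove): it reads the name back out of the local header found at `ZipInfo.header_offset`. Editing only the central directory would leave this copy of the name unchanged, which is why a rename has to rewrite the entry.

```python
import io
import struct
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("old.txt", "spam")

with zipfile.ZipFile(buf) as zf:
    zinfo = zf.getinfo("old.txt")

raw = buf.getvalue()
off = zinfo.header_offset
# Filename length and extra-field length are the last two fields of the
# 30-byte local header (offsets 26 and 28); the filename follows the header.
name_len, extra_len = struct.unpack_from("<2H", raw, off + 26)
local_name = raw[off + 30 : off + 30 + name_len]
print(local_name)  # b'old.txt'
```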
Call `repack` after `remove` to reclaim the space of the removed entries:

```python
import os
import zipremove as zipfile

with zipfile.ZipFile('archive.zip', 'w') as zh:
    zh.writestr('file1', 'content1')
    zh.writestr('file2', 'content2')
    zh.writestr('file3', 'content3')
    zh.writestr('file4', 'content4')

print(os.path.getsize('archive.zip'))  # 398

with zipfile.ZipFile('archive.zip', 'a') as zh:
    zh.remove('file1')
    zh.remove('file2')
    zh.remove('file3')
    zh.repack()

print(os.path.getsize('archive.zip'))  # 116 (would be 245 without `repack`)
```

Alternatively, pass the `ZipInfo` objects of the removed entries, for better performance and error-proofing:
```python
import os
import zipremove as zipfile

with zipfile.ZipFile('archive.zip', 'w') as zh:
    zh.writestr('file1', 'content1')
    zh.writestr('file2', 'content2')
    zh.writestr('file3', 'content3')
    zh.writestr('file4', 'content4')

print(os.path.getsize('archive.zip'))  # 398

with zipfile.ZipFile('archive.zip', 'a') as zh:
    zinfos = []
    zinfos.append(zh.remove('file1'))
    zinfos.append(zh.remove('file2'))
    zinfos.append(zh.remove('file3'))
    zh.repack(zinfos)

print(os.path.getsize('archive.zip'))  # 116 (would be 245 without `repack`)
```

To move members into a different folder, combine `copy`, `remove`, and `repack`:

```python
import os
import zipremove as zipfile

with zipfile.ZipFile('archive.zip', 'w') as zh:
    zh.writestr('file0', 'content0')
    zh.writestr('folder1/file1', 'content1')
    zh.writestr('folder1/file2', 'content2')
    zh.writestr('folder1/file3', 'content3')

print(os.path.getsize('archive.zip'))  # 446

with zipfile.ZipFile('archive.zip', 'a') as zh:
    # namelist() returns a new list, so modifying the archive while looping is safe
    for n in zh.namelist():
        if n.startswith('folder1/'):
            n2 = 'folder2/' + n[len('folder1/'):]
            zh.copy(n, n2)
            zh.remove(n)
    zh.repack()

print(os.path.getsize('archive.zip'))  # 446 (would be 599 without `repack`)
```

Similarly, pass the `ZipInfo` objects of the copied and removed entries for better performance and error-proofing:
```python
import os
import zipremove as zipfile

with zipfile.ZipFile('archive.zip', 'w') as zh:
    zh.writestr('file0', 'content0')
    zh.writestr('folder1/file1', 'content1')
    zh.writestr('folder1/file2', 'content2')
    zh.writestr('folder1/file3', 'content3')

print(os.path.getsize('archive.zip'))  # 446

with zipfile.ZipFile('archive.zip', 'a') as zh:
    zinfos = []
    for n in zh.namelist():
        if n.startswith('folder1/'):
            n2 = 'folder2/' + n[len('folder1/'):]
            # copy() returns the source entry's ZipInfo, which remove() then drops
            zinfos.append(zh.remove(zh.copy(n, n2)))
    zh.repack(zinfos)

print(os.path.getsize('archive.zip'))  # 446 (would be 599 without `repack`)
```
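The archive sizes quoted in the examples above follow from the fixed-size ZIP records. For a stored (uncompressed) member with no extra fields, a local file entry takes 30 bytes plus the filename plus the data, a central directory entry takes 46 bytes plus the filename, and the end-of-central-directory record takes 22 bytes. The stdlib-only sketch below reproduces those figures; `archive_size` is a hypothetical helper, and it assumes no data descriptors, comments, or extra fields.

```python
import io
import zipfile

LOCAL_HEADER, CENTRAL_HEADER, END_RECORD = 30, 46, 22


def archive_size(local_entries, central_entries):
    """Size of an archive with the given (name, data) local entries and
    central directory entries, assuming stored members without extras."""
    local = sum(LOCAL_HEADER + len(n.encode()) + len(d.encode())
                for n, d in local_entries)
    central = sum(CENTRAL_HEADER + len(n.encode()) for n, _ in central_entries)
    return local + central + END_RECORD


files = [(f"file{i}", f"content{i}") for i in range(1, 5)]
print(archive_size(files, files))          # 398: all four members
print(archive_size(files, files[3:]))      # 245: after remove() only
print(archive_size(files[3:], files[3:]))  # 116: after remove() and repack()

old = [("file0", "content0")] + [(f"folder1/file{i}", f"content{i}") for i in range(1, 4)]
new = [("file0", "content0")] + [(f"folder2/file{i}", f"content{i}") for i in range(1, 4)]
print(archive_size(old, old))              # 446: original archive
print(archive_size(old + new[1:], new))    # 599: after copy() and remove() only
print(archive_size(new, new))              # 446: after repack()

# Cross-check the first figure against the standard library's zipfile.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    for name, data in files:
        zf.writestr(name, data)
print(len(buf.getvalue()))                 # 398
```

Without `repack`, the removed members' local entries still count toward the file size even though their central directory entries are gone, which accounts for the 245 and 599 figures.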