1. Three essential ZIP file operations
1.1 Creating a compressed package
Use the ZipFile class to quickly create ZIP files, supporting recursive compression of files and directories:
```python
import zipfile
import os

def create_zip(output_path, source_dir):
    with zipfile.ZipFile(output_path, 'w', zipfile.ZIP_DEFLATED) as zipf:
        for root, dirs, files in os.walk(source_dir):
            for file in files:
                file_path = os.path.join(root, file)
                # Store paths relative to source_dir inside the archive
                arcname = os.path.relpath(file_path, source_dir)
                zipf.write(file_path, arcname)
```
Key parameter description:
- mode='w': write mode ('r' read/'a' append)
- compression=ZIP_DEFLATED: Enable DEFLATE compression algorithm
- arcname: Controls the storage path of files in ZIP
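To see concretely how `arcname` controls the stored path, the sketch below (file and directory names are illustrative) writes the same file twice, once with and once without an explicit `arcname`:

```python
import os
import tempfile
import zipfile

# Set up a throwaway source file (names here are illustrative)
tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "report.txt")
with open(src, "w") as f:
    f.write("hello zip")

archive = os.path.join(tmp, "out.zip")
with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zipf:
    # Without arcname, the full on-disk path (minus drive/leading slash) is stored
    zipf.write(src)
    # With arcname, we choose the path inside the archive
    zipf.write(src, arcname="docs/report.txt")

with zipfile.ZipFile(archive, "r") as zipf:
    names = zipf.namelist()
print(names)
```

Note how the first entry carries the whole directory tree of the source path, which is rarely what you want when sharing archives.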
1.2 Decompression operation
The decompression operation is also simple and efficient, supporting complete decompression and selective decompression:
```python
def extract_zip(zip_path, extract_dir):
    with zipfile.ZipFile(zip_path, 'r') as zipf:
        zipf.extractall(extract_dir)  # Complete extraction
        # Example: extract a specific member instead
        # zipf.extract('docs/', extract_dir)
```
1.3 File traversal and information acquisition
The compressed package content can be obtained through the namelist() and infolist() methods:
```python
def inspect_zip(zip_path):
    with zipfile.ZipFile(zip_path, 'r') as zipf:
        for info in zipf.infolist():
            print(f"Name: {info.filename}")
            print(f"Size: {info.file_size} bytes")
            print(f"Compressed: {info.compress_size} bytes")
            print(f"Modified: {info.date_time}")
            print("-" * 30)
```
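For a quick listing without per-entry metadata, `namelist()` returns just the stored paths. The self-contained sketch below (member names are illustrative) builds a tiny archive with `writestr` and lists it:

```python
import os
import tempfile
import zipfile

archive = os.path.join(tempfile.mkdtemp(), "demo.zip")  # illustrative location
with zipfile.ZipFile(archive, "w") as zipf:
    # writestr adds a member directly from a string; no source file needed
    zipf.writestr("a.txt", "first")
    zipf.writestr("sub/b.txt", "second")

with zipfile.ZipFile(archive, "r") as zipf:
    names = zipf.namelist()
print(names)  # ['a.txt', 'sub/b.txt']
```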
2. Advanced skills: Make compression smarter
2.1 Encryption and compression practice
An important caveat: the standard library's zipfile can *read* password-protected archives, via the setpassword method or the pwd= argument, but it cannot *create* encrypted ones; producing an encrypted archive requires a third-party library such as pyzipper. Also note that classic ZIP encryption is weak, so formats such as 7z are recommended for important data:

```python
def read_encrypted_zip(zip_path, password):
    with zipfile.ZipFile(zip_path, 'r') as zipf:
        zipf.setpassword(password.encode('utf-8'))  # Default password for all reads
        # Or supply the password per call instead:
        # zipf.read(name, pwd=password.encode('utf-8'))
        return zipf.namelist()
```
2.2 Incremental update strategy
Append new files to an existing archive by opening it in mode 'a'; the arcname parameter of write controls where the new member is stored:

```python
def update_zip(zip_path, new_file):
    with zipfile.ZipFile(zip_path, 'a') as zipf:
        zipf.write(new_file, arcname=os.path.basename(new_file))
```
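One caveat: mode 'a' can only append. Writing an existing name again leaves two entries with that name in the archive. To genuinely replace a member, the usual workaround (sketched here with a hypothetical `replace_in_zip` helper) is to rewrite the archive, copying every member except the one being replaced:

```python
import os
import zipfile

def replace_in_zip(zip_path, arcname, new_data):
    """Rewrite zip_path, replacing member `arcname` with `new_data`."""
    tmp_path = zip_path + ".tmp"
    with zipfile.ZipFile(zip_path, "r") as src, \
         zipfile.ZipFile(tmp_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            # Copy every member except the one being replaced
            if item.filename != arcname:
                dst.writestr(item, src.read(item.filename))
        dst.writestr(arcname, new_data)
    os.replace(tmp_path, zip_path)  # Atomically swap in the rewritten archive
```

Rewriting costs a full pass over the archive, but it keeps exactly one copy of each member, which duplicate-tolerant append mode does not.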
2.3 Performance optimization skills
- Large file processing: use the ZIP_STORED mode to skip compression and keep CPU and memory overhead low
- Multithreaded compression: combine with concurrent.futures to compress multiple files in parallel
- In-memory processing: use io.BytesIO to build and read ZIP data entirely in memory
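The last bullet can be sketched concretely: with io.BytesIO the entire archive lives in memory and never touches disk, which is handy for building web responses or caching:

```python
import io
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zipf:
    zipf.writestr("data.txt", "in-memory payload")

# buf now holds a complete ZIP file; re-open it for reading without disk I/O
with zipfile.ZipFile(buf, "r") as zipf:
    content = zipf.read("data.txt")
print(content)  # b'in-memory payload'
```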
3. Advanced scenario solutions
3.1 Volume compression implementation
Although zipfile has no built-in support for multi-volume (split) archives, a similar effect can be achieved by splitting the archive file afterwards:
```python
def split_zip(source_path, output_prefix, chunk_size=100*1024*1024):
    # Create the main archive first
    main_zip = f"{output_prefix}.zip"
    with zipfile.ZipFile(main_zip, 'w') as zipf:
        zipf.write(source_path, arcname=os.path.basename(source_path))
    # Then split it into chunks (pseudo-code; actual split logic is required)
    # split_file(main_zip, chunk_size, output_prefix)
```
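The splitting step left as pseudo-code above could be filled in with plain binary chunking. This is only a sketch: the `.partNN` suffix is an assumption of this example, not a standard multi-volume format that ZIP tools can open directly:

```python
import os

def split_file(path, chunk_size, output_prefix):
    """Split `path` into numbered chunks of at most `chunk_size` bytes."""
    parts = []
    with open(path, "rb") as f:
        index = 0
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            part_path = f"{output_prefix}.part{index:02d}"  # naming is illustrative
            with open(part_path, "wb") as part:
                part.write(chunk)
            parts.append(part_path)
            index += 1
    return parts
```

Concatenating the parts back in order (e.g. with shutil.copyfileobj or cat) restores the original archive byte for byte.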
3.2 Cross-platform path processing
Use the pathlib library to handle path differences:
```python
from pathlib import Path

def normalize_path(path):
    return str(Path(path).resolve())
```
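One more wrinkle: the ZIP format stores member paths with forward slashes. zipfile's write() normalizes the OS separator automatically, but when building arcname strings by hand, as_posix() gives a portable form (`to_arcname` below is a hypothetical helper name):

```python
from pathlib import PurePath, PureWindowsPath

def to_arcname(path):
    """Convert a path string to the forward-slash form ZIP entries use."""
    return PurePath(path).as_posix()

# PureWindowsPath makes the conversion visible even on non-Windows hosts
win = PureWindowsPath(r"docs\reports\q1.txt").as_posix()
posix = to_arcname("docs/reports/q1.txt")
print(win, posix)
```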
3.3 Exception handling best practices
```python
try:
    with zipfile.ZipFile('archive.zip', 'r') as z:  # 'archive.zip' is illustrative
        z.extractall('/protected/path')
except zipfile.BadZipFile:
    print("Error: File is corrupted")
except RuntimeError as e:
    if "password required" in str(e):
        print("Error: Password required")
except PermissionError:
    print("Error: No write permission")
```
4. Performance comparison and selection suggestions
Performance comparison of different compression modes (test data: 100MB text file):
| Mode | Compression ratio | Compression time | Memory usage |
|---|---|---|---|
| ZIP_STORED | 100% | 0.2s | 50MB |
| ZIP_DEFLATED | 35% | 2.1s | 150MB |
| ZIP_BZIP2 | 30% | 5.8s | 200MB |
| ZIP_LZMA | 28% | 12.3s | 300MB |
Selection suggestions:
- Prefer ZIP_DEFLATED for a good balance of speed and compression ratio
- For very large files, use ZIP_STORED to keep memory and CPU usage low
- Choose ZIP_BZIP2 (or ZIP_LZMA) when a higher compression ratio matters more than speed
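Beyond picking an algorithm, Python 3.7+ also exposes a compresslevel parameter (for ZIP_DEFLATED: 1 = fastest, 9 = smallest), so speed versus size can be tuned within one mode. A small sketch comparing the two extremes on synthetic data:

```python
import io
import zipfile

payload = b"highly repetitive data " * 5000  # synthetic test data

def packed_size(level):
    """Return the archive size when compressing `payload` at `level`."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED,
                         compresslevel=level) as zipf:
        zipf.writestr("data.bin", payload)
    return buf.getbuffer().nbytes

fast, small = packed_size(1), packed_size(9)
print(fast, small)  # level 9 should produce an archive no larger than level 1
```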
5. Future trends and alternatives
Although zipfile is powerful, other solutions are recommended in the following scenarios:
- Very large datasets: consider the tarfile + gzip combination
- Enterprise-grade encryption requirements: use py7zr to work with the 7z format
- Distributed compression: combine with dask for parallel processing
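For reference, the tarfile + gzip combination from the first bullet is also pure standard library. Unlike ZIP, a .tar.gz compresses all members as a single stream, which often pays off for many small, similar files (the member name below is illustrative):

```python
import io
import tarfile

# Build a .tar.gz entirely in memory (member name is illustrative)
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
    data = b"log line\n" * 100
    info = tarfile.TarInfo(name="logs/app.log")
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))

buf.seek(0)
with tarfile.open(fileobj=buf, mode="r:gz") as tar:
    content = tar.extractfile("logs/app.log").read()
print(len(content))  # 900
```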
The zipfile module shows the breadth of Python's ZIP handling: from basic file packaging to password handling, incremental updates, and other advanced functions, developers can meet complex compression requirements with concise code. Once these core patterns are familiar, it is worth exploring pathlib's path handling and shutil's archive operations (make_archive, unpack_archive) to build a more robust file processing system. In the era of cloud computing, mastering these fundamental file operations lays a solid technical foundation for processing massive data.