SoFunction
Updated on 2024-11-19

How Python Splits ZIP Files

Python Splitting ZIP Files

A coworker was given an assignment to split and distribute zip files on a weekly basis.


The archive contains many documents that have to be distributed to the individual outlets. Doing this by hand every week is not practical, so a Python script is the way to go!

Two standard-library modules are required:

import os
import zipfile

Unzip the archive to a temporary directory first, then walk that directory and compress each outlet's folder into a new zip file. Watch out for garbled Chinese file names.
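The Chinese-character problem comes from how zipfile decodes entry names: when an entry does not carry the UTF-8 flag (bit 0x800 of flag_bits), Python decodes its name as cp437, so names stored in GBK come out garbled. A minimal sketch of the repair, assuming the archive was created on a system that stored names in GBK (the helper name fix_zip_name and the sample file name are hypothetical):

```python
def fix_zip_name(name, flag_bits=0, legacy_encoding="gbk"):
    """Return a readable entry name for a name reported by zipfile."""
    if flag_bits & 0x800:
        # UTF-8 flag set: zipfile has already decoded the name correctly.
        return name
    try:
        # Undo zipfile's cp437 decoding, then decode the raw bytes as GBK.
        return name.encode("cp437").decode(legacy_encoding)
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not a cp437/GBK round trip after all; keep the name unchanged.
        return name


# Simulate what zipfile reports for a GBK-encoded name without the UTF-8 flag.
garbled = "报表.txt".encode("gbk").decode("cp437")
print(fix_zip_name(garbled))
```

With a real archive, the arguments would come from each ZipInfo entry, for example fix_zip_name(info.filename, info.flag_bits).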

Code

# encoding: utf-8
"""
@author: aged coconut
@contact: hndm@
@version: 1.0
@project:MyTools
@file: zip_work.py
@time: 2021-9-13 15:48
clarification
"""
import os
import zipfile
 
 
def dfs_get_zip_file(input_path,result):
    # Recursively collect the paths of all files under input_path into result
    files = os.listdir(input_path)
    for file in files:
        if os.path.isdir(input_path+'/'+file):
            dfs_get_zip_file(input_path+'/'+file,result)
        else:
            result.append(input_path+'/'+file)
 
def zip_path(input_path,output_path,output_name,up_path=""):
    # input_path: directory to compress
    # output_path: directory where the zip file is stored
    # output_name: zip file name
    # up_path: parent directory to strip from the names inside the zip, so the archive is not nested too deeply
    f = zipfile.ZipFile(output_path+'/'+output_name,'w',zipfile.ZIP_DEFLATED)
    filelists = []
    dfs_get_zip_file(input_path,filelists)
    for file in filelists:
        # Second argument is the name stored inside the archive
        f.write(file,file.replace(up_path,''))
    f.close()
    return output_path+r"/"+output_name
 
 
def get_category_dir_zip(filepath, ext_dir, up_path = ""):
    # Iterate through everything under filepath, including subdirectories, find the outlet directories, and compress each one into a zip file.
    # Modify the compression logic as needed; in my case the files are organized by outlet.
    cate_dict = {'469030':'21',
                '469035':'23',
                '469031':'24',
                '469027':'19',
                '469003':'13',
                '469025':'17',
                '469007':'16',
                '460101':'11',
                '469033':'25',
                '469028':'26',
                '469034':'27',
                '469002':'14',
                '469036':'28',
                '460201':'12',
                '469026':'22',
                '469006':'20',
                '469005':'18',
                '469001':'15',
                }
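    files = os.listdir(filepath)
    if not os.path.exists(ext_dir):
        os.makedirs(ext_dir)
    for fi in files:
        fi_d = os.path.join(filepath,fi)
        if os.path.isdir(fi_d):
            # Directories whose names start with "46" are treated as outlet directories
            if fi.startswith("46"):
                zip_file_cnt = 0
                ctg_dir_list = os.listdir(filepath)
                for ci in ctg_dir_list:
                    ctg_dir = os.path.join(filepath, ci)
                    if os.path.isdir(ctg_dir):
                        # Name the zip after the first 6 characters of the directory name
                        zip_file = "{}.zip".format(ci[:6])
                        zip_file_dir = os.path.join(ext_dir, zip_file)
 
                        if os.path.exists(zip_file_dir):  # If the file exists, delete it
                            os.remove(zip_file_dir)
                        print('Compression', ctg_dir, zip_file_dir, ext_dir)
                        zip_path(ctg_dir, ext_dir, zip_file, up_path)
                        zip_file_cnt = zip_file_cnt + 1
                return zip_file_cnt
            else:
                # Not an outlet directory: look one level deeper
                return get_category_dir_zip(fi_d, ext_dir, up_path)
    return 0  # No outlet directory was found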
 
 
 
 
def sfp_unzip(file_path, ext_dir):
    """unzip zip file"""
    zip_file = zipfile.ZipFile(file_path)
    if not os.path.isdir(ext_dir):
        os.makedirs(ext_dir)
    zip_i = 0
 
    for names in zip_file.namelist():
        zip_i = zip_i + 1
        # Avoid garbled Chinese: zipfile decodes legacy names as cp437, so re-decode them as GBK
        gbk_names = names.encode('cp437').decode('gbk')
        new_path = os.path.join(ext_dir, gbk_names)
        # Determine whether the entry is a folder or a file
        # (is_dir() is used rather than file_size > 0 so that empty files are not mistaken for folders)
        if zip_file.getinfo(names).is_dir():
            # It's a folder, just create it
            os.makedirs(new_path, exist_ok=True)
        else:
            # It's a file: make sure its folder exists, then write the data
            os.makedirs(os.path.dirname(new_path), exist_ok=True)
            with open(file=new_path, mode='wb') as f:
                # Read the contents of the file inside the zip archive
                f.write(zip_file.read(names))
    zip_file.close()
    return zip_i
 
if __name__=="__main__":
    # Unzip the file
    file_cnt = sfp_unzip("zip/", "D:/zip/tmp")
    if file_cnt > 0:
        # Compress packaged files by outlets
        zip_file_cnt = get_category_dir_zip("D:/zip/tmp", "D:/zip/data")
        print("Split and created {} zip files.".format(zip_file_cnt))
    else:
        print("The zip file is empty, not split to create a zip file.")
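
The up_path parameter above exists to strip leading directories from the names stored in the archive. As a sketch of another way to get the same effect, the standard library's os.walk and os.path.relpath can compute each stored name relative to the directory being compressed. The function name zip_dir and the sample directory and file names are hypothetical:

```python
import os
import tempfile
import zipfile


def zip_dir(input_path, zip_file_path):
    """Compress input_path into zip_file_path, storing names relative to input_path."""
    with zipfile.ZipFile(zip_file_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(input_path):
            for name in files:
                full_path = os.path.join(root, name)
                # arcname is the path inside the archive, without the parent directories
                zf.write(full_path, os.path.relpath(full_path, input_path))
    return zip_file_path


# Small demo in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "469030_outlet")
    os.makedirs(os.path.join(src, "sub"))
    with open(os.path.join(src, "sub", "report.txt"), "w") as fh:
        fh.write("demo")
    out = zip_dir(src, os.path.join(tmp, "469030.zip"))
    with zipfile.ZipFile(out) as zf:
        demo_names = zf.namelist()
print(demo_names)
```

Because the stored name is computed from the path rather than by string replacement, a directory name that happens to appear twice in a path is not stripped by accident.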

Python ZIP Packaging Unpacking

Packing

The built-in zip function (which has nothing to do with ZIP archives) can "stitch" two lists together, for example:

a=[1,2,3]
b=['x','y','z']
c=list(zip(a,b))
print(c)

This means that the first element of a is paired with the first element of b in a tuple, the second element of a is paired with the second element of b in another tuple, and so on, giving [(1, 'x'), (2, 'y'), (3, 'z')].

Now, what happens if we append one more element to a and call zip again?

a.append(4)
c=list(zip(a,b))
print(c)

The lists are passed as arguments to zip, which walks through them (or through any other iterables) in parallel until the shortest one is exhausted, and then stops.

In the example above, list b has length 3 and list a has length 4, so the resulting list c still has only 3 elements.
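
To see the truncation next to an alternative, the sketch below compares zip with itertools.zip_longest from the standard library, which pads the shorter input instead of stopping. This goes slightly beyond the example above and is only meant as an illustration:

```python
from itertools import zip_longest

a = [1, 2, 3, 4]
b = ['x', 'y', 'z']

shortest = list(zip(a, b))          # stops as soon as b runs out
padded = list(zip_longest(a, b))    # fills the missing value with None

print(shortest)
print(padded)
```

Which behavior is appropriate depends on whether the unmatched element of a should be dropped or kept.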

Unpacking

Unpacking is the reverse of packing:

d=list(zip(*c))
print(d)

The core of the unpacking above is zip(*c). The result d is a list of two tuples: the first holds the first three elements of a, and the second holds the elements of b.

The key here is *c, which is not easy to understand at first. You can print it to see what it expands to:

print(*c)

The output, (1, 'x') (2, 'y') (3, 'z'), shows that *c splits the c list into its three elements and passes each of them as a separate argument to zip. Another test verifies this:

p1=(1,'x')
p2=(2,'y')
p3=(3,'z')
p=list(zip(p1,p2,p3))
p==d

This confirms that, when unpacking, *c passes the three tuples inside the c list to zip as three separate arguments.
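
The equivalence can also be checked in a few self-contained lines. This sketch rebuilds c from the original three-element lists and compares zip(*c) with the explicit call zip(c[0], c[1], c[2]); the names same, numbers and letters are only illustrative:

```python
a = [1, 2, 3]
b = ['x', 'y', 'z']
c = list(zip(a, b))                   # [(1, 'x'), (2, 'y'), (3, 'z')]

d = list(zip(*c))                     # the * spreads c into three positional arguments
same = list(zip(c[0], c[1], c[2]))    # the same call written out by hand

numbers, letters = d                  # each result is a tuple, not a list
print(d == same)
print(list(numbers) == a, list(letters) == b)
```

Note that unpacking returns tuples, so a conversion with list() is needed to get back lists equal to a and b.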
