SoFunction
Updated on 2024-11-14

Using Python's glob library for easy file and directory management

Introduction

In Python programming, we often need to work with files and directories. To make these tasks easier, Python provides the glob library, which lets us match files and directories based on specific patterns. This post describes the glob library in detail and demonstrates its functions with examples.

What is the glob library

The glob library is a module in the Python standard library that provides a simple and powerful way to match the pathnames of files and directories. On the command line we typically use wildcards to search for files; for example, *.txt matches all files with the .txt suffix. The glob library lets us perform the same kind of file matching programmatically in Python scripts.

The main function of the glob library is glob(), which takes a pattern string as input and returns a list of all files and directories that match that pattern.

Installing the glob library

The glob module is part of the Python standard library, so it ships with every Python installation and does not need to be installed separately. Note that the following command installs glob2, a separate third-party package rather than the standard glob module, and it is not required for anything in this article:

pip install glob2

Now let's start exploring the functions of the glob library.

Basic usage

Importing the glob library

Before using the glob library, you first need to import it. In Python, we use the import statement to import a module:

import glob

Matching files with wildcards

The glob library uses wildcards to match files and directories. Here are some commonly used wildcards:

  • *: Matches 0 or more characters.
  • ?: Matches a single character.
  • []: Matches any one character in the specified set or range; for example, [0-9] matches any single digit.

Let's look at an example. Suppose we have a folder named data, which contains the following files:

data/
    
    
    
    

Now, we want to match all files with the .txt suffix. We can use *.txt as the pattern string:

txt_files = glob.glob("data/*.txt")
print(txt_files)

Output:

['data/', 'data/']

As we can see, the glob.glob() function returns a list of the paths of all files with the .txt suffix.
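The ? and [] wildcards work the same way as *. The following is a minimal sketch, not taken from the original example: it creates a throwaway directory with hypothetical file names (file_1.txt, notes.txt, and so on) and compares what each wildcard matches.

```python
import glob
import os
import tempfile

# Hypothetical file names, created in a temporary directory for illustration only.
names = ["file_1.txt", "file_2.txt", "file_10.txt", "notes.txt", "table.csv"]

with tempfile.TemporaryDirectory() as tmp:
    for name in names:
        open(os.path.join(tmp, name), "w").close()

    def match(pattern):
        # Return sorted base names so the result does not depend on OS ordering.
        paths = glob.glob(os.path.join(tmp, pattern))
        return sorted(os.path.basename(p) for p in paths)

    star = match("*.txt")                # any characters, then .txt
    question = match("file_?.txt")       # exactly one character after "file_"
    bracket = match("file_[0-9]*.txt")   # one digit, then anything, then .txt

print(star)      # ['file_1.txt', 'file_10.txt', 'file_2.txt', 'notes.txt']
print(question)  # ['file_1.txt', 'file_2.txt']
print(bracket)   # ['file_1.txt', 'file_10.txt', 'file_2.txt']
```

Note that file_10.txt is not matched by file_?.txt, because ? stands for exactly one character.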

Match a specific directory

If the files we want to match are in subdirectories, we can use a double asterisk ** to perform a recursive search. For example, suppose we have the following file structure:

data/
    
    subdir/
        
        

We want to match all files with the .txt suffix, regardless of which subdirectory they are located in. We can use **/*.txt as the pattern string:

txt_files_recursive = glob.glob("data/**/*.txt", recursive=True)
print(txt_files_recursive)

Output:

['data/', 'data/subdir/', 'data/subdir/']

With the recursive=True parameter, ** matches any number of nested directories, so we can match files in all subdirectories.
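To see why recursive=True matters, the sketch below runs the same pattern with and without it. The directory layout and file names (top.txt, subdir/mid.txt, subdir/deeper/low.txt) are assumptions made up for this illustration.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: files at three different depths.
    os.makedirs(os.path.join(tmp, "subdir", "deeper"))
    for rel in ["top.txt",
                os.path.join("subdir", "mid.txt"),
                os.path.join("subdir", "deeper", "low.txt")]:
        open(os.path.join(tmp, rel), "w").close()

    pattern = os.path.join(tmp, "**", "*.txt")
    with_recursion = glob.glob(pattern, recursive=True)
    without_recursion = glob.glob(pattern)  # here "**" behaves like a plain "*"

    # Store paths relative to the temporary directory for readability.
    found_recursive = sorted(os.path.relpath(p, tmp) for p in with_recursion)
    found_flat = sorted(os.path.relpath(p, tmp) for p in without_recursion)

print(found_recursive)  # all three files, at every depth
print(found_flat)       # only the file exactly one directory down
```

Without recursive=True, the ** component matches exactly one directory level, so both the top-level file and the deeply nested file are missed.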

Match multiple suffixes

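Sometimes we need to match files with several different suffixes. The glob module has no "or" syntax for this: [] matches only a single character, so a pattern such as *.[txt|csv] does not match .txt or .csv files. Instead, if we want to match both .txt and .csv files, we can run one pattern per suffix and combine the results:

txt_and_csv_files = glob.glob("data/*.txt") + glob.glob("data/*.csv")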
print(txt_and_csv_files)

Output:

['data/', 'data/', 'data/']
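When there are more than two suffixes, a small helper can loop over the patterns. This is a sketch under assumptions: glob_many is a hypothetical helper name, and the files are created in a temporary directory for illustration. It also shows what a bracket expression such as [txt|csv] actually matches.

```python
import glob
import os
import tempfile

def glob_many(directory, patterns):
    """Hypothetical helper: sorted, de-duplicated matches for several patterns."""
    matches = set()
    for pattern in patterns:
        matches.update(glob.glob(os.path.join(directory, pattern)))
    return sorted(matches)

with tempfile.TemporaryDirectory() as tmp:
    for name in ["a.txt", "b.csv", "c.json", "d.t"]:
        open(os.path.join(tmp, name), "w").close()

    combined = [os.path.basename(p) for p in glob_many(tmp, ["*.txt", "*.csv"])]

    # A bracket expression matches ONE character from the set t, x, |, c, s, v,
    # so this pattern finds "d.t" but neither "a.txt" nor "b.csv".
    bracket = [os.path.basename(p)
               for p in glob.glob(os.path.join(tmp, "*.[txt|csv]"))]

print(combined)  # ['a.txt', 'b.csv']
print(bracket)   # ['d.t']
```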

Get a list of directories

In addition to matching files, the glob library can also return a list of directories. If we want to list all subdirectories, we can use */ as the pattern string:

subdirs = glob.glob("data/*/")
print(subdirs)

Output:

['data/subdir/']

Iterating with iglob()

For large directories, building a list of all matching files at once can take up a lot of memory. In this case, you can use the iglob() function instead. iglob() returns an iterator that yields the matching filenames one by one.

txt_files_iterator = glob.iglob("data/*.txt")
for file in txt_files_iterator:
    print(file)

Output:

data/
data/

iglob() is a good fit when you work with large numbers of files and want to avoid holding the full list of matches in memory.
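Because iglob() returns an iterator, it combines naturally with tools such as itertools.islice when only the first few matches are needed. A minimal sketch, assuming a temporary directory filled with hypothetical log_<n>.txt files:

```python
import glob
import itertools
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create 100 hypothetical files to stand in for a large directory.
    for i in range(100):
        open(os.path.join(tmp, f"log_{i}.txt"), "w").close()

    matches = glob.iglob(os.path.join(tmp, "*.txt"))

    # Take only the first five matches; no list of all 100 paths is built here.
    first_five = list(itertools.islice(matches, 5))

for path in first_five:
    print(os.path.basename(path))
```

Which five files come first depends on the file system, so no particular output is shown.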

Filtering and sorting matches

In the examples above, we saw that glob.glob() returns a list of all files and directories that match a pattern. Sometimes, however, we are only interested in certain files, or we want to sort the matches according to some rule. Combining glob with other standard-library tools lets us meet these requirements.

Filter Matches

The results from glob can be filtered with the filter() function of the fnmatch module. This is useful for applying a further pattern to the matches. For example, let's say we only want the files whose names begin with file:

import glob
import fnmatch
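# Get all files whose names start with 'file'; the pattern is applied to
# the whole path, so it must include the 'data/' prefix
file_starting_with_file = fnmatch.filter(glob.glob("data/*"), "data/file*")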
print(file_starting_with_file)

Output:

['data/', 'data/']

In this example, we use the fnmatch.filter() function to keep only the matches whose file names begin with file.
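fnmatch patterns are compared against the whole string, so when filtering full paths it can be clearer to match on the base name only. The sketch below takes that approach; the file names are hypothetical and created in a temporary directory.

```python
import fnmatch
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    for name in ["file_a.txt", "file_b.csv", "report.txt"]:
        open(os.path.join(tmp, name), "w").close()

    all_paths = glob.glob(os.path.join(tmp, "*"))

    # Compare the pattern with the base name, not with the full path.
    starting_with_file = sorted(
        os.path.basename(p)
        for p in all_paths
        if fnmatch.fnmatch(os.path.basename(p), "file*")
    )

print(starting_with_file)  # ['file_a.txt', 'file_b.csv']
```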

Sort Matching Results

The matches returned by the glob library come back in whatever order the file system lists them, which is not guaranteed to be sorted. When we need a specific order, we can use Python's built-in sorted() function to sort the matches.

For example, suppose we want to sort matching files by file size:

import glob
import os
# Get matching files and sort them by file size
matched_files = glob.glob("data/*.txt")
sorted_files_by_size = sorted(matched_files, key=os.path.getsize)
print(sorted_files_by_size)

Output:

['data/', 'data/']

In this example, we pass the os.path.getsize() function as the key parameter of sorted() to sort the matches by file size, smallest first.
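The same idea can be tried end to end with files of known size. This is an illustrative sketch: the file names and sizes are made up, and reverse=True is used to show the largest file first.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical files with 1, 10 and 100 bytes of content.
    sizes = {"small.txt": 1, "large.txt": 100, "medium.txt": 10}
    for name, size in sizes.items():
        with open(os.path.join(tmp, name), "w") as fh:
            fh.write("x" * size)

    paths = glob.glob(os.path.join(tmp, "*.txt"))

    # Smallest first, then largest first.
    by_size = [os.path.basename(p)
               for p in sorted(paths, key=os.path.getsize)]
    largest_first = [os.path.basename(p)
                     for p in sorted(paths, key=os.path.getsize, reverse=True)]

print(by_size)        # ['small.txt', 'medium.txt', 'large.txt']
print(largest_first)  # ['large.txt', 'medium.txt', 'small.txt']
```

Other functions from os.path, such as os.path.getmtime for the modification time, can be passed as the key in the same way.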

Customized Matching Rules

The results from glob can also be filtered and sorted with custom functions. For example, suppose we want to keep only the files whose names end in an odd digit (before the extension) and sort them by path:

import glob
import os
# Custom filter function
def custom_filter(file_path):
    # File name without its directory and extension
    stem = os.path.splitext(os.path.basename(file_path))[0]
    last_char = stem[-1:]  # Last character of the name before the extension
    return last_char.isdigit() and int(last_char) % 2 == 1
# Get matching files, keep those ending in an odd digit, and sort them by path
matched_files = glob.glob("data/*")
filtered_and_sorted_files = sorted(filter(custom_filter, matched_files))
print(filtered_and_sorted_files)

Output:

['data/']

In this example, we define a custom_filter() function to keep only the files whose names end in an odd digit, and the sorted() function then sorts the remaining paths as strings.
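Sorting paths as strings puts file_11.txt before file_3.txt. If the files should be ordered by numeric value instead, one option is a key function that extracts the trailing number. The sketch below is an assumption-laden illustration: trailing_number is a hypothetical helper and the file names are made up.

```python
import glob
import os
import re
import tempfile

def trailing_number(path):
    """Hypothetical helper: integer at the end of the file stem, or None."""
    stem = os.path.splitext(os.path.basename(path))[0]
    match = re.search(r"(\d+)$", stem)
    return int(match.group(1)) if match else None

def is_odd_numbered(path):
    number = trailing_number(path)
    return number is not None and number % 2 == 1

with tempfile.TemporaryDirectory() as tmp:
    for name in ["file_1.txt", "file_2.txt", "file_3.txt", "file_11.txt", "notes.txt"]:
        open(os.path.join(tmp, name), "w").close()

    odd_paths = [p for p in glob.glob(os.path.join(tmp, "*")) if is_odd_numbered(p)]

    as_strings = [os.path.basename(p) for p in sorted(odd_paths)]
    as_numbers = [os.path.basename(p) for p in sorted(odd_paths, key=trailing_number)]

print(as_strings)  # ['file_1.txt', 'file_11.txt', 'file_3.txt']
print(as_numbers)  # ['file_1.txt', 'file_3.txt', 'file_11.txt']
```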

Iterate over files in subdirectories

Previously, we introduced ** for recursive searches. If you only want the files directly inside a directory, without descending into its subdirectories, you can combine glob.glob() with os.path.join() and os.path.isfile().

For example, suppose we have the following file structure:

data/
   
    subdir1/
       
       
    subdir2/
       

Now we want to list only the files directly inside the data directory, without descending into its subdirectories:

import glob
import os
def list_files_in_directory(directory):
    files = []
    for file_path in glob.glob(os.path.join(directory, "*")):
        # Keep regular files only; skip subdirectories
        if os.path.isfile(file_path):
            files.append(file_path)
    return files
directory_path = "data"
files_in_directory = list_files_in_directory(directory_path)
print(files_in_directory)

Output:

['data/']

In this example, we define a list_files_in_directory() function that iterates over the entries in the specified directory and skips subdirectories. The os.path.join() function is used to construct file paths, so that path separators are handled correctly on different operating systems.
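If the files in the immediate subdirectories are wanted as well, but nothing deeper, one option is to add a second pattern with one extra * level. The following sketch assumes a hypothetical helper name (list_files_one_level_deep) and a made-up directory layout.

```python
import glob
import os
import tempfile

def list_files_one_level_deep(directory):
    """Hypothetical helper: files in `directory` and in its direct subdirectories."""
    files = []
    # "*" covers the directory itself, "*/*" covers one level below it.
    for pattern in ("*", os.path.join("*", "*")):
        for path in glob.glob(os.path.join(directory, pattern)):
            if os.path.isfile(path):
                files.append(path)
    return sorted(files)

with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "subdir1", "nested"))
    for rel in ["top.txt",
                os.path.join("subdir1", "a.txt"),
                os.path.join("subdir1", "nested", "deep.txt")]:
        open(os.path.join(tmp, rel), "w").close()

    found = [os.path.relpath(p, tmp) for p in list_files_one_level_deep(tmp)]

print(found)  # top.txt and subdir1/a.txt, but not subdir1/nested/deep.txt
```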

File processing with the glob library

The glob library is not just for matching and listing files; it is also handy for file manipulation. We can combine the glob library with other Python libraries (e.g. os, shutil, etc.) to perform various file operations.

Copying files

Suppose we want to copy all files with the .txt suffix to another directory. We can use the shutil library to do this:

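import glob
import os
import shutil
source_directory = "data"
destination_directory = "backup"
# Create the destination directory if it does not exist yet
os.makedirs(destination_directory, exist_ok=True)
txt_files = glob.glob(os.path.join(source_directory, "*.txt"))
for txt_file in txt_files:
    shutil.copy(txt_file, destination_directory)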

In this example, we first use the glob library to get a list of all files with the .txt suffix, and then use the shutil.copy() function to copy these files to the backup directory.

Deleting files

If we wish to delete all files with the .csv suffix, we can use the os.remove() function:

import glob
import os
csv_files = glob.glob("data/*.csv")
for csv_file in csv_files:
    os.remove(csv_file)

In this example, we use the glob library to get a list of all files with the .csv suffix, and then use the os.remove() function to delete these files.

Batch Rename Files

The glob library can be combined with string processing and the os.rename() function to rename files in batches. Suppose we have a series of files with names in the format file_<num>.txt (e.g. file_1.txt, file_2.txt, etc.), and we now wish to rename them to data_<num>.txt:

import glob
import os
files_to_rename = glob.glob("data/file_*.txt")
for old_file_path in files_to_rename:
    new_file_path = old_file_path.replace("file_", "data_")
    os.rename(old_file_path, new_file_path)

In this example, we first use the glob library to get all the files that need to be renamed, then use the string replace() method to replace file_ with data_, and finally use the os.rename() function to rename each file.
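One caveat: replace() acts on the whole path, so a directory whose name happens to contain file_ would be rewritten too. A slightly more defensive sketch changes only the base name. The directory name file_archive and the file names are assumptions chosen to trigger that case.

```python
import glob
import os
import tempfile

old_prefix, new_prefix = "file_", "data_"

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical folder whose own name contains the old prefix.
    folder = os.path.join(tmp, "file_archive")
    os.mkdir(folder)
    for i in (1, 2):
        open(os.path.join(folder, f"file_{i}.txt"), "w").close()

    for old_path in glob.glob(os.path.join(folder, old_prefix + "*.txt")):
        directory, name = os.path.split(old_path)
        # Swap the prefix on the file name only; the directory part is untouched.
        new_name = new_prefix + name[len(old_prefix):]
        os.rename(old_path, os.path.join(directory, new_name))

    renamed = sorted(os.listdir(folder))

print(renamed)  # ['data_1.txt', 'data_2.txt']
```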

Considerations for using the glob library

Although the glob library is a powerful tool, there are some caveats to keep in mind when using it:

  • When using the glob library, handle user-supplied input with care to avoid path traversal attacks.
  • Keep platform compatibility in mind when using the glob library, especially when dealing with path separators. It is recommended to use os.path.join() to construct file paths so that the code works correctly on different operating systems.
  • For large directories and large numbers of files, using iglob() or other generator-based approaches avoids unnecessary memory overhead.
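As an illustration of the first caveat, the sketch below checks that a user-supplied subdirectory stays inside a base directory and escapes any wildcard characters in it with glob.escape(). safe_glob is a hypothetical helper written for this example, not part of the glob module, and the folder name reports[2024] is made up.

```python
import glob
import os
import tempfile

def safe_glob(base_dir, user_subdir, pattern):
    """Hypothetical helper: glob inside base_dir/user_subdir only."""
    base = os.path.realpath(base_dir)
    target = os.path.realpath(os.path.join(base, user_subdir))
    # Reject input such as "../.." that resolves outside the base directory.
    if os.path.commonpath([base, target]) != base:
        raise ValueError("path escapes the base directory")
    # Escape *, ? and [ in the directory part so they are matched literally.
    return glob.glob(os.path.join(glob.escape(target), pattern))

with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "reports[2024]"))
    open(os.path.join(tmp, "reports[2024]", "summary.txt"), "w").close()

    # Without escaping, "[2024]" is read as a character set and nothing matches.
    naive = glob.glob(os.path.join(tmp, "reports[2024]", "*.txt"))
    escaped = [os.path.basename(p) for p in safe_glob(tmp, "reports[2024]", "*.txt")]

    try:
        safe_glob(tmp, os.path.join("..", ".."), "*")
        traversal_blocked = False
    except ValueError:
        traversal_blocked = True

print(naive)              # []
print(escaped)            # ['summary.txt']
print(traversal_blocked)  # True
```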

Concluding remarks

The glob library provides a simple and powerful way to find files and directories, making it easy to work with files in Python scripts. By mastering the glob library, we can write file-handling code more efficiently and apply it in real projects.
