Introduction
In Python programming, we often need to work with files and directories. To make these tasks easier, Python provides the glob library, which allows us to match files and directories based on specific patterns. This post goes into detail about the glob library and demonstrates its various functions with examples.
What is the glob library
The glob library is a module in the Python standard library that provides a simple and powerful way to match pathnames of files and directories. On the command line we typically use wildcards to search for files; for example, *.txt matches all files with the .txt suffix. The glob library lets us perform the same kind of matching programmatically in Python scripts.
The main entry point of the library is the glob() function, which takes a pattern string as input and returns a list of all files and directories that match that pattern.
Installing the glob library
glob is part of the Python standard library, so there is nothing to install: it is available in every standard Python installation. The glob2 package on PyPI is a separate third-party project, not the standard glob module, and it is not needed for any of the examples in this article.
Now let's start exploring the various functions of the glob library.
Basic usage
Importing the glob library
Before using the glob library, you first need to import it. In Python, we use the import statement to import a module:
import glob
Matching files with wildcards
The glob library uses wildcards to match files and directories. Here are some commonly used wildcards:
- * : Matches 0 or more characters.
- ? : Matches exactly one character.
- [] : Matches one character from a specified set or range; for example, [0-9] matches any single digit.
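To make the three wildcards concrete, here is a minimal sketch that builds a temporary directory and runs one pattern of each kind against it. The file names and the names() helper are made up for this illustration and are not part of the glob library.

```python
import glob
import os
import tempfile

# Build a throwaway directory; every file name here is invented for the demo.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a1.txt", "a2.txt", "ab.txt", "notes.csv"):
        open(os.path.join(tmp, name), "w").close()

    def names(pattern):
        """Return the sorted base names matched by `pattern` inside tmp."""
        # glob.escape() keeps special characters in the temp path from acting as wildcards.
        full_pattern = os.path.join(glob.escape(tmp), pattern)
        return sorted(os.path.basename(p) for p in glob.glob(full_pattern))

    star_matches = names("*.txt")          # any run of characters, then .txt
    question_matches = names("a?.txt")     # "a", exactly one character, then .txt
    bracket_matches = names("a[0-9].txt")  # "a", exactly one digit, then .txt

print(star_matches)      # ['a1.txt', 'a2.txt', 'ab.txt']
print(question_matches)  # ['a1.txt', 'a2.txt', 'ab.txt']
print(bracket_matches)   # ['a1.txt', 'a2.txt']
```

Note how [0-9] excludes ab.txt, while ? accepts any single character in that position.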
Let's look at an example. Suppose we have a folder named data, which contains the following files:
data/
Now, we want to match all files with the .txt suffix. We can use *.txt as the pattern string:
txt_files = glob.glob("data/*.txt")
print(txt_files)
Output:
['data/', 'data/']
As we can see, the glob.glob() function returns a list of paths to all files with the .txt suffix.
Match a specific directory
If the files we want to match may be located in subdirectories, we can use a double asterisk ** to perform a recursive search. For example, suppose we have the following file structure:
data/ subdir/
We want to match all files with the .txt suffix, regardless of which subdirectory they are located in. We can use **/*.txt as the pattern string:
txt_files_recursive = glob.glob("data/**/*.txt", recursive=True)
print(txt_files_recursive)
Output:
['data/', 'data/subdir/', 'data/subdir/']
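With the recursive=True parameter, ** matches zero or more levels of directories, so the pattern finds files in data itself and in all of its subdirectories. Without recursive=True, ** behaves like a single *.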
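The effect of recursive=True is easiest to see by running the same pattern with and without it. The sketch below does that on a temporary directory tree; the directory and file names are invented for the illustration.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: top.txt, subdir/mid.txt, subdir/deeper/low.txt
    os.makedirs(os.path.join(tmp, "subdir", "deeper"))
    for rel in ("top.txt",
                os.path.join("subdir", "mid.txt"),
                os.path.join("subdir", "deeper", "low.txt")):
        open(os.path.join(tmp, rel), "w").close()

    pattern = os.path.join(glob.escape(tmp), "**", "*.txt")

    def relative(paths):
        """Return sorted paths relative to tmp, using "/" on every platform."""
        return sorted(os.path.relpath(p, tmp).replace(os.sep, "/") for p in paths)

    # With recursive=True, ** spans zero or more directory levels.
    recursive_hits = relative(glob.glob(pattern, recursive=True))
    # Without it, ** acts like *, so only files exactly one level down match.
    flat_hits = relative(glob.glob(pattern))

print(recursive_hits)  # ['subdir/deeper/low.txt', 'subdir/mid.txt', 'top.txt']
print(flat_hits)       # ['subdir/mid.txt']
```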
Match multiple suffixes
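Sometimes we need to match files with several different suffixes. Square brackets cannot do this: [] matches a single character, so a pattern such as *.[txt|csv] only matches one-character suffixes, and glob does not support shell-style brace patterns like *.{txt,csv}. To match both .txt and .csv files, we can run one pattern per suffix and combine the results:
txt_and_csv_files = glob.glob("data/*.txt") + glob.glob("data/*.csv")
print(txt_and_csv_files)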
Output:
['data/', 'data/', 'data/']
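Because a single glob pattern cannot express "this suffix or that one", running one pattern per suffix can be wrapped in a small helper. The following is only a sketch: glob_suffixes is a hypothetical function written for this article, not something the glob module provides, and the file names are made up.

```python
import glob
import os
import tempfile

def glob_suffixes(directory, suffixes):
    """Collect paths in `directory` whose names end with any of `suffixes`."""
    matches = []
    for suffix in suffixes:
        # One glob call per suffix; escape the directory so it is taken literally.
        matches.extend(glob.glob(os.path.join(glob.escape(directory), "*" + suffix)))
    return sorted(matches)

with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.txt", "b.csv", "c.log"):
        open(os.path.join(tmp, name), "w").close()
    found = [os.path.basename(p) for p in glob_suffixes(tmp, [".txt", ".csv"])]

print(found)  # ['a.txt', 'b.csv']
```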
Get a list of directories
In addition to matching files, the glob library can also list directories. If we want to list all direct subdirectories, we can use */ as the pattern string:
subdirs = glob.glob("data/*/")
print(subdirs)
Output:
['data/subdir/']
Iterating with iglob()
For large directories, getting a list of all matching files at once can take up a lot of memory. In this case, you can use the iglob() function instead. iglob() returns an iterator that yields the matching filenames one by one:
txt_files_iterator = glob.iglob("data/*.txt")
for file in txt_files_iterator:
    print(file)
Output:
data/
data/
iglob() is a good fit when working with large numbers of files, because it avoids building the complete list of results in memory.
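One practical consequence of getting an iterator is that we can stop early. The sketch below takes only the first three matches with itertools.islice instead of collecting all of them; the file names and the count of 100 files are arbitrary choices for the demonstration. How much memory this saves in practice depends on how many directories the pattern spans.

```python
import glob
import itertools
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create 100 empty files with made-up names.
    for i in range(100):
        open(os.path.join(tmp, f"file_{i}.txt"), "w").close()

    matches = glob.iglob(os.path.join(glob.escape(tmp), "*.txt"))
    # Consume just three results; the full result list is never built.
    first_three = [os.path.basename(p) for p in itertools.islice(matches, 3)]

print(len(first_three))  # 3
```

Which three files come back is not specified, since glob does not guarantee any order.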
Filtering and sorting matches
In the examples above, we saw that glob.glob() returns a list of all files and directories that match a pattern. However, sometimes we are only interested in certain files, or we want to sort the matches according to certain rules. The results of glob can be combined with other standard tools to meet these requirements.
Filter Matches
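The fnmatch module, which implements the same wildcard syntax, can be applied to the results of glob to filter them further. This is useful for performing more complex pattern matching on match results. Note that glob.glob() returns paths that include the data/ prefix, so the pattern has to be tested against the file name rather than the whole path. For example, let's say we only want files whose names begin with file:
import glob
import fnmatch
import os

# Get all files whose names start with 'file'
file_starting_with_file = [path for path in glob.glob("data/*") if fnmatch.fnmatch(os.path.basename(path), "file*")]
print(file_starting_with_file)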
Output:
['data/', 'data/']
In this example, we use fnmatch.fnmatch() on each file name to keep only the files whose names begin with file.
Sort Matching Results
The matches returned by the glob library come back in arbitrary order, which depends on the file system; they are not guaranteed to be sorted. When we need a particular order, we can use Python's built-in sorted() function to sort the matches.
For example, suppose we want to sort matching files by file size:
import glob
import os

# Get matching files and sort them by file size
matched_files = glob.glob("data/*.txt")
sorted_files_by_size = sorted(matched_files, key=os.path.getsize)
print(sorted_files_by_size)
Output:
['data/', 'data/']
In this example, we pass the os.path.getsize() function as the key parameter of sorted() to sort the matches by file size, smallest first.
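The size-based sort can be checked end to end with files of known sizes. This sketch writes three files of different lengths into a temporary directory and sorts them in both directions; the file names and sizes are invented for the example.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical files with 1, 20 and 300 bytes of content.
    sizes = {"small.txt": 1, "large.txt": 300, "medium.txt": 20}
    for name, size in sizes.items():
        with open(os.path.join(tmp, name), "wb") as fh:
            fh.write(b"x" * size)

    matched = glob.glob(os.path.join(glob.escape(tmp), "*.txt"))
    # key=os.path.getsize sorts by size in bytes; reverse=True flips the order.
    by_size = [os.path.basename(p) for p in sorted(matched, key=os.path.getsize)]
    by_size_desc = [os.path.basename(p)
                    for p in sorted(matched, key=os.path.getsize, reverse=True)]

print(by_size)       # ['small.txt', 'medium.txt', 'large.txt']
print(by_size_desc)  # ['large.txt', 'medium.txt', 'small.txt']
```

The same pattern works with other key functions such as os.path.getmtime for sorting by modification time.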
Customized Matching Rules
The results of glob can also be filtered and sorted with our own functions. For example, suppose we want to keep all files whose names end in an odd digit (just before the extension) and sort them:
import glob
import os

# Custom filter: keep files whose name, without the extension, ends in an odd digit
def custom_filter(file_path):
    stem = os.path.splitext(os.path.basename(file_path))[0]
    last_char = stem[-1:]  # last character before the extension ("" if the name is empty)
    return last_char.isdigit() and int(last_char) % 2 == 1

# Get the matching files, filter them, and sort the result
matched_files = glob.glob("data/*")
filtered_and_sorted_files = sorted(filter(custom_filter, matched_files))
print(filtered_and_sorted_files)
Output:
['data/']
In this example, we define a custom_filter() function that keeps only files whose names end in an odd digit, and sorted() then orders the remaining paths alphabetically.
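A plain sorted() call compares paths as strings, so file_10.txt sorts before file_2.txt. If numeric order is wanted, a key function can extract the number from the name. The following is a minimal sketch; numeric_key and the sample paths are hypothetical and only serve to illustrate the idea.

```python
import os
import re

def numeric_key(path):
    """Sort key: the last run of digits in the file name stem, as an int (-1 if none)."""
    stem = os.path.splitext(os.path.basename(path))[0]
    digits = re.findall(r"\d+", stem)
    return int(digits[-1]) if digits else -1

# Made-up paths, as glob.glob("data/file_*.txt") might return them.
paths = ["data/file_10.txt", "data/file_2.txt", "data/file_1.txt"]

plain = sorted(paths)                     # string comparison
numeric = sorted(paths, key=numeric_key)  # comparison by embedded number

print(plain)    # ['data/file_1.txt', 'data/file_10.txt', 'data/file_2.txt']
print(numeric)  # ['data/file_1.txt', 'data/file_2.txt', 'data/file_10.txt']
```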
Iterate over files in subdirectories
Previously, we introduced ** for recursive searches. If you only want the files in a directory and in its direct subdirectories, without descending any further, you can combine glob.glob() with os.path.join() and os.path.isfile().
For example, suppose we have the following file structure:
data/
subdir1/
subdir2/
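Now we just want to list the files in the data directory and in its direct subdirectories:
import glob
import os

def list_files_in_directory(directory):
    files = []
    # "*" matches entries directly in the directory; "*/*" goes exactly one level deeper
    for pattern in ("*", os.path.join("*", "*")):
        for file_path in glob.glob(os.path.join(directory, pattern)):
            if os.path.isfile(file_path):  # skip the directories themselves
                files.append(file_path)
    return files

directory_path = "data"
files_in_directory = list_files_in_directory(directory_path)
print(files_in_directory)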
Output:
['data/', 'data/subdir1/', 'data/subdir1/', 'data/subdir2/']
In this example, we define a list_files_in_directory() function that collects the files in the specified directory and one level below it, skipping the directories themselves. The os.path.join() function is used to construct the patterns, so that path separators are handled correctly on different operating systems.
File processing with the glob library
The glob library is not just for matching and listing files; it is also handy for file manipulation. We can combine the glob library with other Python libraries (e.g. os, shutil) to perform various file operations.
Copy files
Suppose we want to copy all files with the .txt suffix to another directory. We can use the shutil library to do this:
import glob
import os
import shutil

source_directory = "data"
destination_directory = "backup"

# Make sure the destination exists; otherwise shutil.copy() would create a file named "backup"
os.makedirs(destination_directory, exist_ok=True)

txt_files = glob.glob(os.path.join(source_directory, "*.txt"))
for txt_file in txt_files:
    shutil.copy(txt_file, destination_directory)
In this example, we first use the glob library to get a list of all files with the .txt suffix, and then use the shutil.copy() function to copy these files to the backup directory.
Delete files
If we wish to delete all files with the .csv suffix, we can use the os.remove() function:
import glob
import os

csv_files = glob.glob("data/*.csv")
for csv_file in csv_files:
    os.remove(csv_file)
In this example, we use the glob library to get a list of all files with the .csv suffix, and then use the os.remove() function to delete these files.
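Since deletion cannot be undone, it can help to look at what a pattern matches before removing anything. The sketch below adds a dry-run switch for that purpose. delete_matching and its dry_run parameter are hypothetical names chosen for this example, not part of glob or os.

```python
import glob
import os
import tempfile

def delete_matching(pattern, dry_run=True):
    """Delete regular files matching `pattern`; with dry_run=True only report them."""
    targets = sorted(p for p in glob.glob(pattern) if os.path.isfile(p))
    if not dry_run:
        for path in targets:
            os.remove(path)
    return targets

with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.csv", "b.csv", "keep.txt"):
        open(os.path.join(tmp, name), "w").close()
    pattern = os.path.join(glob.escape(tmp), "*.csv")

    planned = [os.path.basename(p) for p in delete_matching(pattern)]  # nothing removed yet
    still_there = sorted(os.listdir(tmp))
    delete_matching(pattern, dry_run=False)                            # now actually delete
    remaining = sorted(os.listdir(tmp))

print(planned)      # ['a.csv', 'b.csv']
print(still_there)  # ['a.csv', 'b.csv', 'keep.txt']
print(remaining)    # ['keep.txt']
```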
Batch Rename Files
The glob library can be combined with string processing and the os.rename() function to batch rename files. Suppose we have a series of files with names in the format file_<num>.txt (e.g. file_1.txt, file_2.txt, etc.), and we now wish to rename them to data_<num>.txt:
import glob
import os

files_to_rename = glob.glob("data/file_*.txt")
for old_file_path in files_to_rename:
    directory, old_name = os.path.split(old_file_path)
    # Replace only the leading "file_" in the file name, not in the directory part
    new_name = old_name.replace("file_", "data_", 1)
    new_file_path = os.path.join(directory, new_name)
    os.rename(old_file_path, new_file_path)
In this example, we first use the glob library to get all the files that need to be renamed, then use the string replace() method to replace the file_ prefix of each file name with data_, and finally call os.rename() to rename the file.
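A batch rename can silently overwrite or fail on a file that already has the target name, depending on the platform. One way to guard against that is to skip any rename whose target exists. This is only a sketch: rename_prefix is a hypothetical helper, and the file names are invented.

```python
import glob
import os
import tempfile

def rename_prefix(directory, old_prefix, new_prefix):
    """Rename old_prefix*.txt to new_prefix*.txt in `directory`, skipping collisions."""
    renamed = []
    pattern = os.path.join(glob.escape(directory), glob.escape(old_prefix) + "*.txt")
    for old_path in sorted(glob.glob(pattern)):
        old_name = os.path.basename(old_path)
        new_name = new_prefix + old_name[len(old_prefix):]
        new_path = os.path.join(directory, new_name)
        if os.path.exists(new_path):
            continue  # never overwrite an existing file
        os.rename(old_path, new_path)
        renamed.append(new_name)
    return renamed

with tempfile.TemporaryDirectory() as tmp:
    # data_2.txt already exists, so file_2.txt must be left alone.
    for name in ("file_1.txt", "file_2.txt", "data_2.txt"):
        open(os.path.join(tmp, name), "w").close()
    renamed = rename_prefix(tmp, "file_", "data_")
    listing = sorted(os.listdir(tmp))

print(renamed)  # ['data_1.txt']
print(listing)  # ['data_1.txt', 'data_2.txt', 'file_2.txt']
```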
Considerations for using the glob library
Even though the glob library is a powerful tool, there are some caveats when using it:
- When using the glob library, handle user-supplied input with care to avoid path traversal attacks.
- Keep platform compatibility in mind when using the glob library, especially when dealing with path separators. It is recommended to use os.path.join() to construct file paths so that the code works correctly on different operating systems.
- For large directories and large numbers of files, use iglob() or other generator-based approaches to avoid unnecessary memory overhead.
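The first caveat can be approached in several ways. One possible sketch, under the assumption that user input should be treated as a literal name prefix inside a fixed base directory, is shown below: glob.escape() neutralizes wildcard characters in the input, and each result is checked to still lie under the base directory after resolving ".." and symlinks. safe_glob and all file names are hypothetical, and this is not a complete defense for every application.

```python
import glob
import os
import tempfile

def safe_glob(base_dir, user_name, suffix=".txt"):
    """Match files under base_dir for a user-supplied name prefix, treated literally."""
    base = os.path.realpath(base_dir)
    # glob.escape() neutralizes *, ? and [ so the user input cannot act as a pattern.
    pattern = os.path.join(glob.escape(base), glob.escape(user_name) + "*" + suffix)
    results = []
    for path in glob.iglob(pattern):
        resolved = os.path.realpath(path)
        # Keep only results that still live under base after resolving ".." and symlinks.
        if os.path.commonpath([base, resolved]) == base:
            results.append(resolved)
    return sorted(results)

with tempfile.TemporaryDirectory() as tmp:
    data_dir = os.path.join(tmp, "data")
    os.makedirs(data_dir)
    open(os.path.join(data_dir, "report_1.txt"), "w").close()
    open(os.path.join(tmp, "secret.txt"), "w").close()  # lives outside data/

    allowed = [os.path.basename(p) for p in safe_glob(data_dir, "report")]
    blocked = safe_glob(data_dir, "../secret")  # tries to escape data/
    wildcard = safe_glob(data_dir, "*")         # "*" is matched literally, not as a wildcard

print(allowed)   # ['report_1.txt']
print(blocked)   # []
print(wildcard)  # []
```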
Conclusion
The glob library provides a simple and powerful way to find files and directories, making it easy to work with files in Python scripts. By mastering the glob library, we can write Python programs more efficiently and apply them in real projects.