Excel is a widely used format for storing data in data processing and analysis work. With Python, you can extract data from many Excel files at once and combine it for summary and analysis. This article explains how to batch extract Excel data with three libraries, pandas, openpyxl and xlrd, and provides sample code for each.
Use pandas to extract Excel data in batches
pandas is a powerful data analysis library that provides the ability to directly read and process Excel files.
1. Install pandas
First, make sure pandas and openpyxl are installed:
pip install pandas openpyxl
2. Read a single Excel file
import pandas as pd

# Read the Excel file (put the path to your workbook between the quotes)
df = pd.read_excel('')

# Show the first few rows of data
print(df.head())
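When only part of a workbook is needed, pd.read_excel can limit what it loads. The following is a minimal sketch, not part of the original example: the helper name read_sheet_preview and its parameters are hypothetical, and it assumes the first row of the sheet holds the column names.

```python
import pandas as pd


def read_sheet_preview(path, sheet_name=0, columns=None, max_rows=5):
    """Read at most `max_rows` data rows from one sheet of an Excel file.

    Hypothetical helper for illustration; assumes row 1 is the header.
    """
    return pd.read_excel(
        path,
        sheet_name=sheet_name,  # sheet position (0-based) or sheet name
        usecols=columns,        # list of column names; None keeps every column
        nrows=max_rows,         # stop after this many data rows
    )
```

Calling read_sheet_preview with a list of column names returns a DataFrame with only those columns, which can be a quick way to inspect a large file before reading it in full.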
3. Batch reading of multiple Excel files
Suppose multiple Excel files are stored in one folder, with names of the form data_1.xlsx, data_2.xlsx, and so on.
import os
import pandas as pd

# Folder that holds the Excel files
folder_path = 'path_to_folder'

# Collect the full path of every .xlsx file in the folder
file_list = [os.path.join(folder_path, f) for f in os.listdir(folder_path) if f.endswith('.xlsx')]

# Read each file into its own DataFrame
frames = [pd.read_excel(file) for file in file_list]

# Merge everything in one step; DataFrame.append was removed in pandas 2.0
all_data = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()

# Show the merged data
print(all_data.head())
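The folder listing above returns files in whatever order the operating system reports them, and a plain alphabetical sort would place data_10.xlsx before data_2.xlsx. The sketch below is one possible way to collect the files in numeric order using only the standard library. The function name list_excel_files is hypothetical, and skipping names that start with ~$ assumes you want to ignore the temporary lock files Excel creates while a workbook is open.

```python
import re
from pathlib import Path


def list_excel_files(folder, suffix=".xlsx"):
    """Return workbook paths in `folder`, sorted by the numbers in their names."""

    def natural_key(path):
        # re.split with a capture group alternates text and digit chunks:
        # "data_10" -> ["data_", "10", ""]. Odd positions are the digit runs.
        parts = re.split(r"(\d+)", path.stem)
        return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]

    files = [
        p for p in Path(folder).iterdir()
        if p.is_file()
        and p.suffix.lower() == suffix
        and not p.name.startswith("~$")  # skip Excel's temporary lock files
    ]
    return sorted(files, key=natural_key)
```

The returned paths can be passed to pd.read_excel one by one, so the merged rows follow the order data_1, data_2, ..., data_10.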
Use openpyxl to extract Excel data in batches
openpyxl is a library that specializes in processing Excel files and is suitable for processing files in the .xlsx format.
1. Install openpyxl
pip install openpyxl
2. Read a single Excel file
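from openpyxl import load_workbook

# Load the Excel file (put the path to your workbook between the quotes)
wb = load_workbook('')

# Select the active worksheet
ws = wb.active

# Read all rows as tuples of cell values
data = []
for row in ws.iter_rows(values_only=True):
    data.append(row)

# Print the data
for row in data:
    print(row)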
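Rows read with values_only=True are plain tuples, so the column names are lost unless the header row is handled separately. As a rough sketch, assuming the first row of the sheet is a header, the rows can be turned into dictionaries keyed by column name. The names rows_as_dicts and load_records are hypothetical helpers, not part of openpyxl.

```python
from openpyxl import load_workbook


def rows_as_dicts(ws):
    """Turn a worksheet into a list of dicts keyed by its first (header) row."""
    rows = ws.iter_rows(values_only=True)
    header = next(rows, None)
    if header is None:  # the sheet has no rows at all
        return []
    return [dict(zip(header, row)) for row in rows]


def load_records(path):
    """Open `path` and return the active sheet's rows as dicts."""
    # data_only=True returns cached cell values instead of formula text
    wb = load_workbook(path, data_only=True)
    return rows_as_dicts(wb.active)
```

Each record can then be accessed by column name, for example record['qty'], instead of by position.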
3. Batch reading of multiple Excel files
import os
from openpyxl import load_workbook

# Folder that holds the Excel files
folder_path = 'path_to_folder'

# Collect the full path of every .xlsx file in the folder
file_list = [os.path.join(folder_path, f) for f in os.listdir(folder_path) if f.endswith('.xlsx')]

# Initialize an empty list
all_data = []

# Read the files one by one and merge their rows
for file in file_list:
    wb = load_workbook(file)
    ws = wb.active
    for row in ws.iter_rows(values_only=True):
        all_data.append(row)

# Print the merged data
for row in all_data:
    print(row)
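Merging rows this way repeats the header row of every file, and loading each workbook in full can be slow for large files. The sketch below shows one way to address both points, assuming every file has the same header in its first row. The function name merge_workbooks is hypothetical; read_only=True and data_only=True are existing load_workbook options.

```python
from pathlib import Path

from openpyxl import load_workbook


def merge_workbooks(folder):
    """Merge active-sheet rows from every .xlsx in `folder`, keeping one header row."""
    merged = []
    for path in sorted(Path(folder).glob("*.xlsx")):
        if path.name.startswith("~$"):
            continue  # skip Excel's temporary lock files
        # read_only=True streams rows instead of loading the whole workbook
        wb = load_workbook(path, read_only=True, data_only=True)
        try:
            rows = wb.active.iter_rows(values_only=True)
            header = next(rows, None)
            if header is None:
                continue  # empty sheet, nothing to merge
            if not merged:
                merged.append(header)  # keep the header from the first file only
            merged.extend(rows)
        finally:
            wb.close()  # read-only workbooks keep the file open until closed
    return merged
```

The result is a list whose first element is the header tuple and whose remaining elements are the data rows of all files, in sorted file-name order.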
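Use xlrd to extract Excel data in batches
xlrd is a library for reading Excel files in the legacy .xls format. Versions before 2.0 could also read .xlsx files, but xlrd 2.0 and later support .xls only, so use pandas or openpyxl for .xlsx files.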
1. Install xlrd
pip install xlrd
2. Read a single Excel file
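import xlrd

# Open the Excel file (put the path to your .xls workbook between the quotes)
workbook = xlrd.open_workbook('')

# Select the first worksheet
sheet = workbook.sheet_by_index(0)

# Read all rows; sheet.nrows is the number of rows in the sheet
data = []
for row_idx in range(sheet.nrows):
    row = sheet.row_values(row_idx)
    data.append(row)

# Print the data
for row in data:
    print(row)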
3. Batch reading of multiple Excel files
import os
import xlrd

# Folder that holds the Excel files
folder_path = 'path_to_folder'

# Collect the full path of every .xls file (xlrd 2.0+ cannot open .xlsx)
file_list = [os.path.join(folder_path, f) for f in os.listdir(folder_path) if f.endswith('.xls')]

# Initialize an empty list
all_data = []

# Read the files one by one and merge their rows
for file in file_list:
    workbook = xlrd.open_workbook(file)
    sheet = workbook.sheet_by_index(0)
    for row_idx in range(sheet.nrows):
        row = sheet.row_values(row_idx)
        all_data.append(row)

# Print the merged data
for row in all_data:
    print(row)
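If a folder mixes .xls and .xlsx files, each file needs a reader that supports its format, since xlrd 2.0 and later open only .xls. One possible approach, sketched below, is to choose the pandas engine from the file extension. The names ENGINES, engine_for and read_any are hypothetical; "xlrd" and "openpyxl" are engine values that pd.read_excel accepts. pandas normally picks an engine on its own, so this only makes the choice explicit.

```python
from pathlib import Path

import pandas as pd

# Assumed mapping from file extension to the pandas engine that reads it
ENGINES = {".xls": "xlrd", ".xlsx": "openpyxl"}


def engine_for(path):
    """Return the pandas engine name for an Excel file, based on its extension."""
    suffix = Path(path).suffix.lower()
    try:
        return ENGINES[suffix]
    except KeyError:
        raise ValueError(f"unsupported Excel extension: {suffix!r}") from None


def read_any(path, **kwargs):
    """Read an .xls or .xlsx file into a DataFrame with a matching engine."""
    return pd.read_excel(path, engine=engine_for(path), **kwargs)
```

Files with any other extension raise a ValueError, which makes unexpected files in the folder visible instead of silently skipped.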
Summary
This article showed how to batch extract Excel data with three libraries, pandas, openpyxl and xlrd, and provided sample code for each. pandas reads files directly into DataFrames that are easy to merge, openpyxl gives row-level access to .xlsx files, and xlrd reads Excel files row by row. With these methods, many Excel files can be processed in a single script rather than by hand.