SoFunction
Updated on 2025-03-04

Batch extraction of Excel data with Python

Excel is a widely used data storage format in data processing and analysis. With Python, you can efficiently extract data from multiple Excel files for summarizing and analysis. This article explains how to batch extract Excel data with three libraries, pandas, openpyxl and xlrd, and provides sample code for each.

Use pandas to extract Excel data in batches

pandas is a powerful data analysis library that can read and process Excel files directly.

1. Install pandas

First, make sure pandas and openpyxl are installed (pandas uses openpyxl to read .xlsx files):

pip install pandas openpyxl

2. Read a single Excel file

import pandas as pd

# Read the Excel file
df = pd.read_excel('')

# Show the first few rows of data
print(df.head())
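By default, pd.read_excel loads only the first sheet of a workbook. The sketch below passes sheet_name=None, which makes pandas return a dict that maps each sheet name to its own DataFrame. The helper names read_all_sheets and summarize_sheets are illustrative assumptions, not part of the original example.

```python
import pandas as pd


def read_all_sheets(path):
    """Return {sheet_name: DataFrame} for every sheet in the workbook."""
    # sheet_name=None asks pandas to load all sheets instead of only the first
    return pd.read_excel(path, sheet_name=None)


def summarize_sheets(path):
    """Return (sheet name, row count, column names) for each sheet."""
    return [
        (name, len(df), list(df.columns))
        for name, df in read_all_sheets(path).items()
    ]
```

Calling summarize_sheets on a workbook is a quick way to check what each sheet contains before deciding which ones to extract.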

3. Batch reading of multiple Excel files

Suppose several Excel files are stored in one folder, with names in the format data_1.xlsx, data_2.xlsx, and so on.

import os
import pandas as pd

# Folder that stores the Excel files
folder_path = 'path_to_folder'

# Get the paths of all .xlsx files
file_list = [os.path.join(folder_path, f) for f in os.listdir(folder_path) if f.endswith('.xlsx')]

# Read the files one by one
frames = []
for file in file_list:
    df = pd.read_excel(file)
    frames.append(df)

# Merge once with pd.concat (DataFrame.append was removed in pandas 2.0)
all_data = pd.concat(frames, ignore_index=True)

# Show the merged data
print(all_data.head())
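Two practical details are worth handling when merging a folder of files named data_1.xlsx, data_2.xlsx and so on: os.listdir returns names in arbitrary order (and plain alphabetical sorting puts data_10.xlsx before data_2.xlsx), and Excel leaves temporary lock files starting with ~$ next to open workbooks. The following is a minimal sketch under those assumptions. The function names and the source_file column are hypothetical, chosen for illustration only.

```python
import re
from pathlib import Path

import pandas as pd


def list_excel_files(folder):
    """Return .xlsx paths in numeric order, skipping Excel lock files."""
    paths = [
        p for p in Path(folder).glob("*.xlsx")
        if not p.name.startswith("~$")  # temporary files Excel creates while a workbook is open
    ]

    def sort_key(path):
        # data_10.xlsx should come after data_2.xlsx, so sort by the number
        match = re.search(r"(\d+)", path.stem)
        return (int(match.group(1)) if match else -1, path.name)

    return sorted(paths, key=sort_key)


def merge_excel_files(folder):
    """Concatenate every workbook and record which file each row came from."""
    frames = []
    for path in list_excel_files(folder):
        df = pd.read_excel(path)
        df["source_file"] = path.name  # hypothetical column name
        frames.append(df)
    # pd.concat raises ValueError on an empty list, so return an empty frame
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
```

Keeping the file name in a column makes it possible to trace a suspicious row in the merged data back to the workbook it came from.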

Use openpyxl to extract Excel data in batches

openpyxl is a library dedicated to reading and writing Excel files in the .xlsx format.

1. Install openpyxl

pip install openpyxl

2. Read a single Excel file

from openpyxl import load_workbook

# Load the Excel file
wb = load_workbook('')

# Select the active sheet
ws = wb.active

# Read all data
data = []
for row in ws.iter_rows(values_only=True):
    data.append(row)

# Print the data
for row in data:
    print(row)
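Rows read with iter_rows(values_only=True) are plain tuples, so the column names are only present in the first row. The sketch below, which assumes the first row of the sheet is a header, pairs each later row with those names. It also opens the workbook with read_only=True, which streams rows rather than loading the whole file. The function name read_rows_as_dicts is an illustrative assumption.

```python
from openpyxl import load_workbook


def read_rows_as_dicts(path):
    """Read the active sheet, treating the first row as column headers."""
    # read_only=True streams rows instead of loading the whole workbook;
    # data_only=True returns cached formula results rather than formula text
    wb = load_workbook(path, read_only=True, data_only=True)
    try:
        ws = wb.active
        rows = ws.iter_rows(values_only=True)
        header = next(rows, None)
        if header is None:  # empty sheet
            return []
        return [dict(zip(header, row)) for row in rows]
    finally:
        wb.close()  # read-only workbooks keep the file handle open until closed
```

Each returned item can then be accessed by column name, for example row["qty"], instead of by position.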

3. Batch reading of multiple Excel files

import os
from openpyxl import load_workbook

# Folder that stores the Excel files
folder_path = 'path_to_folder'

# Get the paths of all .xlsx files
file_list = [os.path.join(folder_path, f) for f in os.listdir(folder_path) if f.endswith('.xlsx')]

# Initialize an empty list
all_data = []

# Read and merge the files one by one
for file in file_list:
    wb = load_workbook(file)
    ws = wb.active
    for row in ws.iter_rows(values_only=True):
        all_data.append(row)

# Print the merged data
for row in all_data:
    print(row)
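When every file starts with the same header row, appending all rows from all files repeats that header once per file in the merged list. The following sketch keeps the header from the first file only and checks that later files match it. It assumes that the first row of each active sheet is a header and that all files share the same columns. The function name merge_sheets is hypothetical.

```python
import os

from openpyxl import load_workbook


def merge_sheets(folder_path):
    """Merge the active sheets of all .xlsx files, keeping one header row."""
    file_list = sorted(
        os.path.join(folder_path, f)
        for f in os.listdir(folder_path)
        if f.endswith(".xlsx") and not f.startswith("~$")
    )
    header = None
    all_data = []
    for file in file_list:
        wb = load_workbook(file, read_only=True)
        try:
            rows = wb.active.iter_rows(values_only=True)
            first = next(rows, None)
            if first is None:
                continue  # empty sheet, nothing to merge
            if header is None:
                header = first
            elif first != header:
                raise ValueError(f"{file} has a different header: {first}")
            all_data.extend(rows)  # remaining rows are data rows
        finally:
            wb.close()
    return header, all_data
```

Raising on a mismatched header is a deliberate choice here: silently merging files with different column layouts tends to produce data that looks valid but is misaligned.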

Use xlrd to extract Excel data in batches

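xlrd is a library for reading Excel files. Since version 2.0 it supports only the legacy .xls format, so .xlsx files should be read with openpyxl (or with pandas, which uses openpyxl for them).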

1. Install xlrd

pip install xlrd

2. Read a single Excel file

import xlrd

# Open the Excel file
workbook = xlrd.open_workbook('')

# Select the first worksheet
sheet = workbook.sheet_by_index(0)

# Read all data
data = []
for row_idx in range(sheet.nrows):
    row = sheet.row_values(row_idx)
    data.append(row)

# Print the data
for row in data:
    print(row)

3. Batch reading of multiple Excel files

import os
import xlrd

# Folder that stores the Excel files
folder_path = 'path_to_folder'

# Get the paths of all .xls files (xlrd 2.0+ cannot open .xlsx files)
file_list = [os.path.join(folder_path, f) for f in os.listdir(folder_path) if f.endswith('.xls')]

# Initialize an empty list
all_data = []

# Read and merge the files one by one
for file in file_list:
    workbook = xlrd.open_workbook(file)
    sheet = workbook.sheet_by_index(0)
    for row_idx in range(sheet.nrows):
        row = sheet.row_values(row_idx)
        all_data.append(row)

# Print the merged data
for row in all_data:
    print(row)
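A folder may contain both legacy .xls files and newer .xlsx files, and current xlrd releases (2.0 and later) read only the former. One way to handle a mixed folder is to pick the reader from the file extension. The sketch below uses only the standard library. The ENGINES mapping and the function names are illustrative assumptions, and the engine names are the ones pandas accepts in the engine parameter of pd.read_excel.

```python
from pathlib import Path

# Mapping from file extension to the pandas `engine` name that can read it.
ENGINES = {
    ".xls": "xlrd",       # legacy binary format
    ".xlsx": "openpyxl",  # Office Open XML
    ".xlsm": "openpyxl",  # macro-enabled workbook
}


def pick_engine(path):
    """Return the reader engine for a workbook path, based on its extension."""
    suffix = Path(path).suffix.lower()
    try:
        return ENGINES[suffix]
    except KeyError:
        raise ValueError(f"Unsupported Excel extension: {suffix!r}") from None


def group_by_engine(paths):
    """Group workbook paths by the engine needed to read them."""
    groups = {}
    for path in paths:
        groups.setdefault(pick_engine(path), []).append(path)
    return groups
```

With pandas installed, each file could then be read with pd.read_excel(path, engine=pick_engine(path)), so the same loop covers both formats.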

Summary

This article showed how to batch extract Excel data with three libraries, pandas, openpyxl and xlrd, and provided sample code for each. With these methods, multiple Excel files can be read and merged in a few lines of code. I hope this helps you process Excel data in real-world development.
