SoFunction
Updated on 2025-04-25

How to use Python to read Excel data

Detailed tutorial on reading Excel data using Python

Python provides a variety of ways to read Excel files, the most commonly used libraries arepandasandopenpyxl. Below I will explain in detail how to use these two libraries to read Excel files and include some practical examples to help you write a blog.

1. Install the necessary dependencies

First, make sure to install itpandasandopenpyxlLibrary, these two libraries can help us read Excel files easily.

You can install them using the following command:

pip install pandas openpyxl
  • pandas: For data processing and analysis, it has built-in function to read Excel files.
  • openpyxl: A library that reads and writes Excel files in Excel 2007 and above.

2. Read Excel files

Suppose you have an Excel file, including the following data:

Name Age City
John 25 New York
Alice 30 London
Bob 22 Tokyo

You can usepandasTo read the contents of the file.

import pandas as pd
# Read Excel filedf = pd.read_excel('')
# Show dataprint(df)

Output

    Name  Age      City
0   John   25  New York
1  Alice   30    London
2    Bob   22     Tokyo

3. Read a specific worksheet

Excel files usually contain multiple worksheets. Can be specifiedsheet_nameto read a specific worksheet. For example, assumingThere is a name in the fileSheet2Worksheet:

df = pd.read_excel('', sheet_name='Sheet2')
print(df)

You can also specify worksheets through indexes:

df = pd.read_excel('', sheet_name=0)  # Read the first worksheet

4. Read multiple worksheets

If you want to read all sheets in Excel files at once, you can usesheet_name=None, it returns a containing multipleDataFramedictionary.

dfs = pd.read_excel('', sheet_name=None)
# traverse all worksheetsfor sheet, data in ():
    print(f"Worksheet name: {sheet}")
    print(data)

5. Read only specific columns

Sometimes, you just need to read some columns in an Excel file. Can be passedusecolsThe parameter specifies the column to be read.

df = pd.read_excel('', usecols=['Name', 'City'])
print(df)

Output

    Name      City
0   John  New York
1  Alice    London
2    Bob     Tokyo

You can also use the column index to read specific columns, for example:

df = pd.read_excel('', usecols=[0, 2])  # Read the first and third columnsprint(df)

6. Process missing data

When reading Excel data, you may encounter blank cells. You can usepandasProcess these missing data. For example, it can be done byna_valuesThe parameter recognizes a specific value asNaN, or usefillna()Method fills in missing values.

# Identify a specific value as NaNdf = pd.read_excel('', na_values=['N/A', 'NA'])
# Fill in missing data(value={'Age': 0, 'City': 'Unknown'}, inplace=True)
print(df)

7. Convert Excel data to another format

Sometimes you may need to save the read Excel data in other formats, such as a CSV file or a JSON file.pandasAllows you to achieve this easily.

# Save as CSV filedf.to_csv('', index=False)
# Save as JSON filedf.to_json('', orient='records')

8. Read large files and optimize performance

If the Excel file is very large, it may cause insufficient memory or slow reading.pandasProvides some methods to optimize performance, such as usingchunksizeRead data in chunks.

# Read Excel files in chunks, reading 100 lines each timechunks = pd.read_excel('large_data.xlsx', chunksize=100)
for chunk in chunks:
    print(chunk)

9. Read Excel using openpyxl

openpyxlIt is more suitable for scenarios where the underlying operations of Excel files are required, such as reading and modifying cell styles, formulas, etc. Here is a simple reading example:

from openpyxl import load_workbook
# Load Excel workbookwb = load_workbook('')
# Select a worksheetsheet = wb['Sheet1']
# Read the value of the specified cellfor row in sheet.iter_rows(min_row=1, max_row=sheet.max_row, values_only=True):
    print(row)

10. More Excel Reading Functions

You can usepandasMore options available, such as:

  • skiprows: Skip a specific number of rows
  • nrows: Read the specified number of rows
  • header: Set custom title line
# Skip the first two lines and read 5 lines of datadf = pd.read_excel('', skiprows=2, nrows=5)
print(df)

Summarize

passpandasandopenpyxl, you can easily read Excel files and perform various data processing operations.pandasMore suitable for fast and simple data analysis,openpyxlThis is suitable for scenarios where more in-depth control of Excel files is required.

This is the end of this article about reading Excel data using Python. For more related Python reading Excel data content, please search for my previous articles or continue browsing the related articles below. I hope everyone will support me in the future!