
A High-Performance Python Solution for Exporting a Million Rows to Excel

Preface

Imagine this: you need to export 1 million rows of sales data from the database to Excel, and the report is due the same day, but the traditional approach freezes the program, fills up memory, and the exported file may not even open. This is a nightmare for countless data practitioners. Excel is the standard vehicle for business analysis, yet it often seems helpless in the face of millions of rows. Don't worry: this article presents a high-performance Python export solution that combines pandas and xlsxwriter, letting you handle massive datasets quickly while keeping memory usage rock-steady.

How do you export a million rows to Excel quickly? Why do traditional methods fail? Which practical techniques deliver both speed and stability?

High-performance Excel export hinges on optimizing data processing, file writing and memory management. The core solutions and cases below take you from theory to practice with million-row exports.

Does your Excel export lag at tens of thousands of rows, crash at hundreds of thousands, and "explode" outright at a million? This is a common nightmare for developers, testers and data analysts alike. In fact, the problem is not that the data cannot be exported to Excel, but that the wrong method was used!

Core approaches, with a case

To stop million-row exports from stuttering, the key lies in the following points (a minimal sketch of the first two appears after this list):

✅ Use streaming writes, such as openpyxl's write_only mode;

✅ Avoid piling data up in memory: write in chunks driven by a generator (yield) or paginated SQL;

✅ Split the export across multiple sheets, keeping each sheet under 100,000 rows;

✅ Going further, use xlsxwriter through pandas with to_excel(..., engine='xlsxwriter');

✅ Run the export in a worker thread or behind an asynchronous interface so it does not block the main business thread.
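
Here is a minimal sketch of the first two points, using openpyxl's write-only mode fed by a paginated generator. The fetch_pages function and the output filename are stand-ins for a real paginated SQL query:

from openpyxl import Workbook

# write_only mode streams rows to disk instead of building the whole
# worksheet tree in memory
wb = Workbook(write_only=True)
ws = wb.create_sheet("Sheet1")  # write-only workbooks start with no sheets
ws.append(["ID", "Name", "Sales"])  # header row

def fetch_pages(page_size=100_000, total=1_000_000):
    # Stand-in for paginated SQL (e.g. LIMIT/OFFSET): yields one page at a time
    for start in range(0, total, page_size):
        end = min(start + page_size, total)
        yield [(i, "User" + str(i), i * 1.5) for i in range(start, end)]

for page in fetch_pages():
    for row in page:
        ws.append(row)

wb.save("streamed_output.xlsx")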

For example: a major company's log-export system switched to pandas pagination plus streaming writes; a million rows now export in 12 seconds with memory held under 500MB, stable and efficient.

1. Chunked writing: Reduce memory pressure

Scenario: Writing millions of rows in one pass causes memory overflow.

Method: Use pandas' to_excel with the xlsxwriter engine to write the data in chunks.

Code:

import pandas as pd
 
# Simulate a million rows of data
data = pd.DataFrame({
    "ID": range(1_000_000),
    "Name": ["User" + str(i) for i in range(1_000_000)],
    "Sales": [i * 1.5 for i in range(1_000_000)]
})

# Chunked writing (the output filename is a placeholder)
chunk_size = 100_000
writer = pd.ExcelWriter("output.xlsx", engine="xlsxwriter")

for i in range(0, len(data), chunk_size):
    if i == 0:
        # First chunk carries the header row
        data[i:i+chunk_size].to_excel(writer, sheet_name="Sheet1", index=False)
    else:
        # Later chunks skip the header; startrow offsets past the header row
        data[i:i+chunk_size].to_excel(writer, sheet_name="Sheet1",
                                      index=False, header=False, startrow=i+1)

writer.close()
print("Export completed!")

Note: Chunked writing splits the data into small blocks, reducing peak memory usage; it suits large data volumes.

Case: An e-commerce company used this method to export 2 million rows of order data, cutting the run from 2 hours to 15 minutes and memory usage from 8GB to 2GB.
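
In practice the rows usually come from a database rather than an in-memory DataFrame. The sketch below combines the same chunked-writer pattern with pandas' read_sql and its chunksize parameter, which returns an iterator of DataFrames so only one chunk is resident at a time. The SQLite file sales.db and the sales table are hypothetical stand-ins for your source:

import sqlite3

import pandas as pd

conn = sqlite3.connect("sales.db")  # hypothetical source database

with pd.ExcelWriter("db_export.xlsx", engine="xlsxwriter") as writer:
    next_row = 0
    for chunk in pd.read_sql("SELECT id, name, sales FROM sales", conn, chunksize=100_000):
        # Only the first chunk writes the header row
        chunk.to_excel(writer, sheet_name="Sheet1", index=False,
                       header=(next_row == 0), startrow=next_row)
        next_row += len(chunk) + (1 if next_row == 0 else 0)

conn.close()
print("Database export completed!")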

2. Compressed storage: Reduce file volume

Scenario: The file exported from millions of rows is too large to transfer easily.

Method: Export to CSV and compress it, or use pyarrow to write a compressed Parquet file.

Code:

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

# Data preparation
data = pd.DataFrame({
    "ID": range(1_000_000),
    "Name": ["User" + str(i) for i in range(1_000_000)],
    "Sales": [i * 1.5 for i in range(1_000_000)]
})

# Export to Parquet (compressed columnar format; filename is a placeholder)
table = pa.Table.from_pandas(data)
pq.write_table(table, "output.parquet")

# For something Excel can open, export a gzip-compressed CSV instead
data.to_csv("output.csv.gz", index=False, compression="gzip")

Note: Parquet compresses well and suits archiving; a gzip- or zip-compressed CSV also shrinks the file and, once decompressed, opens directly in Excel.

Case: A financial company converted 1.5 million transaction records from Excel (500MB) to Parquet (50MB), improving transfer efficiency tenfold.
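
When a recipient ultimately needs an Excel file, the Parquet archive can be read back with pandas and only the relevant slice exported. The filenames and filter below are illustrative:

import pandas as pd

df = pd.read_parquet("output.parquet")  # requires pyarrow or fastparquet
subset = df[df["Sales"] > 1_000_000]    # hand over only the rows the recipient needs
subset.to_excel("sales_extract.xlsx", index=False)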

3. Parallel processing: Speed up the export

Scenario: The multi-core CPU is under-utilized and the export is slow.

Method: Use multiprocessing to process data chunks in parallel.

Code:

import pandas as pd
from multiprocessing import Pool
 
# Simulate data
data = pd.DataFrame({
    "ID": range(1_000_000),
    "Name": ["User" + str(i) for i in range(1_000_000)],
    "Sales": [i * 1.5 for i in range(1_000_000)]
})

def write_chunk(args):
    # Each worker writes one chunk to its own temporary CSV;
    # the index keeps filenames consistent between parent and workers
    idx, chunk = args
    chunk.to_csv(f"temp_{idx}.csv", index=False)

# Parallel writing (on Windows, wrap the Pool usage in `if __name__ == "__main__":`)
chunk_size = 100_000
chunks = [data[i:i+chunk_size] for i in range(0, len(data), chunk_size)]
with Pool() as pool:
    pool.map(write_chunk, list(enumerate(chunks)))

# Merge the temporary files, keeping only the first header row
with open("merged_output.csv", "w") as outfile:
    for i in range(len(chunks)):
        with open(f"temp_{i}.csv") as infile:
            if i > 0:
                infile.readline()  # skip this chunk's header
            outfile.write(infile.read())

print("Parallel export completed!")

Note: Multi-process parallel writing uses multiple CPU cores to speed up large exports.

Case: A logistics company used the parallel approach to export 3 million rows, cutting the time from 30 minutes to 8 minutes.
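
The checklist earlier also recommended keeping exports from blocking the main business thread. Here is a minimal sketch using concurrent.futures; the export function and filename are illustrative, not part of the original solution:

from concurrent.futures import ThreadPoolExecutor

import pandas as pd

def export_report(df, path):
    # Runs in a worker thread, so the caller stays responsive
    df.to_csv(path, index=False)
    return path

df = pd.DataFrame({"ID": range(1_000_000), "Sales": [i * 1.5 for i in range(1_000_000)]})

executor = ThreadPoolExecutor(max_workers=1)
future = executor.submit(export_report, df, "background_export.csv")

# ... the main thread continues with other work here ...

print("Export finished:", future.result())  # block only when the file is needed
executor.shutdown()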

4. Format optimization: Improve user experience

Scenario: The exported Excel needs to look polished, with formatting and filtering.

Method: Use xlsxwriter to add styles and auto-filter.

Code:

import pandas as pd
 
data = ({
    "ID": range(1000),
    "Name": ["User" + str(i) for i in range(1000)],
    "Sales": [i * 1.5 for i in range(1000)]
})
 
writer = ("formatted_output.xlsx", engine="xlsxwriter")
data.to_excel(writer, sheet_name="Sheet1", index=False)
 
# Add formatworkbook = 
worksheet = ["Sheet1"]
worksheet.set_column("A:C", 15)  # Adjust column width(0, 0, len(data), len()-1)  # Add filter 
()
print("Formatted export completed!")

Note: xlsxwriter supports styles, filters and frozen panes to meet report needs; a sketch of header styling and frozen panes follows.
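
To illustrate the frozen-panes point, the example above can be extended just before writer.close(); the header color is arbitrary:

# A bold, shaded header row plus a frozen top row
header_fmt = workbook.add_format({"bold": True, "bg_color": "#D7E4BC", "border": 1})
for col, name in enumerate(data.columns):
    worksheet.write(0, col, name, header_fmt)  # rewrite the header cells with the format
worksheet.freeze_panes(1, 0)  # keep the header visible while scrolling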

Case: A retail company generates formatted, filterable sales reports for management; reported user satisfaction rose by 30%.

The broader trend

As big data pervades business analysis, Excel remains the main carrier of enterprise reports. A Gartner 2024 report shows that 80% of enterprises rely on Excel for data visualization, yet million-row data processing has become a bottleneck. Developers on the X platform (such as @data_guru) share pandas + xlsxwriter optimization experience, reflecting the demand for efficient exports.

Open source communities (such as fastparquet) provide efficient formats like Parquet that reduce storage and transmission costs. In enterprise scenarios, efficient export not only improves productivity but also cuts IT costs, in line with the trend toward data-driven decision-making. These solutions show technology and business needs meshing neatly.

In the era of data explosion, office workflows still rely heavily on Excel. Even after databases and BI tools have processed the data layer by layer, the final exported Excel file often determines the success or failure of the user experience. Improving the export experience is no longer "icing on the cake" but a genuinely basic need.

Summary and takeaways

Exporting a million rows to Excel is no longer a problem. With chunked writing, compressed storage, parallel processing and format optimization, you can export quickly and stably, balancing performance and user experience. These are not just technical solutions but an efficiency philosophy for the data age: create the most value with the least resources. Whether for report generation or data analysis, master these skills and your processing capability will soar.

This concludes this article on a high-performance Python solution for exporting a million rows to Excel. For more on exporting Excel data with Python, please search my previous articles or continue browsing the related articles below. I hope you will support me in the future!