introduction
In data analysis, pivot tables are a very powerful tool that helps us quickly summarize, analyze and visualize large amounts of data. While Excel provides built-in pivot table functionality, the Pandas library in Python provides a flexible and powerful alternative for scenarios where more complex operations or automation is required. This article will use specific examples to show how to use Pandas to implement pivot table functions similar to those in Excel.
Preparation
First make sure that the Pandas library is installed in your environment. If it has not been installed, you can quickly install it through the pip command:
pip install pandas openpyxl
Next, we will create a simulated real sales data form filesales_data.xlsx
, and use Pandas to read the file, and then generate a pivot table.
Create simulated sales data
Assume ourssales_data.xlsx
The file contains the following data:
date | Product Name | Sales | area | Salesperson |
---|---|---|---|---|
2021-01-01 | Product A | 8000 | East China | Zhang San |
2021-01-02 | Product B | 12000 | North China | Li Si |
2021-01-03 | Product C | 5000 | South China | Wang Wu |
2021-01-04 | Product D | 15000 | East China | Zhang San |
2021-01-05 | Product E | 7000 | Central China | Li Si |
2021-01-06 | Product F | 20000 | North China | Wang Wu |
2021-01-07 | Product G | 9000 | East China | Zhang San |
2021-01-08 | Product H | 3000 | South China | Li Si |
2021-01-09 | Product I | 6000 | Central China | Wang Wu |
2021-01-10 | Product J | 11000 | East China | Zhang San |
Code implementation and output
First, we need to create this simulated data and save it to an Excel file, and then use Pandas to read and generate a pivot table.
import pandas as pd # Create simulated datadata = { 'date': ['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04', '2021-01-05', '2021-01-06', '2021-01-07', '2021-01-08', '2021-01-09', '2021-01-10'], 'Product Name': ['Product A', 'Product B', 'Product C', 'Product D', 'Product E', 'Product F', 'Product G', 'Product H', 'Product I', 'Product J'], 'Sales': [8000, 12000, 5000, 15000, 7000, 20000, 9000, 3000, 6000, 11000], 'area': ['East China', 'North China', 'South China', 'East China', 'Central China', 'North China', 'East China', 'South China', 'Central China', 'East China'], 'Salesperson': ['Zhang San', 'Li Si', 'Wang Wu', 'Zhang San', 'Li Si', 'Wang Wu', 'Zhang San', 'Li Si', 'Wang Wu', 'Zhang San'] } # Save data to Excel filedf = (data) df.to_excel('sales_data.xlsx', index=False) # Reload datadf = pd.read_excel('sales_data.xlsx') # Show the first few lines to check the dataprint("Raw Data:") print(())
After running the above code, you will see the following output:
Raw data
Date Product Name Sales Area Sales Person
0 2021-01-01 Product A 8000 East China Zhang San
1 2021-01-02 Product B 12000 North China Li Si
2 2021-01-03 Product C 5000 South China Wang Wu
3 2021-01-04 Product D 15000 East China Zhang San
4 2021-01-05 Product E 7000 Central China Li Si
Example 1: Aggregate sales by region and salesperson
Suppose we want to aggregate sales by region and salesperson and calculate the total sales for each combination.
# Generate a Pivot Tablepivot_table1 = pd.pivot_table(df, values='Sales', index=['area', 'Salesperson'], aggfunc='sum') print("\nSummary sales by region and salesperson:") print(pivot_table1)
Sales by region and salesperson:
Region Salesperson
East China Zhang San 33000
North China Li Si 12000
Wang Wu 20000
South China Wang Wu 8000
Central China Li Si 13000
Wang Wu 6000
Name: Sales, dtype: int64
Example 2: Aggregate sales by region and display total sales for each region
Suppose we want to aggregate sales by region and show the total sales for each region.
# Generate a Pivot Tablepivot_table2 = pd.pivot_table(df, values='Sales', index=['area'], aggfunc='sum') print("\nSummary sales by region:") print(pivot_table2)
Sales by region:
area
North China 32000
East China 33000
South China 8000
Central China 13000
Name: Sales, dtype: int64
Example 3: Aggregate sales by region and salesperson and display average sales for each combination
Suppose we want to aggregate sales by region and salesperson and show the average sales for each combination.
# Generate a Pivot Tablepivot_table3 = pd.pivot_table(df, values='Sales', index=['area', 'Salesperson'], aggfunc='mean') print("\nSummary sales by region and salesperson (average):") print(pivot_table3)
Sales by region and salesperson (average):
Region Salesperson
East China Zhang San 11000.0
North China Li Si 12000.0
Wang Wu 20000.0
South China Wang Wu 5500.0
Central China Li Si 9333.333333
Wang Wu 6000.0
Name: Sales, dtype: float64
Example 4: Aggregate sales by region and salesperson and display the maximum sales for each combination
Suppose we want to aggregate sales by region and salesperson and show the maximum sales for each combination.
# Generate a Pivot Tablepivot_table4 = pd.pivot_table(df, values='Sales', index=['area', 'Salesperson'], aggfunc='max') print("\nSummary sales by region and salesperson:") print(pivot_table4)Output result
Sales by region and salesperson (maximum):
Region Salesperson
East China Zhang San 15000
North China Li Si 12000
Wang Wu 20000
South China Wang Wu 5000
Central China Li Si 7000
Wang Wu 6000
Name: Sales, dtype: int64
in conclusion
Through the above examples, we can see that the Pandas library provides very powerful and flexible functionality in handling pivot tables. Whether it is simple summary or complex multi-condition aggregation, Pandas can handle it easily. With these basic skills mastered, you will be able to manage and analyze your dataset more efficiently, thus gaining valuable insights faster. Hope this article helps you make better use of Pandas in your daily work!
This is the article about the project practice of using Pandas to implement pivot tables in Excel. For more related Pandas Excel pivot table content, please search for my previous articles or continue browsing the related articles below. I hope everyone will support me in the future!