Project Practice for Implementing Pivot Tables in Excel with Pandas

introduction

In data analysis, pivot tables are a very powerful tool that helps us quickly summarize, analyze and visualize large amounts of data. While Excel provides built-in pivot table functionality, the Pandas library in Python provides a flexible and powerful alternative for scenarios where more complex operations or automation is required. This article will use specific examples to show how to use Pandas to implement pivot table functions similar to those in Excel.

Preparation

First make sure that the Pandas library is installed in your environment. If it has not been installed, you can quickly install it through the pip command:

pip install pandas openpyxl

Next, we will create a simulated real sales data form filesales_data.xlsx, and use Pandas to read the file, and then generate a pivot table.

Create simulated sales data

Assume ourssales_data.xlsxThe file contains the following data:

date	Product Name	Sales	area	Salesperson
2021-01-01	Product A	8000	East China	Zhang San
2021-01-02	Product B	12000	North China	Li Si
2021-01-03	Product C	5000	South China	Wang Wu
2021-01-04	Product D	15000	East China	Zhang San
2021-01-05	Product E	7000	Central China	Li Si
2021-01-06	Product F	20000	North China	Wang Wu
2021-01-07	Product G	9000	East China	Zhang San
2021-01-08	Product H	3000	South China	Li Si
2021-01-09	Product I	6000	Central China	Wang Wu
2021-01-10	Product J	11000	East China	Zhang San

Code implementation and output

First, we need to create this simulated data and save it to an Excel file, and then use Pandas to read and generate a pivot table.

import pandas as pd

# Create simulated datadata = {
    'date': ['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04', '2021-01-05',
            '2021-01-06', '2021-01-07', '2021-01-08', '2021-01-09', '2021-01-10'],
    'Product Name': ['Product A', 'Product B', 'Product C', 'Product D', 'Product E', 'Product F', 'Product G', 'Product H', 'Product I', 'Product J'],
    'Sales': [8000, 12000, 5000, 15000, 7000, 20000, 9000, 3000, 6000, 11000],
    'area': ['East China', 'North China', 'South China', 'East China', 'Central China', 'North China', 'East China', 'South China', 'Central China', 'East China'],
    'Salesperson': ['Zhang San', 'Li Si', 'Wang Wu', 'Zhang San', 'Li Si', 'Wang Wu', 'Zhang San', 'Li Si', 'Wang Wu', 'Zhang San']
}

# Save data to Excel filedf = (data)
df.to_excel('sales_data.xlsx', index=False)

# Reload datadf = pd.read_excel('sales_data.xlsx')

# Show the first few lines to check the dataprint("Raw Data:")
print(())

After running the above code, you will see the following output:

Raw data

Date Product Name Sales Area Sales Person
0 2021-01-01 Product A 8000 East China Zhang San
1 2021-01-02 Product B 12000 North China Li Si
2 2021-01-03 Product C 5000 South China Wang Wu
3 2021-01-04 Product D 15000 East China Zhang San
4 2021-01-05 Product E 7000 Central China Li Si

Example 1: Aggregate sales by region and salesperson

Suppose we want to aggregate sales by region and salesperson and calculate the total sales for each combination.

# Generate a Pivot Tablepivot_table1 = pd.pivot_table(df, values='Sales', index=['area', 'Salesperson'], aggfunc='sum')

print("\nSummary sales by region and salesperson:")
print(pivot_table1)

Sales by region and salesperson:

Region Salesperson
East China Zhang San 33000
North China Li Si 12000
Wang Wu 20000
South China Wang Wu 8000
Central China Li Si 13000
Wang Wu 6000
Name: Sales, dtype: int64

Example 2: Aggregate sales by region and display total sales for each region

Suppose we want to aggregate sales by region and show the total sales for each region.

# Generate a Pivot Tablepivot_table2 = pd.pivot_table(df, values='Sales', index=['area'], aggfunc='sum')

print("\nSummary sales by region:")
print(pivot_table2)

Sales by region:

area
North China 32000
East China 33000
South China 8000
Central China 13000
Name: Sales, dtype: int64

Example 3: Aggregate sales by region and salesperson and display average sales for each combination

Suppose we want to aggregate sales by region and salesperson and show the average sales for each combination.

# Generate a Pivot Tablepivot_table3 = pd.pivot_table(df, values='Sales', index=['area', 'Salesperson'], aggfunc='mean')

print("\nSummary sales by region and salesperson (average):")
print(pivot_table3)

Sales by region and salesperson (average):

Region Salesperson
East China Zhang San 11000.0
North China Li Si 12000.0
Wang Wu 20000.0
South China Wang Wu 5500.0
Central China Li Si 9333.333333
Wang Wu 6000.0
Name: Sales, dtype: float64

Example 4: Aggregate sales by region and salesperson and display the maximum sales for each combination

Suppose we want to aggregate sales by region and salesperson and show the maximum sales for each combination.

# Generate a Pivot Tablepivot_table4 = pd.pivot_table(df, values='Sales', index=['area', 'Salesperson'], aggfunc='max')

print("\nSummary sales by region and salesperson:")
print(pivot_table4)Output result

Sales by region and salesperson (maximum):

Region Salesperson
East China Zhang San 15000
North China Li Si 12000
Wang Wu 20000
South China Wang Wu 5000
Central China Li Si 7000
Wang Wu 6000
Name: Sales, dtype: int64

in conclusion

Through the above examples, we can see that the Pandas library provides very powerful and flexible functionality in handling pivot tables. Whether it is simple summary or complex multi-condition aggregation, Pandas can handle it easily. With these basic skills mastered, you will be able to manage and analyze your dataset more efficiently, thus gaining valuable insights faster. Hope this article helps you make better use of Pandas in your daily work!

This is the article about the project practice of using Pandas to implement pivot tables in Excel. For more related Pandas Excel pivot table content, please search for my previous articles or continue browsing the related articles below. I hope everyone will support me in the future!