1. Overview of technical solutions
An automated reporting solution is based on the following technical components:
- Python as the core programming language
- The python-docx library is used to handle Word documents
- pandas library is used for data processing and analysis
- matplotlib or plotly library is used for data visualization
- Word templates as the basic format of reports
The advantage of this solution is that it retains the flexibility of the layout of Word documents and has nothing to do with it, and at the same time, it uses Python's powerful data processing capabilities to achieve automated generation of report content.
2. Environment preparation and dependency installation
You need to configure the Python environment and install the necessary libraries:
# Install the required libraries# It is recommended to install in a virtual environmentpip install python-docx pandas matplotlib plotly openpyxl
python-docx is a Python library for creating and updating Microsoft Word(.docx) files
3. Word template design principles
Designing a good Word template is the basis for automated reporting. The template should consider the following points:
- Clear structure: including clear structures such as title, abstract, text, icon position, etc.
- Reserved placeholder: Set specific placeholder marks where dynamic fill is required
- Consistent format: use uniform font, color, paragraph style
- Consider scalability: Some parts may need to be dynamically increased or decreased based on data
A typical weekly report template may contain the following sections:
- Report title and time range
- Summary of main indicators
- Detailed data of each business line
- Description of abnormal situations
- Data Trend Icons
- Work plan for next week
Manipulating Word Documents with python-docx
The python-docx library provides rich APIs to manipulate Word documents. Here are some basic operations:
from docx import Document from import Inches, Pt, RGBColor from import WD_ALIGN_PARAGRAPH # Create a new Word documentdoc = Document() # Add a titledoc.add_heading('Weekly Report: 2025-04-21', 0) # Add paragraphp = doc.add_paragraph("Overall business operation this week:") p.add_run('good').bold = True p.add_run(', All main tables have grown steadily. ') # Add a tabletable = doc.add_table(rows=3, cols=3) # Set the table headerheader_cells = [0].cells header_cells[0].text = 'Indicator Name' header_cells[1].text = 'Values of this week' header_cells[2].text = 'Momentum change' # Fill in datadata_cells = [1].cells data_cells[0].text = 'Sales' data_cells[1].text = '¥1234567' data_cells[2].text = '+12.3' # Add an imagedoc.add_picture("", width=Inches(6), height=Inches(2)) # Save the document("weekly_report.docx")
Building data processing and acquisition modules
In practical applications, report data may come from multiple sources, such as databases, APIs, Excel files, etc. A flexible data acquisition and processing module is needed
#! /usr/bin/env/python3 # -*- coding=utf-8 -*- # @Author: jack # @Date : 2025/04/21/17:16 from docx import Document from import Inches, Pt, RGBColor from import WD_ALIGN_PARAGRAPH import pandas as pd import as plt from datetime import datetime, timedelta def get_report_period(): """Determine the time frame for reporting""" today = () # Assume that weekly reports cover last Monday to Sunday last_month = today - timedelta(days=() + 7) last_sunday = last_month + timedelta(days=6) return last_month, last_sunday def fetch_sales_data(start_date, end_date): """Get sales data from data sources""" # In actual application, this is a database query or API call # Use simulation data as an example here dates = pd.date_range(start=start_date, end=end_date) sales = [round(100000 + i * 5000 + i * i * 100) for i in range(len(dates))] return ({ "date": dates, "sales": sales}) def calculate_kpi(df): """Calculate Key Performance Indicators""" total_sales = df["sales"].sum() avg_sales = df["sales"].mean() max_sales = df["sales"].max() max_sales_day = [df["sales"].idxmax(), "date"] # Calculate month-on-month change # Suppose we have data from last week last_week_sales = total_sales * 0.9 # Simulate data sales_change = (total_sales - last_week_sales) / last_week_sales return { "total_sales": total_sales, "avg_sales": avg_sales, "max_sales": max_sales, "max_sales_day": max_sales_day, "sales_change": sales_change } def generate_charts(df, output_path): """Generate data visualization chart""" (figsize=(10, 6)) (df['date'], df['sales'], marker='o') ('Daily Sales Trend') ('date') ('Sales') (True) plt.tight_layout() (output_path) () return output_path
Implement template filling logic
#! /usr/bin/env/python3 # -*- coding=utf-8 -*- # @Author: jack # @Date : 2025/04/21/17:16 import os from docx import Document from import Inches, Pt, RGBColor from import WD_ALIGN_PARAGRAPH import pandas as pd import as plt from datetime import datetime, timedelta def get_report_period(): """Determine the time frame for reporting""" today = () # Assume that weekly reports cover last Monday to Sunday last_month = today - timedelta(days=() + 7) last_sunday = last_month + timedelta(days=6) return last_month, last_sunday def fetch_sales_data(start_date, end_date): """Get sales data from data sources""" # In actual application, this is a database query or API call # Use simulation data as an example here dates = pd.date_range(start=start_date, end=end_date) sales = [round(100000 + i * 5000 + i * i * 100) for i in range(len(dates))] return ({ "date": dates, "sales": sales}) def calculate_kpis(df): """Calculate Key Performance Indicators""" total_sales = df["sales"].sum() avg_sales = df["sales"].mean() max_sales = df["sales"].max() max_sales_day = [df["sales"].idxmax(), "date"] # Calculate month-on-month change # Suppose we have data from last week last_week_sales = total_sales * 0.9 # Simulate data sales_change = (total_sales - last_week_sales) / last_week_sales return { "total_sales": total_sales, "avg_sales": avg_sales, "max_sales": max_sales, "max_sales_day": max_sales_day, "sales_change": sales_change } def generate_charts(df, output_path): """Generate data visualization chart""" (figsize=(10, 6)) (df['date'], df['sales'], marker='o') ('Daily Sales Trend') ('date') ('Sales') (True) plt.tight_layout() (output_path) () return output_path def generate_report(template_path, output_path): """Main function to generate weekly reports""" # Get report time range start_date, end_date = get_report_period() period_str = f"{start_date.strftime('%Y year %m month %d day')} to {end_date.strftime('%Y year %m month %d day')}" # Get and process data sales_data = fetch_sales_data(start_date, end_date) kpis = calculate_kpis(sales_data) # Generate a chart chart_path = generate_charts(sales_data, 'sales_trend.png') # Load Word templates doc = Document(template_path) # Replace date in title for paragraph in : if '{{report_period}}' in : = ('{{report_period}}', period_str) # Fill in KPI data for paragraph in : if '{{total_sales}}' in : = ('{{total_sales}}', f"¥{kpis['total_sales']:,.2f}") if '{{sales_change}}' in : change_text = f"+{kpis['sales_change']:.2%}" if kpis['sales_change'] >= 0 else f"{kpis['sales_change']:.2%}" = ('{{sales_change}}', change_text) # Fill in table data for table in : for row in : for cell in : for paragraph in : if '{{avg_sales}}' in : = ('{{avg_sales}}', f"¥{kpis['avg_sales']:,.2f}") if '{{max_sales}}' in : = ('{{max_sales}}', f"¥{kpis['max_sales']:,.2f}") if '{{max_sales_day}}' in : day_str = kpis['max_sales_day'].strftime('%Y year %m month %d day') = ('{{max_sales_day}}', day_str) # Add a chart for paragraph in : if '{{sales_chart}}' in : # Save the reference for the current paragraph p = paragraph # Clear the placeholder text = "" # Add an image in the same location run = p.add_run() run.add_picture(chart_path, width=Inches(6)) # Save the generated report (output_path) print(f"Weekly report has been generated:{output_path}") return output_path def main(): # Template and output file paths template_path = "weekly_report.docx" start_date, end_date = get_report_period() output_filename = f"Sales Weekly_{start_date.strftime('%Y%m%d')}_{end_date.strftime('%Y%m%d')}.docx" output_path = ("reports", output_filename) # Make sure the output directory exists ("reports", exist_ok=True) # Generate a report generate_report(template_path, output_path) if __name__ == "__main__": main()
Advanced: Dynamic report content generation
In practical applications, the content of the report may need to be dynamically adjusted according to changes in the data. For example, when abnormal data is detected, additional instructions or warnings need to be added to the report. Here is an example of an extension to deal with dynamic content:
def add_dynamic_sections(doc, sales_data, kpis): """Dynamic addition of report content based on data situation""" # For example: When the sales growth rate exceeds 20%, add special instructions if kpis['sales_change'] > 0.2: doc.add_heading('Statement of Significant Sales', level=2) p = doc.add_paragraph() p.add_run(f"Sales this week increased compared to last week{kpis['sales_change']:.2%},Significantly higher than expected。") p.add_run("The main growth points come from the following aspects:").bold = True # Add bullet list doc.add_paragraph("Sales growth brought about by new product line online", style='List Bullet') doc.add_paragraph("The marketing campaign is effective", style='List Bullet') doc.add_paragraph("Increased orders for key customers", style='List Bullet') # Detect sales abnormal days daily_avg = sales_data['sales'].mean() std_dev = sales_data['sales'].std() anomaly_days = sales_data[abs(sales_data['sales'] - daily_avg) > 2 * std_dev] ifnot anomaly_days.empty: doc.add_heading('Abnormal Sales Day Analysis', level=2) p = doc.add_paragraph("There was a significant abnormality in sales data on the following dates detected this week:") # Add exception day table table = doc.add_table(rows=1, cols=3) = 'Table Grid' # Set the table header header_cells = [0].cells header_cells[0].text = 'date' header_cells[1].text = 'Sales' header_cells[2].text = 'Difference from the mean' # Add data rows for _, row in anomaly_days.iterrows(): cells = table.add_row().cells cells[0].text = row['date'].strftime('%Y-%m-%d') cells[1].text = f"¥{row['sales']:,.2f}" deviation = (row['sales'] - daily_avg) / daily_avg cells[2].text = f"{deviation:.2%}" doc.add_paragraph("It is recommended to further investigate the causes of these abnormal situations so that appropriate business measures are taken.")
The above is the detailed content of the complete process of implementing weekly report automation based on Python + Word. For more information about Python Word weekly report automation, please pay attention to my other related articles!