Introduction
PowerPoint (PPT) is a standard office tool, but designing slides and adjusting styles by hand is time-consuming and labor-intensive. This article introduces a set of Python automation routines, built on the python-pptx library, that provide the following functions:
- Extract PPT styles: Save a PPT's text formatting, colors, layout, and other information as a JSON file.
- Apply styles to templates: Generate a new PPT according to the styles defined in the JSON file.
- Add, delete, and copy slides: Adjust the PPT structure flexibly to fit dynamic content.
Code function overview
Core functional modules
- Style extraction and saving
  - Function: extract_ppt_with_style(ppt_path, output_json)
  - Purpose: Iterates through every slide of the PPT, extracts each text box's font, color, paragraph alignment, and other style information, and saves it as a structured JSON file.
- Style application and generation
  - Function: apply_styles_to_ppt(template_path, json_path, output_pptx, data_json_llm)
  - Purpose: Applies text content and formatting to the specified template according to the styles defined in the JSON file, and generates a new PPT.
- Slide structure management
  - Functions: copy_slide_and_insert_after, delete_slide, copy_ppt
  - Purpose: Copies and deletes slides, and adjusts the number of middle pages (expanding or reducing the content pages) as required.
Detailed code explanation
1. Extract PPT style (extract_ppt_with_style)
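import json
from pptx import Presentation
from pptx.enum.dml import MSO_COLOR_TYPE

def extract_ppt_with_style(ppt_path, output_json):
    prs = Presentation(ppt_path)
    data = []
    for slide_idx, slide in enumerate(prs.slides):
        slide_data = {"slide_number": slide_idx + 1, "shapes": []}
        for shape in slide.shapes:
            if not shape.has_text_frame:
                continue  # Skip non-text shapes
            text_info = {"shape_name": shape.name, "paragraphs": []}
            for paragraph in shape.text_frame.paragraphs:
                para_info = {"alignment": str(paragraph.alignment), "runs": []}
                for run in paragraph.runs:
                    font = run.font
                    color = font.color
                    # Read only the attribute that matches the color type; python-pptx
                    # raises AttributeError for, e.g., .rgb on a theme color.
                    if color.type == MSO_COLOR_TYPE.SCHEME:
                        # .name assumes python-pptx 1.0+, where enum members are standard Enums
                        color_info = {"type": "theme", "theme_color": color.theme_color.name, "rgb": None}
                    elif color.type == MSO_COLOR_TYPE.RGB:
                        color_info = {"type": "rgb", "theme_color": None, "rgb": list(color.rgb)}
                    else:
                        color_info = {"type": "none", "theme_color": None, "rgb": None}
                    run_info = {
                        "text": run.text,
                        "font": {
                            "name": font.name,
                            "size": str(font.size) if font.size else None,
                            "bold": font.bold,
                            "italic": font.italic,
                            "color": color_info,
                        },
                    }
                    para_info["runs"].append(run_info)
                text_info["paragraphs"].append(para_info)
            slide_data["shapes"].append(text_info)
        data.append(slide_data)
    # Save the full style JSON, then a compressed copy
    with open(output_json, 'w', encoding='utf-8') as f:
        json.dump(data, f, indent=2, ensure_ascii=False)
    data = json_compress(data)
    with open("compress_" + output_json, 'w', encoding='utf-8') as f:
        json.dump(data, f, indent=2, ensure_ascii=False)
    return data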
Key points:
- Style analysis: Records the font name, size, bold, italic, and color (theme color or RGB value).
- Structured storage: In the JSON file, every page (slide) contains multiple shapes (shapes), each shape contains paragraphs (paragraphs), and each paragraph contains text fragments (runs).
- Compression optimization: The json_compress function strips redundant data (for example, shapes with generic names) to produce a more compact JSON file.
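To make the compression step concrete, here is a minimal sketch run on a hand-written sample of the extracted structure. The function name compress_styles and the sample data are illustrative assumptions. The rule it applies (drop shapes whose name contains "Shape", keep only the text of each remaining shape) mirrors what json_compress in the complete listing appears to do, and should not be read as a definitive restatement of it.

```python
import json

# Hand-written sample in the slide -> shapes -> paragraphs -> runs layout (illustrative data)
sample = [
    {
        "slide_number": 1,
        "shapes": [
            {
                "shape_name": "Title 1",
                "paragraphs": [
                    {"alignment": "None", "runs": [
                        {"text": "Quarterly report", "font": {"name": "Arial", "bold": True}}
                    ]}
                ],
            },
            {"shape_name": "Shape 7", "paragraphs": []},
        ],
    }
]


def compress_styles(slides):
    """Hypothetical helper: keep only shape names and text, without mutating the input."""
    compressed = []
    for slide in slides:
        kept = []
        for shape in slide["shapes"]:
            if "Shape" in shape["shape_name"]:
                continue  # shapes named like "Shape 7" are dropped
            texts = [run["text"] for p in shape["paragraphs"] for run in p["runs"]]
            # Keep the last run's text, or an empty string when there are no runs
            kept.append({"shape_name": shape["shape_name"],
                         "paragraphs": texts[-1] if texts else ""})
        compressed.append({"slide_number": slide["slide_number"], "shapes": kept})
    return compressed


compact = compress_styles(sample)
print(json.dumps(compact, indent=2, ensure_ascii=False))
```

The compact form keeps one text value per shape, which is much shorter than the full style record when the JSON has to be pasted into a prompt.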
2. Apply styles to generate PPT (apply_styles_to_ppt)
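import json
from pptx import Presentation
from pptx.dml.color import RGBColor
from pptx.enum.dml import MSO_THEME_COLOR

def apply_styles_to_ppt(template_path, json_path, output_pptx, data_json_llm):
    with open(json_path, 'r', encoding='utf-8') as f:
        data = json.load(f)
    prs = Presentation(template_path)
    for slide_idx, slide in enumerate(prs.slides):
        # Assumption: data_json_llm lists the new texts in extraction order,
        # one entry per shape kept by json_compress.
        texts = list(data_json_llm[slide_idx]["shapes"])
        text_shape_idx = 0  # The JSON holds text shapes only, so count them separately
        for shape in slide.shapes:
            if not shape.has_text_frame:
                continue
            shape_info = data[slide_idx]["shapes"][text_shape_idx]
            text_shape_idx += 1
            # json_compress drops shapes whose name contains "Shape"; leave their text alone
            new_text = None
            if "Shape" not in shape.name and texts:
                new_text = texts.pop(0)["paragraphs"]
            for paragraph_idx, paragraph in enumerate(shape.text_frame.paragraphs):
                for run_idx, run in enumerate(paragraph.runs):
                    run_info = shape_info["paragraphs"][paragraph_idx]["runs"][run_idx]
                    # Apply text content: the first run takes the new text, later runs are cleared
                    if new_text is not None:
                        run.text = new_text
                        new_text = ""
                    # Apply styles
                    run.font.name = run_info["font"]["name"]
                    run.font.bold = run_info["font"]["bold"]
                    run.font.italic = run_info["font"]["italic"]
                    # Apply colors (runs without an explicit color are left unchanged)
                    color_data = run_info["font"]["color"]
                    if color_data["type"] == "rgb" and color_data["rgb"]:
                        r, g, b = color_data["rgb"]
                        run.font.color.rgb = RGBColor(r, g, b)
                    elif color_data["type"] == "theme":
                        theme_color = getattr(MSO_THEME_COLOR, str(color_data["theme_color"]), MSO_THEME_COLOR.ACCENT_1)
                        run.font.color.theme_color = theme_color
    prs.save(output_pptx)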
Key points:
- Style reuse: Reads the font, color, and other information from the JSON file and applies it directly to the corresponding position in the template PPT.
- Dynamic content replacement: Through the data_json_llm parameter, the PPT can be populated with text generated by an LLM (such as GPT).
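The color handling can be illustrated in isolation, without opening a presentation. The sketch below uses a hypothetical helper, resolve_rgb, which is not part of the article's code. It assumes the color dictionary layout used in the style JSON ("type" plus "rgb"), and the "hex" type that the complete listing also accepts.

```python
def resolve_rgb(color_data):
    """Hypothetical helper: return an (r, g, b) tuple, or None if no explicit RGB is stored."""
    kind = color_data.get("type")
    if kind == "rgb" and color_data.get("rgb"):
        r, g, b = (int(c) for c in color_data["rgb"])
        return (r, g, b)
    if kind == "hex" and color_data.get("hex"):
        hex_color = color_data["hex"].lstrip("#")
        # Two hex digits per channel: RR, GG, BB
        return tuple(int(hex_color[i:i + 2], 16) for i in (0, 2, 4))
    return None  # theme colors and unset colors carry no RGB triple


print(resolve_rgb({"type": "rgb", "rgb": [255, 0, 0]}))
print(resolve_rgb({"type": "hex", "hex": "#1A2B3C"}))
print(resolve_rgb({"type": "theme", "theme_color": "ACCENT_1"}))
```

Returning None for theme colors keeps the decision with the caller, who can then set a theme color on the run instead of an RGB value.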
3. Slide Structure Management
3.1 Copying and Inserting Slides
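from copy import deepcopy

def copy_slide_and_insert_after(prs, source_index, target_index):
    source_slide = prs.slides[source_index]
    # add_slide() appends the new slide at the end of the deck.
    # Note: it also creates the layout's placeholders, so the copy may contain empty duplicates.
    new_slide = prs.slides.add_slide(source_slide.slide_layout)
    # Copy the shapes' XML elements (the complete listing also copies relationships)
    for shape in source_slide.shapes:
        new_el = deepcopy(shape.element)
        new_slide.shapes._spTree.insert_element_before(new_el, 'p:extLst')
    # Adjust position: move the new slide ID from the end to just after target_index
    slides = list(prs.slides._sldIdLst)
    new_slide_id = slides.pop()
    slides.insert(target_index + 1, new_slide_id)
    prs.slides._sldIdLst[:] = slides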
- Function: Copies the specified slide and inserts the copy after the target position, keeping the layout and shapes consistent with the source.
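The reordering step is plain list manipulation: a newly added slide always lands at the end, so it is popped from the end and reinserted after the target. A minimal sketch on an ordinary Python list follows; the function name move_last_after and the string IDs are illustrative stand-ins for the slide ID elements.

```python
def move_last_after(ids, target_index):
    """Hypothetical helper: move the last item so it sits right after target_index."""
    reordered = list(ids)       # work on a copy
    moved = reordered.pop()     # the newly appended item is last
    reordered.insert(target_index + 1, moved)
    return reordered


print(move_last_after(["cover", "a", "b", "end", "copy-of-a"], 1))
```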
3.2 Deleting Slides
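def delete_slide(prs, slide_index):
    # Make sure the index is within range
    if slide_index < 0 or slide_index >= len(prs.slides):
        print("Invalid slide index")
        return
    # Look up the slide ID element at that index and remove it
    xml_slides = list(prs.slides._sldIdLst)
    slides_id_to_delete = xml_slides[slide_index]
    prs.slides._sldIdLst.remove(slides_id_to_delete)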
- Function: Deletes a slide by removing its ID from the slide ID list, which avoids the format errors that manipulating the slide directly may cause.
3.3 Dynamic expansion/compression of PPT page count
def copy_ppt(pages, template_path="", modified_path=""):
    prs = Presentation(template_path)
    copy_pages = pages - 2  # Middle pages wanted (the first and last pages are fixed)
    center_pages = len(prs.slides) - 2  # Middle pages in the template (assumed to be at least 1)
    if copy_pages < center_pages:
        # Delete surplus middle pages from the back; the last page stays in place
        for slide_index in range(center_pages, copy_pages, -1):
            delete_slide(prs, slide_index)
    else:
        # Append copies of the middle pages in cyclic order (e.g. 1, 2, 3, 1, 2),
        # inserting each copy just before the last page
        for k in range(copy_pages - center_pages):
            copy_slide_and_insert_after(prs, 1 + k % center_pages, center_pages + k)
    prs.save(modified_path)
- Application scenario: Adjusts the number of middle pages on demand (for example, expanding to 5 pages or reducing to 3) while keeping the first and last pages fixed.
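The page arithmetic is easier to check without a presentation file. The sketch below computes which template page each middle page of the result should come from, cycling through the template's middle pages in order. The function name plan_middle_sources is a hypothetical helper, and the cyclic order is an assumption taken from the template instructions in the complete listing (middle pages 2, 3 expand to 2, 3, 2, 3, 2).

```python
def plan_middle_sources(total_pages, template_pages):
    """Hypothetical helper: 0-based template index for each middle page of the result."""
    wanted = total_pages - 2        # first and last pages are fixed
    available = template_pages - 2  # middle pages present in the template
    if wanted < 0 or available < 1:
        raise ValueError("need a first page, a last page, and at least one middle page")
    # Cycle through template indices 1..available until enough pages are planned
    return [1 + k % available for k in range(wanted)]


print(plan_middle_sources(7, 5))  # expand 3 middle pages to 5
print(plan_middle_sources(4, 5))  # reduce 3 middle pages to 2
```

Planning the order first makes it straightforward to verify that the result keeps the template's page order before any slide is copied or deleted.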
Usage examples
Scenario 1: Generate a PPT that matches the style
# 1. Extract the style of the original PPT
extract_ppt_with_style("", "output_styles.json")
# 2. Generate new content (for example, through an LLM)
llm_json = [...]  # LLM-generated text content
# 3. Apply the style to generate the final PPT
apply_styles_to_ppt("", "output_styles.json", "new_ppt.pptx", llm_json)
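Because the LLM output has to keep the template's structure, it can help to check it before applying styles. The sketch below is one possible check, not part of the article's code. The helper name check_llm_json is hypothetical, and it assumes both inputs use the compressed layout (a list of slides, each with "slide_number" and a "shapes" list of dictionaries).

```python
def check_llm_json(template, llm_json):
    """Hypothetical check: return a list of problems, empty when the structure matches."""
    problems = []
    if len(template) != len(llm_json):
        problems.append(f"expected {len(template)} slides, got {len(llm_json)}")
        return problems
    for idx, (t_slide, l_slide) in enumerate(zip(template, llm_json), start=1):
        t_shapes, l_shapes = t_slide["shapes"], l_slide["shapes"]
        if len(t_shapes) != len(l_shapes):
            problems.append(f"slide {idx}: expected {len(t_shapes)} shapes, got {len(l_shapes)}")
            continue
        for t_shape, l_shape in zip(t_shapes, l_shapes):
            # Keys must stay the same; only the values may change
            if set(t_shape) != set(l_shape):
                problems.append(f"slide {idx}: shape keys changed")
    return problems


template = [{"slide_number": 1, "shapes": [{"shape_name": "Title 1", "paragraphs": "Old title"}]}]
good = [{"slide_number": 1, "shapes": [{"shape_name": "Title 1", "paragraphs": "AI changes the world"}]}]
bad = [{"slide_number": 1, "shapes": []}]

print(check_llm_json(template, good))
print(check_llm_json(template, bad))
```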
Scenario 2: Dynamically adjust the number of PPT pages
# Assume the original template has 5 pages (fixed first and last pages, 3 pages in the middle)
copy_ppt(pages=7, template_path="")
# Result: 7 pages = 1 (first) + 5 (middle, including copies) + 1 (last)
Application scenarios
- Enterprise Reporting Automation: Dynamically generate quarterly reports based on data and maintain a unified format.
- Training material generation: Create multiple decks in batches; only the middle content pages need to be adjusted.
- Marketing material management: Quickly copy product introduction templates and replace their text and styles.
Summary
The code in this article automates the whole workflow, from PPT style extraction through dynamic content generation to structure management. Developers can extend it in the following directions:
- LLM integration: Connect the text generation step to GPT or other models to automate everything from content to style.
- Graphics processing: Extend style extraction and application to pictures and charts.
- User interface: Wrap the code in a GUI tool to lower the barrier to use.
In this way, companies can significantly reduce the time spent producing PPTs and focus on content rather than format adjustment.
Complete code
import json
from copy import deepcopy

from pptx import Presentation
from pptx.dml.color import RGBColor
from pptx.enum.dml import MSO_COLOR_TYPE, MSO_THEME_COLOR


def extract_ppt_with_style(ppt_path, output_json):
    prs = Presentation(ppt_path)
    data = []
    for slide_idx, slide in enumerate(prs.slides):
        slide_data = {"slide_number": slide_idx + 1, "shapes": []}
        for shape in slide.shapes:
            if not shape.has_text_frame:
                continue  # Skip non-text shapes
            text_info = {"shape_name": shape.name, "paragraphs": []}
            for paragraph in shape.text_frame.paragraphs:
                para_info = {"alignment": str(paragraph.alignment), "runs": []}
                for run in paragraph.runs:
                    font = run.font
                    color = font.color
                    # Read only the attribute that matches the color type; python-pptx
                    # raises AttributeError for, e.g., .rgb on a theme color.
                    if color.type == MSO_COLOR_TYPE.SCHEME:
                        # .name assumes python-pptx 1.0+, where enum members are standard Enums
                        color_info = {"type": "theme", "theme_color": color.theme_color.name, "rgb": None}
                    elif color.type == MSO_COLOR_TYPE.RGB:
                        color_info = {"type": "rgb", "theme_color": None, "rgb": list(color.rgb)}
                    else:
                        color_info = {"type": "none", "theme_color": None, "rgb": None}
                    run_info = {
                        "text": run.text,
                        "font": {
                            "name": font.name,
                            "size": str(font.size) if font.size else None,
                            "bold": font.bold,
                            "italic": font.italic,
                            "color": color_info,
                        },
                    }
                    para_info["runs"].append(run_info)
                text_info["paragraphs"].append(para_info)
            slide_data["shapes"].append(text_info)
        data.append(slide_data)
    with open(output_json, 'w', encoding='utf-8') as f:
        json.dump(data, f, indent=2, ensure_ascii=False)
    data = json_compress(data)
    with open("compress" + "_" + output_json, 'w', encoding='utf-8') as f:
        json.dump(data, f, indent=2, ensure_ascii=False)
    return data


def apply_styles_to_ppt(template_path, json_path, output_pptx, data_json_llm):
    with open(json_path, 'r', encoding='utf-8') as f:
        data = json.load(f)
    prs = Presentation(template_path)
    for slide_idx, slide in enumerate(prs.slides):
        # Assumption: data_json_llm lists the new texts in extraction order,
        # one entry per shape kept by json_compress.
        texts = list(data_json_llm[slide_idx]["shapes"])
        text_shape_idx = 0  # The JSON holds text shapes only, so count them separately
        for shape in slide.shapes:
            if not shape.has_text_frame:
                continue  # Skip non-text shapes
            shape_info = data[slide_idx]["shapes"][text_shape_idx]
            text_shape_idx += 1
            # json_compress drops shapes whose name contains "Shape"; leave their text alone
            new_text = None
            if "Shape" not in shape.name and texts:
                new_text = texts.pop(0)["paragraphs"]
            for paragraph_idx, paragraph in enumerate(shape.text_frame.paragraphs):
                for run_idx, run in enumerate(paragraph.runs):
                    run_info = shape_info["paragraphs"][paragraph_idx]["runs"][run_idx]
                    # The first run takes the new text, later runs are cleared
                    if new_text is not None:
                        run.text = new_text
                        new_text = ""
                    # Font size is not re-applied; the template's size is kept
                    run.font.name = run_info["font"]["name"]
                    run.font.bold = run_info["font"]["bold"]
                    run.font.italic = run_info["font"]["italic"]
                    color_data = run_info["font"]["color"]
                    color_type = color_data["type"]
                    if color_type == "rgb":
                        # Apply RGB values when present
                        if color_data["rgb"]:
                            r, g, b = color_data["rgb"]
                            run.font.color.rgb = RGBColor(r, g, b)
                    elif color_type == "hex":
                        # Parse a hexadecimal color
                        hex_color = color_data["hex"].lstrip("#")
                        r = int(hex_color[0:2], 16)
                        g = int(hex_color[2:4], 16)
                        b = int(hex_color[4:6], 16)
                        run.font.color.rgb = RGBColor(r, g, b)
                    elif color_type == "theme":
                        # Use a theme color (such as MSO_THEME_COLOR.ACCENT_1)
                        theme_color_name = str(color_data["theme_color"])
                        theme_color = getattr(MSO_THEME_COLOR, theme_color_name, MSO_THEME_COLOR.ACCENT_1)
                        run.font.color.theme_color = theme_color
                    elif color_type != "none":
                        # Unknown type: default color (black)
                        run.font.color.rgb = RGBColor(0, 0, 0)
    prs.save(output_pptx)


def json_compress(json_data):
    """Drop shapes whose name contains "Shape" and reduce the rest to name and text."""
    json_data_new = []
    for slide in json_data:
        shapes = {"shapes": [], "slide_number": slide["slide_number"]}
        for shape in slide["shapes"]:
            if "Shape" in shape["shape_name"]:
                continue
            texts = [run["text"] for paragraph in shape["paragraphs"] for run in paragraph["runs"]]
            # Keep the last run's text, as in the original logic
            shapes["shapes"].append({"shape_name": shape["shape_name"],
                                     "paragraphs": texts[-1] if texts else ""})
        json_data_new.append(shapes)
    return json_data_new


def copy_slide_and_insert_after(prs, source_index, target_index):
    """
    Copy the source slide and insert it after the target slide.
    :param prs: Presentation object
    :param source_index: Index of the source slide (starting from 0)
    :param target_index: Index of the target slide (the new slide is inserted after it)
    """
    # Get the source slide
    source_slide = prs.slides[source_index]
    # Create a new slide with the same layout (appended at the end of the deck)
    new_slide = prs.slides.add_slide(source_slide.slide_layout)
    # Copy all shapes (text boxes, pictures, charts, etc.)
    for shape in source_slide.shapes:
        new_el = deepcopy(shape.element)
        new_slide.shapes._spTree.insert_element_before(new_el, 'p:extLst')
    # Copy relationships (such as hyperlinks), excluding the notes page.
    # Note: the new relationship IDs may differ from those referenced in the copied XML.
    for rel in source_slide.part.rels.values():
        if "notesSlide" not in rel.reltype:
            new_slide.part.relate_to(rel._target, rel.reltype, is_external=rel.is_external)
    # Adjust slide order: move the new slide from the end to just after the target
    slides = list(prs.slides._sldIdLst)
    new_position = target_index + 1
    new_slide_id = slides.pop()
    slides.insert(new_position, new_slide_id)
    prs.slides._sldIdLst[:] = slides


def delete_slide(prs, slide_index):
    """
    Delete the slide at the given index.
    :param prs: Presentation object
    :param slide_index: Index of the slide to be deleted (starting from 0)
    """
    # Make sure the index is within range
    if slide_index < 0 or slide_index >= len(prs.slides):
        print("Invalid slide index")
        return
    # Get the slide ID list, find the ID at the index, and remove it
    xml_slides = list(prs.slides._sldIdLst)
    slides_id_to_delete = xml_slides[slide_index]
    prs.slides._sldIdLst.remove(slides_id_to_delete)


def copy_ppt(pages, template_path="", source_index=1, target_index=1, modified_path="modified_example.pptx"):
    # source_index and target_index are unused; they are kept so the signature is unchanged
    prs = Presentation(template_path)
    copy_pages, center_pages = pages - 2, len(prs.slides) - 2
    if copy_pages < center_pages:
        # Delete surplus middle pages from the back; the last page stays in place
        for slide_index in range(center_pages, copy_pages, -1):
            delete_slide(prs, slide_index)
    else:
        # Append copies of the middle pages in cyclic order (e.g. 1, 2, 3, 1, 2),
        # inserting each copy just before the last page.
        # Assumes the template has at least one middle page.
        for k in range(copy_pages - center_pages):
            copy_slide_and_insert_after(prs, 1 + k % center_pages, center_pages + k)
    prs.save(modified_path)


if __name__ == '__main__':
    # Usage example
    # data = extract_ppt_with_style("", "output_styles.json")
    #
    # prompt_text = f"""
    # # PPT JSON template
    # {data}
    # # Template usage instructions
    # - The structure (number of elements) of each slide cannot be changed.
    # - In the dictionaries under shapes in each slide, keys are immutable; values can be changed.
    # - The first slide cannot be copied and must stay in the first position, but its content is mutable. Its slide_number is also mutable.
    # - The last slide cannot be copied and must stay in the last position, but its content is mutable. Its slide_number is also mutable.
    # - The slides in the middle can be copied, but their order cannot be changed.
    # - For example, if the middle slides are 2, 3 and your PPT needs 5 middle slides, the copy order is 2, 3, 2, 3, 2. After copying, you can change the slide_number values.
    #
    # After understanding the template usage requirements above, please write an outline for a PPT on the theme
    # "Artificial Intelligence Changes the World" and use the template above to generate the corresponding JSON.
    # """
    llm_json = []  # Fill in with the JSON generated by the LLM
    # copy_ppt(len(llm_json))
    data = extract_ppt_with_style("modified_example.pptx", "output_styles.json")
    apply_styles_to_ppt("modified_example.pptx", "output_styles.json", "new_ppt.pptx", llm_json)