SoFunction
Updated on 2025-04-28

Solutions to automate PPT styles and structures using Python

Introduction

PowerPoint (PPT) is a staple office tool, but designing slides and adjusting styles by hand is time-consuming. This article presents a set of Python scripts, built on the python-pptx library, that automate the following tasks:

  • Extract PPT styles: save each text run's font name, size, bold/italic flags and color, plus paragraph alignment, to a JSON file.
  • Apply styles to a template: generate a new PPT from a template using the styles recorded in the JSON.
  • Copy, delete and reorder slides: adjust the deck's structure to fit content of varying length.

Code overview

Core modules

  1. Style extraction and saving

    • Function: extract_ppt_with_style(ppt_path, output_json)
    • Purpose: Iterate over every slide, extract the font, color, paragraph alignment and other style information of each text box, and save it as a structured JSON file.
  2. Style application and generation

    • Function: apply_styles_to_ppt(template_path, json_path, output_pptx, data_json_llm)
    • Purpose: Fill the specified template with new text and apply the formatting recorded in the JSON to produce a new PPT.
  3. Slide structure management

    • Functions: copy_slide_and_insert_after, delete_slide, copy_ppt
    • Purpose: Copy and delete slides, and expand or shrink the number of middle (content) pages as needed.

Detailed code explanation

1. Extract PPT style (extract_ppt_with_style)

def extract_ppt_with_style(ppt_path, output_json):
    prs = Presentation(ppt_path)
    data = []
    for slide_idx, slide in enumerate(prs.slides):
        slide_data = {
            "slide_number": slide_idx + 1,
            "shapes": []
        }
        for shape in slide.shapes:
            if not shape.has_text_frame:
                continue  # Skip non-text shapes
            text_frame = shape.text_frame
            text_info = {
                "shape_name": shape.name,
                "paragraphs": []
            }
            for paragraph in text_frame.paragraphs:
                para_info = {
                    "alignment": str(paragraph.alignment),
                    "runs": []
                }
                for run in paragraph.runs:
                    color = run.font.color  # color.type: MSO_COLOR_TYPE.SCHEME, .RGB, another type, or None
                    is_theme = color.type == MSO_COLOR_TYPE.SCHEME
                    is_rgb = color.type == MSO_COLOR_TYPE.RGB
                    run_info = {
                        "text": run.text,
                        "font": {
                            "name": run.font.name,
                            "size": str(run.font.size),
                            "bold": run.font.bold,
                            "italic": run.font.italic,
                            "color": {
                                "type": "theme" if is_theme else ("rgb" if is_rgb else None),
                                # Store the enum member name (e.g. "ACCENT_1") so getattr() can restore it
                                "theme_color": color.theme_color.name if is_theme else None,
                                "rgb": (color.rgb[0], color.rgb[1], color.rgb[2]) if is_rgb else None
                            }
                        }
                    }
                    para_info["runs"].append(run_info)
                text_info["paragraphs"].append(para_info)
            slide_data["shapes"].append(text_info)
        data.append(slide_data)
    # Save the full JSON, then a compressed copy
    with open(output_json, 'w', encoding='utf-8') as f:
        json.dump(data, f, indent=2, ensure_ascii=False)
    data = json_compress(data)
    with open("compress_" + output_json, 'w', encoding='utf-8') as f:
        json.dump(data, f, indent=2, ensure_ascii=False)
    return data
  • Key points
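    • Style capture: records the font name, size, bold, italic and color (theme color or RGB value) of every text run.
    • Structured storage: in the JSON, each slide holds a list of shapes, each shape a list of paragraphs, and each paragraph a list of text runs.
    • Compression: json_compress builds a compact copy that the function returns (for use in an LLM prompt): shapes whose names contain "Shape" are dropped, and each remaining shape's paragraphs are reduced to a single string, the text of its last run. The full JSON is saved to output_json and the compact version to "compress_" + output_json.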
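To make the compression step concrete, here is a minimal sketch that reproduces the effect of json_compress (listed in full at the end of the article) on a single slide, without modifying its input. The helper name compress_styles and the sample data, including the decorative shape called "Shape 5", are hypothetical.

```python
import json


def compress_styles(slides):
    """Return a compact copy of extracted style data (hypothetical helper).

    Like json_compress: shapes whose name contains "Shape" are dropped, and
    each remaining shape's "paragraphs" becomes the text of its last run.
    Unlike json_compress, the input list is left unchanged.
    """
    compact = []
    for slide in slides:
        shapes = []
        for shape in slide["shapes"]:
            if "Shape" in shape["shape_name"]:
                continue  # dropped, as in json_compress
            text = shape["paragraphs"]  # kept as-is if the shape has no runs
            for paragraph in shape["paragraphs"]:
                for run in paragraph["runs"]:
                    text = run["text"]  # the last run wins
            shapes.append({"shape_name": shape["shape_name"], "paragraphs": text})
        compact.append({"shapes": shapes, "slide_number": slide["slide_number"]})
    return compact


# Hypothetical output of extract_ppt_with_style for one slide
sample = [{
    "slide_number": 1,
    "shapes": [
        {"shape_name": "Title 1", "paragraphs": [{
            "alignment": "None",
            "runs": [{"text": "AI changes the world",
                      "font": {"name": "Arial", "size": "None", "bold": True, "italic": None,
                               "color": {"type": "rgb", "theme_color": None, "rgb": [0, 0, 0]}}}],
        }]},
        {"shape_name": "Shape 5", "paragraphs": []},
    ],
}]

print(json.dumps(compress_styles(sample), ensure_ascii=False))
# prints [{"shapes": [{"shape_name": "Title 1", "paragraphs": "AI changes the world"}], "slide_number": 1}]
```

The compact form drops everything the LLM does not need to see (fonts, colors, alignment), which keeps the prompt short; the styles are read back later from the full JSON.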

2. Apply styles to generate PPT (apply_styles_to_ppt)

def apply_styles_to_ppt(template_path, json_path, output_pptx, data_json_llm):
    with open(json_path, 'r', encoding='utf-8') as f:
        data = json.load(f)
    prs = Presentation(template_path)
    for slide_idx, slide in enumerate(prs.slides):
        # The JSON only records text shapes, so index over those
        text_shapes = [shape for shape in slide.shapes if shape.has_text_frame]
        for shape_idx, shape in enumerate(text_shapes):
            text_frame = shape.text_frame
            for paragraph_idx, paragraph in enumerate(text_frame.paragraphs):
                for run_idx, run in enumerate(paragraph.runs):
                    run_info = data[slide_idx]["shapes"][shape_idx]["paragraphs"][paragraph_idx]["runs"][run_idx]
                    # Apply text content (the next LLM-generated entry for this slide, in order)
                    run.text = data_json_llm[slide_idx]["shapes"].pop(0)["paragraphs"]
                    # Apply styles
                    run.font.name = run_info["font"]["name"]
                    run.font.bold = run_info["font"]["bold"]
                    run.font.italic = run_info["font"]["italic"]
                    # Apply color
                    color_data = run_info["font"]["color"]
                    if color_data["type"] == "rgb":
                        r, g, b = color_data["rgb"]
                        run.font.color.rgb = RGBColor(r, g, b)
                    elif color_data["type"] == "theme":
                        theme_color = getattr(MSO_THEME_COLOR, color_data["theme_color"], MSO_THEME_COLOR.ACCENT_1)
                        run.font.color.theme_color = theme_color
    prs.save(output_pptx)
  • Key points
    • Style reuse: Read font, color and other information from JSON and apply it directly to the corresponding position of the template PPT.
    • Dynamic content replacement: the data_json_llm parameter supplies text generated by an LLM (such as GPT) in the compressed JSON format; each text run consumes one entry from the corresponding slide's shapes list.
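Because apply_styles_to_ppt indexes the LLM output by slide and consumes one entry per text run, a reply whose structure differs from the template can end in an IndexError halfway through. A pre-flight check like the sketch below can catch that earlier. The function check_llm_json is hypothetical and assumes both lists use the compressed layout produced by json_compress.

```python
def check_llm_json(template, llm_json):
    """Compare LLM output with the compressed template (hypothetical helper).

    Both arguments are assumed to look like
    [{"slide_number": 1, "shapes": [{"shape_name": "...", "paragraphs": "text"}]}].
    Returns a list of human-readable problems; an empty list means the shapes line up.
    """
    problems = []
    if len(llm_json) != len(template):
        problems.append(f"expected {len(template)} slides, got {len(llm_json)}")
    for i, (tpl_slide, llm_slide) in enumerate(zip(template, llm_json)):
        expected, got = len(tpl_slide["shapes"]), len(llm_slide["shapes"])
        if expected != got:
            problems.append(f"slide {i}: expected {expected} shapes, got {got}")
        for j, shape in enumerate(llm_slide["shapes"]):
            if not isinstance(shape.get("paragraphs"), str):
                problems.append(f"slide {i}, shape {j}: 'paragraphs' must be a string")
    return problems


template = [{"slide_number": 1, "shapes": [{"shape_name": "Title 1", "paragraphs": "Old title"}]}]
reply = [{"slide_number": 1, "shapes": [{"shape_name": "Title 1", "paragraphs": "New title"}]}]
print(check_llm_json(template, reply))  # prints []
```

Returning a list of messages rather than raising on the first mismatch makes it easy to send all the problems back to the model in a single follow-up request.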

3. Slide Structure Management

3.1 Copying and Inserting Slides

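def copy_slide_and_insert_after(prs, source_index, target_index):
    source_slide = prs.slides[source_index]
    new_slide = prs.slides.add_slide(source_slide.slide_layout)
    # Remove the empty layout placeholders add_slide() creates, so they are not duplicated
    for placeholder in list(new_slide.placeholders):
        placeholder.element.getparent().remove(placeholder.element)
    # Copy shapes (the full listing at the end also copies relationships)
    for shape in source_slide.shapes:
        new_el = deepcopy(shape.element)
        new_slide.shapes._spTree.insert_element_before(new_el, 'p:extLst')
    # Adjust position: add_slide() appended the new slide's ID at the end
    slides = list(prs.slides._sldIdLst)
    new_slide_id = slides.pop()
    slides.insert(target_index + 1, new_slide_id)
    prs.slides._sldIdLst[:] = slides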
  • Purpose: copies the shapes of the specified slide onto a new slide with the same layout and moves the copy directly after the target slide, so the copy keeps the source's layout and elements.
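The reordering at the end of copy_slide_and_insert_after is plain list manipulation: add_slide appends the new slide's ID at the end, and the function moves it to target_index + 1. The sketch below models that step with strings standing in for the slide ID elements; move_last_after is a hypothetical name.

```python
def move_last_after(slide_ids, target_index):
    """Move the last ID so it sits directly after position target_index.

    Models the list operations in copy_slide_and_insert_after: pop the newly
    appended ID from the end, then insert it at target_index + 1.
    """
    ids = list(slide_ids)  # work on a copy, like list(prs.slides._sldIdLst)
    new_id = ids.pop()
    ids.insert(target_index + 1, new_id)
    return ids


# A 5-slide deck after add_slide() appended a copy of slide index 1
deck = ["s0", "s1", "s2", "s3", "s4", "copy_of_s1"]
print(move_last_after(deck, 3))
# prints ['s0', 's1', 's2', 's3', 'copy_of_s1', 's4']
```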

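3.2 Deleting slides

def delete_slide(prs, slide_index):
    if slide_index < 0 or slide_index >= len(prs.slides):
        print("Invalid slide index")
        return
    xml_slides = list(prs.slides._sldIdLst)
    slides_id_to_delete = xml_slides[slide_index]
    # Drop the presentation's relationship to the slide so the slide part is not left orphaned
    prs.part.drop_rel(slides_id_to_delete.rId)
    prs.slides._sldIdLst.remove(slides_id_to_delete)
  • Purpose: python-pptx has no public API for deleting slides, so the function removes the slide's entry from the presentation's slide ID list and drops the matching relationship.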

3.3 Dynamic expansion/compression of PPT page count

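def copy_ppt(pages, template_path="", modified_path=""):
    prs = Presentation(template_path)
    copy_pages = pages - 2  # Exclude the fixed first and last pages
    center_pages = len(prs.slides) - 2
    if copy_pages < center_pages:
        # Delete surplus middle pages, starting from the last middle page
        for _ in range(center_pages - copy_pages):
            delete_slide(prs, len(prs.slides) - 2)
    elif copy_pages > center_pages:
        # Repeat the middle pages in order, then trim the surplus copies
        rounds = -(-copy_pages // center_pages)  # ceiling division
        last = center_pages  # index of the last middle page
        for _ in range(rounds - 1):  # the template already holds one round
            for i in range(1, center_pages + 1):
                copy_slide_and_insert_after(prs, i, last)
                last += 1
        for _ in range(rounds * center_pages - copy_pages):
            delete_slide(prs, last)
            last -= 1
    prs.save(modified_path)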
  • Use case: adjust the number of middle pages as needed (for example, expand them to 5 or shrink them to 3) while keeping the first and last pages fixed.
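The page-count rule can be checked without opening PowerPoint. The sketch below, using the hypothetical name plan_pages, computes which template slide (0-based) ends up at each position of the output deck: the first and last slides stay fixed, and the middle slides repeat in their original order and are cut off at the requested count. This is the copy order described in the LLM prompt at the end of the article (middle slides 2 and 3 repeated as 2, 3, 2, 3, 2).

```python
def plan_pages(template_pages, pages):
    """Return the 0-based template slide index used at each output position.

    Hypothetical helper: the first and last template slides stay fixed, and the
    middle slides are repeated in order, then truncated to the requested count.
    """
    middle = list(range(1, template_pages - 1))
    wanted = pages - 2  # pages between the fixed first and last slides
    if not middle or wanted < 0:
        raise ValueError("need at least one middle slide and pages >= 2")
    body = [middle[i % len(middle)] for i in range(wanted)]
    return [0] + body + [template_pages - 1]


print(plan_pages(5, 7))  # prints [0, 1, 2, 3, 1, 2, 4]
print(plan_pages(5, 4))  # prints [0, 1, 2, 4]
```

In 1-based slide numbers the first result reads 1, 2, 3, 4, 2, 3, 5: a 7-page deck built from a 5-page template. Shrinking keeps the leading middle slides, which matches deleting from the last middle page backwards.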

Usage examples

Scenario 1: Generate a PPT that matches the style

# 1. Extract the style of the original PPT
extract_ppt_with_style("", "output_styles.json")

# 2. Generate new content (for example, with an LLM)
llm_json = [...]  # Text content generated by the LLM
# 3. Apply the styles to generate the final PPT
apply_styles_to_ppt("", "output_styles.json", "new_ppt.pptx", llm_json)
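Step 2 leaves llm_json as a placeholder. Chat models often return JSON inside a Markdown code fence, sometimes preceded by a line of prose, so the reply usually needs a little cleanup before json.loads can read it. The sketch below, with the hypothetical helper parse_llm_json, assumes the reply contains exactly one JSON value, either bare or inside the first fenced block.

```python
import json
import re

FENCE = "`" * 3  # three backticks, the opening/closing marker of a Markdown code block

# Captures the body of the first fenced block, with or without a "json" tag
FENCED_BLOCK = re.compile(re.escape(FENCE) + r"(?:json)?\s*(.*?)" + re.escape(FENCE), re.DOTALL)


def parse_llm_json(reply):
    """Parse the JSON payload of an LLM reply (hypothetical helper)."""
    match = FENCED_BLOCK.search(reply)
    payload = match.group(1) if match else reply  # no fence: parse the whole reply
    return json.loads(payload)


reply = "Here is the JSON:\n" + FENCE + 'json\n[{"slide_number": 1, "shapes": []}]\n' + FENCE
llm_json = parse_llm_json(reply)
print(llm_json)  # prints [{'slide_number': 1, 'shapes': []}]
```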

Scenario 2: Dynamically adjust the number of PPT pages

# Assume the original template has 5 pages (fixed first and last pages, 3 middle pages)
copy_ppt(pages=7, template_path="")  # Generates 7 pages: 1 (first) + 5 (middle, copied) + 1 (last)

Application scenarios

  • Enterprise Reporting Automation: Dynamically generate quarterly reports based on data and maintain a unified format.
  • Training material generation: create multiple PPTs in batches where only the middle content pages need to change.
  • Marketing material management: Quickly copy product introduction templates, replace text and styles.

Summary

The code in this article automates the whole workflow: extracting PPT styles, filling in newly generated content, and managing slide structure. Developers can extend it further:

  • LLM integration: connect the text-generation step to GPT or another model so that both content and styling are produced automatically.
  • Graphics processing: extend extraction and application to pictures and chart styles.
  • User interface: wrap the scripts in a GUI tool to make them easier to use.

In this way, companies can significantly reduce PPT production time and focus on content innovation rather than format adjustment.

from pptx import Presentation
from pptx.enum.dml import MSO_COLOR_TYPE, MSO_THEME_COLOR
from pptx.dml.color import RGBColor
from copy import deepcopy
import json


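def extract_ppt_with_style(ppt_path, output_json):
    prs = Presentation(ppt_path)
    data = []

    for slide_idx, slide in enumerate(prs.slides):
        slide_data = {
            "slide_number": slide_idx + 1,
            "shapes": []
        }
        for shape in slide.shapes:
            if not shape.has_text_frame:
                continue  # Skip non-text shapes
            text_frame = shape.text_frame
            text_info = {
                "shape_name": shape.name,
                "paragraphs": []
            }

            for paragraph in text_frame.paragraphs:
                para_info = {
                    "alignment": str(paragraph.alignment),
                    "runs": []
                }
                for run in paragraph.runs:
                    color = run.font.color  # color.type: MSO_COLOR_TYPE.SCHEME, .RGB, another type, or None
                    is_theme = color.type == MSO_COLOR_TYPE.SCHEME
                    is_rgb = color.type == MSO_COLOR_TYPE.RGB
                    run_info = {
                        "text": run.text,
                        "font": {
                            "name": run.font.name,
                            "size": str(run.font.size) if run.font.size else None,
                            "bold": run.font.bold,
                            "italic": run.font.italic,
                            "color": {
                                # None means no explicit color (or an unhandled type): keep the template's color
                                "type": "theme" if is_theme else ("rgb" if is_rgb else None),
                                # Enum member name such as "ACCENT_1" (python-pptx 1.0+ enums have .name)
                                "theme_color": color.theme_color.name if is_theme else None,
                                "rgb": (color.rgb[0], color.rgb[1],
                                        color.rgb[2]) if is_rgb else None
                            }
                        },
                        # "highlight_color": str(run.highlight_color) # Modify: Get from run instead of
                    }
                    para_info["runs"].append(run_info)
                text_info["paragraphs"].append(para_info)
            slide_data["shapes"].append(text_info)
        data.append(slide_data)

    with open(output_json, 'w', encoding='utf-8') as f:
        json.dump(data, f, indent=2, ensure_ascii=False)
    data = json_compress(data)

    with open("compress" + "_" + output_json, 'w', encoding='utf-8') as f:
        json.dump(data, f, indent=2, ensure_ascii=False)
    return data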


def apply_styles_to_ppt(template_path, json_path, output_pptx, data_json_llm):
    with open(json_path, 'r', encoding='utf-8') as f:
        data = json.load(f)

    prs = Presentation(template_path)

    for slide_idx, slide in enumerate(prs.slides):
        # extract_ppt_with_style only recorded text shapes, so index over those
        text_shapes = [shape for shape in slide.shapes if shape.has_text_frame]

        for shape_idx, shape in enumerate(text_shapes):
            text_frame = shape.text_frame

            for paragraph_idx, paragraph in enumerate(text_frame.paragraphs):

                for run_idx, run in enumerate(paragraph.runs):
                    run_info = data[slide_idx]["shapes"][shape_idx]["paragraphs"][paragraph_idx]["runs"][run_idx]
                    # Take the next LLM-generated entry for this slide (in order)
                    text = data_json_llm[slide_idx]["shapes"].pop(0)
                    # run.text = run_info["text"]
                    run.text = text["paragraphs"]
                    run.font.name = run_info["font"]["name"]
                    # run.font.size = run_info["font"]["size"]
                    run.font.bold = run_info["font"]["bold"]
                    run.font.italic = run_info["font"]["italic"]

                    # run_info is a dictionary read from JSON
                    color_data = run_info["font"]["color"]

                    if color_data["type"] == "rgb":
                        # Parse RGB values
                        r, g, b = color_data["rgb"]
                        run.font.color.rgb = RGBColor(r, g, b)
                    elif color_data["type"] == "hex":
                        # Parse a hexadecimal color such as "#1F4E79"
                        hex_color = color_data["hex"].lstrip("#")
                        r = int(hex_color[0:2], 16)
                        g = int(hex_color[2:4], 16)
                        b = int(hex_color[4:6], 16)
                        run.font.color.rgb = RGBColor(r, g, b)
                    elif color_data["type"] == "theme":
                        # Use a theme color (such as MSO_THEME_COLOR.ACCENT_1)
                        theme_color_name = color_data["theme_color"]
                        theme_color = getattr(MSO_THEME_COLOR, theme_color_name, MSO_THEME_COLOR.ACCENT_1)
                        run.font.color.theme_color = theme_color
                    elif color_data["type"] is not None:
                        # Unknown color type: default to black
                        run.font.color.rgb = RGBColor(0, 0, 0)
                    # type None: no explicit color was recorded, so the template's color is kept

    prs.save(output_pptx)


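def json_compress(json_data):
    """
     Build a compact copy of the extracted data for the LLM prompt: shapes whose
     name contains "Shape" are dropped, and each remaining shape's "paragraphs"
     is replaced by the text of its last run. Note: modifies json_data in place.
     """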
    for slide in json_data:
        for shape in slide["shapes"]:
            if "Shape" in shape["shape_name"]:
                shape["paragraphs"] = {}
            else:
                for paragraph in shape["paragraphs"]:
                    for run in paragraph["runs"]:
                        shape["paragraphs"] = run["text"]
    json_data_new = []
    for slide in json_data:
        shapes = {"shapes": [], 'slide_number': slide['slide_number']}
        for shape in slide["shapes"]:
            if "Shape" in shape["shape_name"]:
                shape["paragraphs"] = {}
            else:
                shapes["shapes"].append(shape)
        json_data_new.append(shapes)

    return json_data_new


def copy_slide_and_insert_after(prs, source_index, target_index):
    """
     Copy the source slide and insert it after the target slide.

     :param prs: Presentation object
     :param source_index: Index of the source slide (starting from 0)
     :param target_index: Index of the target slide (the new slide is inserted after it)
     """
    # Get the source slide
    source_slide = prs.slides[source_index]

    # Create a new slide (using the same layout)
    new_slide_layout = source_slide.slide_layout
    new_slide = prs.slides.add_slide(new_slide_layout)
    # add_slide() fills the slide with the layout's empty placeholders; remove them so they are not duplicated
    for placeholder in list(new_slide.placeholders):
        placeholder.element.getparent().remove(placeholder.element)

    # Copy all shapes (including text boxes, pictures, charts, etc.)
    for shape in source_slide.shapes:
        el = shape.element
        new_el = deepcopy(el)
        new_slide.shapes._spTree.insert_element_before(new_el, 'p:extLst')

    # Copy relationships (such as pictures and hyperlinks) used by the copied shapes.
    # Caveat: the copied XML keeps the source's rIds; this assumes relate_to() assigns matching rIds.
    for rel in source_slide.part.rels.values():
        if "notesSlide" not in rel.reltype:  # Exclude the notes page
            # Use the relate_to method instead of adding the relationship directly
            new_slide.part.relate_to(rel._target, rel.reltype, is_external=rel.is_external)

    # Adjust slide order: add_slide() appended the new slide at the end
    slides = list(prs.slides._sldIdLst)
    new_position = target_index + 1  # Position right after the target slide
    new_slide_id = slides.pop()  # Take the new slide's ID off the end
    # Insert it at the correct position
    slides.insert(new_position, new_slide_id)
    prs.slides._sldIdLst[:] = slides


def delete_slide(prs, slide_index):
    """
     Delete the slide at the given index.

     :param prs: Presentation object
     :param slide_index: Index of the slide to be deleted (starting from 0)
     """
    # Make sure the index is within range
    if slide_index < 0 or slide_index >= len(prs.slides):
        print("Invalid slide index")
        return
    # Get the slide ID list
    xml_slides = list(prs.slides._sldIdLst)

    # Find the slide ID for the index, drop its relationship, and remove it from the list
    slides_id_to_delete = xml_slides[slide_index]
    prs.part.drop_rel(slides_id_to_delete.rId)
    prs.slides._sldIdLst.remove(slides_id_to_delete)


def copy_ppt(pages, template_path="", source_index=1, target_index=1,
             modified_path="modified_example.pptx"):
    # Note: source_index and target_index are currently unused
    prs = Presentation(template_path)
    copy_pages, center_pages = pages - 2, len(prs.slides) - 2
    if copy_pages != center_pages:

        if copy_pages < center_pages:
            # Too many middle pages: delete from the last middle page backwards
            start_page_index = center_pages
            for _ in range(center_pages - copy_pages):
                delete_slide(prs, start_page_index)
                start_page_index -= 1
        else:
            # Rounds of middle pages needed (ceiling division)
            n = -(-copy_pages // center_pages)
            # Surplus copies to trim after the last round
            m = n * center_pages - copy_pages
            start_page_index = center_pages
            for _ in range(n - 1):  # the template already contains one round
                for i in range(1, center_pages + 1):
                    copy_slide_and_insert_after(prs, i, start_page_index)
                    start_page_index += 1
            if m:

                for _ in range(m):
                    delete_slide(prs, start_page_index)
                    start_page_index -= 1

    prs.save(modified_path)


if __name__ == '__main__':
    # Usage example
    # data = extract_ppt_with_style("", "output_styles.json")
    #
    # prompt_text = f"""
    # # ppt json template
    # {data}
    # # Template usage instructions
    # - The structure (number of elements) of each slide cannot be changed
    # - In the dictionaries in each slide's shapes, the keys cannot be changed but the values can
    # - The first slide cannot be copied and must stay in the first position, but its content is mutable. Its slide_number is also mutable.
    # - The last slide cannot be copied either and must stay in the last position, but its content is mutable. Its slide_number is also mutable.
    # - The middle slides can be copied, but their order cannot be changed
    # - For example, if the middle slides are 2 and 3 and your PPT needs 5 middle slides, copy them in the order 2, 3, 2, 3, 2. After copying, you can change the slide_number values.
    # # After understanding the above template usage requirements, please write a PPT outline on the theme "Artificial Intelligence Changes the World" and use the above template to generate the corresponding JSON
    # """
    llm_json = []  # Replace with the JSON list returned by the LLM
    # copy_ppt(len(llm_json))
    data = extract_ppt_with_style("modified_example.pptx", "output_styles.json")

    apply_styles_to_ppt("modified_example.pptx", "output_styles.json", "new_ppt.pptx", llm_json)
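The prompt in the commented-out usage example embeds the compressed template with an f-string, which inserts Python's repr (single quotes, None, True) rather than JSON. Below is a minimal sketch of a prompt builder that serializes the template with json.dumps instead; build_prompt and RULES are hypothetical names, and the rule text paraphrases the instructions in that comment.

```python
import json

# Paraphrase of the template rules from the commented-out prompt in __main__
RULES = [
    "Do not change the structure (number of elements) of any slide.",
    "Keep every key in the shapes dictionaries; only the values may change.",
    "The first slide cannot be copied and must stay first; its content and slide_number may change.",
    "The last slide cannot be copied and must stay last; its content and slide_number may change.",
    "Middle slides may be copied but not reordered: with middle slides 2 and 3 "
    "and five middle pages needed, use 2, 3, 2, 3, 2.",
]


def build_prompt(template_json, topic):
    """Assemble the LLM prompt from the compressed template (hypothetical helper)."""
    rules = "\n".join(f"- {rule}" for rule in RULES)
    return (
        "# PPT JSON template\n"
        + json.dumps(template_json, ensure_ascii=False, indent=2)
        + "\n# Template usage instructions\n"
        + rules
        + f"\nFollowing these rules, write a PPT outline on the theme '{topic}' "
        "and return it as JSON in the template's format."
    )


template = [{"slide_number": 1, "shapes": [{"shape_name": "Title 1", "paragraphs": "Title"}]}]
prompt = build_prompt(template, "Artificial Intelligence Changes the World")
print(prompt)
```

Serializing with json.dumps shows the model the exact JSON syntax it is expected to return, which tends to make the reply easier to parse with json.loads.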
