SoFunction
Updated on 2025-05-14

Code parsing of PPTs and LLM generated content using Python to automate the code of generating PPTs and LLM

Core code analysis

1. Extract PPT styles to JSON

extract_ppt_with_styleFunctions are used to extract the styles of PPT (fonts, colors, paragraph formats, etc.) into JSON files for easier subsequent reuse.

Key steps:

  • Traverse every page of PPT: Extract the style of text boxes page by page.
  • Record style information: Includes font name, size, bold, italic, color (supports theme colors and RGB colors).
  • JSON structure example
{
  "slide_number": 1,
  "shapes": [
    {
      "shape_name": "Text",
      "paragraphs": [
        {
          "alignment": "LEFT",
          "runs": [
            {
              "text": "Self-introduction",
              "font": {
                "name": "Arial",
                "size": 24,
                "bold": true,
                "italic": false,
                "color": {
                  "type": "rgb",
                  "rgb": [255, 0, 0]
                }
              }
            }
          ]
        }
      ]
    }
  ]
}

Code snippet:

def extract_ppt_with_style(ppt_path, output_json):
    prs = Presentation(ppt_path)
    data = []
    for slide_idx, slide in enumerate():
        slide_data = {
            "slide_number": slide_idx + 1,
            "shapes": []
        }
        for shape in :
            if not shape.has_text_frame:
                continue
            text_frame = shape.text_frame
            text_info = {
                "shape_name": "Text",  # Force set to "Text" type                "paragraphs": []
            }
            for paragraph in text_frame.paragraphs:
                para_info = {
                    "alignment": str(),
                    "runs": []
                }
                for run in :
                    run_info = {
                        "text": ,
                        "font": {
                            "name": ,
                            "size": str() if  else None,
                            "bold": ,
                            "italic": ,
                            "color": {
                                "type": "theme" if  == MSO_THEME_COLOR else "rgb",
                                "theme_color": .theme_color,
                                "rgb": ([0], [1], [2]) if  else None
                            }
                        }
                    }
                    para_info["runs"].append(run_info)
                text_info["paragraphs"].append(para_info)
            slide_data["shapes"].append(text_info)
        (slide_data)
    with open(output_json, 'w', encoding='utf-8') as f:
        (data, f, indent=2, ensure_ascii=False)

2. Apply JSON style to new PPT

apply_styles_to_pptFunctions apply content and format to template PPT based on style information in JSON files.

Key steps:

  • Read JSON data: Analyze font, color and other style information.
  • Dynamic style setting: Supports RGB colors, theme colors, and is compatible with hexadecimal colors (such as#FF0000)。
  • Generate the final PPT: Save the modified style as a new file.

Code snippet:

def apply_styles_to_ppt(template_path, json_path, output_pptx):
    with open(json_path, 'r', encoding='utf-8') as f:
        data = (f)
    prs = Presentation(template_path)
    for slide_idx, slide in enumerate():
        for shape_idx, shape in enumerate():
            if not shape.has_text_frame:
                continue
            text_frame = shape.text_frame
            for paragraph_idx, paragraph in enumerate(text_frame.paragraphs):
                for run_idx, run in enumerate():
                    run_info = data[slide_idx]["shapes"][shape_idx]["paragraphs"][paragraph_idx]["runs"][run_idx]
                     = run_info["text"]  # Replace text content                     = run_info["font"]["name"]
                     = run_info["font"]["size"]
                     = run_info["font"]["bold"]
                     = run_info["font"]["italic"]
                    color_data = run_info["font"]["color"]
                    if color_data["type"] == "rgb":
                        r, g, b = color_data["rgb"]  # Directly parse RGB arrays                         = RGBColor(r, g, b)
                    elif color_data["type"] == "hex":
                        hex_color = color_data["hex"].lstrip("#")
                        r = int(hex_color[0:2], 16)
                        g = int(hex_color[2:4], 16)
                        b = int(hex_color[4:6], 16)
                         = RGBColor(r, g, b)
                    elif color_data["type"] == "theme":
                        theme_color = getattr(MSO_THEME_COLOR, color_data["theme_color"], MSO_THEME_COLOR.ACCENT_1)
                        .theme_color = theme_color
                    else:
                         = RGBColor(0, 0, 0)  # Default black    (output_pptx)

Generate content in combination with LLM

Scenario: Generate a self-introduction PPT

Assuming that you need to automatically generate a self-introduction PPT with style based on the name, position and other information entered by the user, you can follow the following steps:

1. Use LLM to generate text content

By calling LLM (such as GPT-3.5, Tongyi Qianwen, etc.), generate self-introduction text content:

import openai

def generate_self_introduction(name, role):
    prompt = f"Generate a About {name}({role})Self-introduction,The requirements are concise and clear,Suitable PPT exhibit。"
    response = (
        engine="text-davinci-003",
        prompt=prompt,
        max_tokens=150
    )
    return [0].()

2. Inject LLM content into JSON

Populate the generated text content into JSONtextIn the field:

# Assume that the extracted JSON structure is as follows:json_data = {
    "slide_number": 1,
    "shapes": [
        {
            "shape_name": "Text",
            "paragraphs": [
                {
                    "runs": [
                        {"text": "【placeholder to be replaced】", ...}
                    ]
                }
            ]
        }
    ]
}

# Replace text contentgenerated_text = generate_self_introduction("Zhang San", "Data Analyst")
json_data["shapes"][0]["paragraphs"][0]["runs"][0]["text"] = generated_text

3. Generate the final PPT

Callapply_styles_to_pptApply styles and content to templates:

apply_styles_to_ppt("", "", "")

Things to note

  1. JSON format requirements

    • The color value must be in array format (such asrgb: [255, 0, 0]) or hexadecimal string (such as"hex": "#FF0000")。
    • Theme color needs to be usedMSO_THEME_COLOREnumeration name (such as"ACCENT_1")。
  2. Standardization of shape names

    • When extracting a style, force theshape_nameSet to"Text", ensure consistency in subsequent processing.
  3. compatibility

    • Ensure that the shape structure of the template PPT matches the JSON data (such as position, hierarchy).

Complete example

if __name__ == '__main__':
    # 1. Extract template styles to JSON    extract_ppt_with_style("", "output_styles.json")
    
    # 2. Generate self-introduction text and modify JSON    with open("output_styles.json", "r") as f:
        data = (f)
    # Assume that the first paragraph of text is modified    data[0]["shapes"][0]["paragraphs"][0]["runs"][0]["text"] = "I am Zhang San, a data analyst..."
    
    # 3. Generate the final PPT    apply_styles_to_ppt("", "output_styles.json", "new_ppt.pptx")

Through the above methods, you can automatically generate personalized PPTs, combined with the content generation capabilities of LLM, and realize the full process automation from design to content!

from pptx import Presentation
from  import MSO_THEME_COLOR
from  import RGBColor
import json


def extract_ppt_with_style(ppt_path, output_json):
    prs = Presentation(ppt_path)
    data = []

    for slide_idx, slide in enumerate():
        slide_data = {
            "slide_number": slide_idx + 1,
            "shapes": []
        }
        for shape in :
            if not shape.has_text_frame:
                continue  # Skip non-text shapes
            text_frame = shape.text_frame
            text_info = {
                "shape_name": ,
                "paragraphs": []
            }

            for paragraph in text_frame.paragraphs:
                para_info = {
                    "alignment": str(),
                    "runs": []
                }
                for run in :
                    run_info = {
                        "text": ,
                        "font": {
                            "name": ,
                            "size": str() if  else None,
                            "bold": ,
                            "italic": ,
                            "color": {
                                "type": "theme" if  == MSO_THEME_COLOR else "rgb",
                                "theme_color": .theme_color,
                                "rgb": ([0], [1],
                                        [2]) if  else None
                            }
                        },
                        # "highlight_color": str(run.highlight_color) # Modify: Get from run instead of                    }
                    para_info["runs"].append(run_info)
                text_info["paragraphs"].append(para_info)
            slide_data["shapes"].append(text_info)
        (slide_data)

    with open(output_json, 'w', encoding='utf-8') as f:
        (data, f, indent=2, ensure_ascii=False)


def apply_styles_to_ppt(template_path, json_path, output_pptx):
    with open(json_path, 'r', encoding='utf-8') as f:
        data = (f)

    prs = Presentation(template_path)

    for slide_idx, slide in enumerate():

        for shape_idx, shape in enumerate():
            if not shape.has_text_frame:
                continue  # Skip non-text shapes
            text_frame = shape.text_frame

            for paragraph_idx, paragraph in enumerate(text_frame.paragraphs):

                for run_idx, run in enumerate():
                    run_info = data[slide_idx]["shapes"][shape_idx]["paragraphs"][paragraph_idx]["runs"][run_idx]

                     = run_info["text"]
                     = run_info["font"]["name"]
                     = run_info["font"]["size"]
                     = run_info["font"]["bold"]
                     = run_info["font"]["size"]
                     = run_info["font"]["italic"]

                    # Assume run_data is a dictionary read from JSON                    color_data = run_info["font"]["color"]

                    if color_data["type"] == "rgb":
                        # parse RGB values                        r_str, g_str, b_str = color_data["rgb"]
                        r = r_str
                        g = g_str
                        b = b_str
                         = RGBColor(r, g, b)
                    elif color_data["type"] == "hex":
                        # parse hexadecimal color                        hex_color = color_data["hex"].lstrip("#")
                        r = int(hex_color[0:2], 16)
                        g = int(hex_color[2:4], 16)
                        b = int(hex_color[4:6], 16)
                         = RGBColor(r, g, b)
                    elif color_data["type"] == "theme":
                        # Use theme colors (such as MSO_THEME_COLOR.ACCENT_1)                        theme_color_name = color_data["theme_color"]
                        theme_color = getattr(MSO_THEME_COLOR, theme_color_name, MSO_THEME_COLOR.ACCENT_1)
                        .theme_color = theme_color
                    else:
                        # Default color (black)                         = RGBColor(0, 0, 0)

    (output_pptx)


if __name__ == '__main__':
    #User Example    extract_ppt_with_style("", "output_styles.json")
    # This is a json structure parsed by a ppt template. The name is shape and the type remains unchanged. Please change the name is text of type Text, and the value of text will be introduced by yourself # Note: Only output json    #User Example    apply_styles_to_ppt("", "output_styles.json", "new_ppt.pptx")

The above is the detailed content of the code analysis of using Python to automatically generate PPT and combining LLM to generate content. For more information about Python to automatically generate PPT, please pay attention to my other related articles!