Core code analysis
1. Extract PPT styles to JSON
extract_ppt_with_style
Functions are used to extract the styles of PPT (fonts, colors, paragraph formats, etc.) into JSON files for easier subsequent reuse.
Key steps:
- Traverse every page of PPT: Extract the style of text boxes page by page.
- Record style information: Includes font name, size, bold, italic, color (supports theme colors and RGB colors).
- JSON structure example:
{ "slide_number": 1, "shapes": [ { "shape_name": "Text", "paragraphs": [ { "alignment": "LEFT", "runs": [ { "text": "Self-introduction", "font": { "name": "Arial", "size": 24, "bold": true, "italic": false, "color": { "type": "rgb", "rgb": [255, 0, 0] } } } ] } ] } ] }
Code snippet:
def extract_ppt_with_style(ppt_path, output_json): prs = Presentation(ppt_path) data = [] for slide_idx, slide in enumerate(): slide_data = { "slide_number": slide_idx + 1, "shapes": [] } for shape in : if not shape.has_text_frame: continue text_frame = shape.text_frame text_info = { "shape_name": "Text", # Force set to "Text" type "paragraphs": [] } for paragraph in text_frame.paragraphs: para_info = { "alignment": str(), "runs": [] } for run in : run_info = { "text": , "font": { "name": , "size": str() if else None, "bold": , "italic": , "color": { "type": "theme" if == MSO_THEME_COLOR else "rgb", "theme_color": .theme_color, "rgb": ([0], [1], [2]) if else None } } } para_info["runs"].append(run_info) text_info["paragraphs"].append(para_info) slide_data["shapes"].append(text_info) (slide_data) with open(output_json, 'w', encoding='utf-8') as f: (data, f, indent=2, ensure_ascii=False)
2. Apply JSON style to new PPT
apply_styles_to_ppt
Functions apply content and format to template PPT based on style information in JSON files.
Key steps:
- Read JSON data: Analyze font, color and other style information.
-
Dynamic style setting: Supports RGB colors, theme colors, and is compatible with hexadecimal colors (such as
#FF0000
)。 - Generate the final PPT: Save the modified style as a new file.
Code snippet:
def apply_styles_to_ppt(template_path, json_path, output_pptx): with open(json_path, 'r', encoding='utf-8') as f: data = (f) prs = Presentation(template_path) for slide_idx, slide in enumerate(): for shape_idx, shape in enumerate(): if not shape.has_text_frame: continue text_frame = shape.text_frame for paragraph_idx, paragraph in enumerate(text_frame.paragraphs): for run_idx, run in enumerate(): run_info = data[slide_idx]["shapes"][shape_idx]["paragraphs"][paragraph_idx]["runs"][run_idx] = run_info["text"] # Replace text content = run_info["font"]["name"] = run_info["font"]["size"] = run_info["font"]["bold"] = run_info["font"]["italic"] color_data = run_info["font"]["color"] if color_data["type"] == "rgb": r, g, b = color_data["rgb"] # Directly parse RGB arrays = RGBColor(r, g, b) elif color_data["type"] == "hex": hex_color = color_data["hex"].lstrip("#") r = int(hex_color[0:2], 16) g = int(hex_color[2:4], 16) b = int(hex_color[4:6], 16) = RGBColor(r, g, b) elif color_data["type"] == "theme": theme_color = getattr(MSO_THEME_COLOR, color_data["theme_color"], MSO_THEME_COLOR.ACCENT_1) .theme_color = theme_color else: = RGBColor(0, 0, 0) # Default black (output_pptx)
Generate content in combination with LLM
Scenario: Generate a self-introduction PPT
Assuming that you need to automatically generate a self-introduction PPT with style based on the name, position and other information entered by the user, you can follow the following steps:
1. Use LLM to generate text content
By calling LLM (such as GPT-3.5, Tongyi Qianwen, etc.), generate self-introduction text content:
import openai def generate_self_introduction(name, role): prompt = f"Generate a About {name}({role})Self-introduction,The requirements are concise and clear,Suitable PPT exhibit。" response = ( engine="text-davinci-003", prompt=prompt, max_tokens=150 ) return [0].()
2. Inject LLM content into JSON
Populate the generated text content into JSONtext
In the field:
# Assume that the extracted JSON structure is as follows:json_data = { "slide_number": 1, "shapes": [ { "shape_name": "Text", "paragraphs": [ { "runs": [ {"text": "【placeholder to be replaced】", ...} ] } ] } ] } # Replace text contentgenerated_text = generate_self_introduction("Zhang San", "Data Analyst") json_data["shapes"][0]["paragraphs"][0]["runs"][0]["text"] = generated_text
3. Generate the final PPT
Callapply_styles_to_ppt
Apply styles and content to templates:
apply_styles_to_ppt("", "", "")
Things to note
-
JSON format requirements:
- The color value must be in array format (such as
rgb: [255, 0, 0]
) or hexadecimal string (such as"hex": "#FF0000"
)。 - Theme color needs to be used
MSO_THEME_COLOR
Enumeration name (such as"ACCENT_1"
)。
- The color value must be in array format (such as
-
Standardization of shape names:
- When extracting a style, force the
shape_name
Set to"Text"
, ensure consistency in subsequent processing.
- When extracting a style, force the
-
compatibility:
- Ensure that the shape structure of the template PPT matches the JSON data (such as position, hierarchy).
Complete example
if __name__ == '__main__': # 1. Extract template styles to JSON extract_ppt_with_style("", "output_styles.json") # 2. Generate self-introduction text and modify JSON with open("output_styles.json", "r") as f: data = (f) # Assume that the first paragraph of text is modified data[0]["shapes"][0]["paragraphs"][0]["runs"][0]["text"] = "I am Zhang San, a data analyst..." # 3. Generate the final PPT apply_styles_to_ppt("", "output_styles.json", "new_ppt.pptx")
Through the above methods, you can automatically generate personalized PPTs, combined with the content generation capabilities of LLM, and realize the full process automation from design to content!
from pptx import Presentation from import MSO_THEME_COLOR from import RGBColor import json def extract_ppt_with_style(ppt_path, output_json): prs = Presentation(ppt_path) data = [] for slide_idx, slide in enumerate(): slide_data = { "slide_number": slide_idx + 1, "shapes": [] } for shape in : if not shape.has_text_frame: continue # Skip non-text shapes text_frame = shape.text_frame text_info = { "shape_name": , "paragraphs": [] } for paragraph in text_frame.paragraphs: para_info = { "alignment": str(), "runs": [] } for run in : run_info = { "text": , "font": { "name": , "size": str() if else None, "bold": , "italic": , "color": { "type": "theme" if == MSO_THEME_COLOR else "rgb", "theme_color": .theme_color, "rgb": ([0], [1], [2]) if else None } }, # "highlight_color": str(run.highlight_color) # Modify: Get from run instead of } para_info["runs"].append(run_info) text_info["paragraphs"].append(para_info) slide_data["shapes"].append(text_info) (slide_data) with open(output_json, 'w', encoding='utf-8') as f: (data, f, indent=2, ensure_ascii=False) def apply_styles_to_ppt(template_path, json_path, output_pptx): with open(json_path, 'r', encoding='utf-8') as f: data = (f) prs = Presentation(template_path) for slide_idx, slide in enumerate(): for shape_idx, shape in enumerate(): if not shape.has_text_frame: continue # Skip non-text shapes text_frame = shape.text_frame for paragraph_idx, paragraph in enumerate(text_frame.paragraphs): for run_idx, run in enumerate(): run_info = data[slide_idx]["shapes"][shape_idx]["paragraphs"][paragraph_idx]["runs"][run_idx] = run_info["text"] = run_info["font"]["name"] = run_info["font"]["size"] = run_info["font"]["bold"] = run_info["font"]["size"] = run_info["font"]["italic"] # Assume run_data is a dictionary read from JSON color_data = run_info["font"]["color"] if color_data["type"] == "rgb": # parse RGB values r_str, g_str, b_str = color_data["rgb"] r = r_str g = g_str b = b_str = RGBColor(r, g, b) elif color_data["type"] == "hex": # parse hexadecimal color hex_color = color_data["hex"].lstrip("#") r = int(hex_color[0:2], 16) g = int(hex_color[2:4], 16) b = int(hex_color[4:6], 16) = RGBColor(r, g, b) elif color_data["type"] == "theme": # Use theme colors (such as MSO_THEME_COLOR.ACCENT_1) theme_color_name = color_data["theme_color"] theme_color = getattr(MSO_THEME_COLOR, theme_color_name, MSO_THEME_COLOR.ACCENT_1) .theme_color = theme_color else: # Default color (black) = RGBColor(0, 0, 0) (output_pptx) if __name__ == '__main__': #User Example extract_ppt_with_style("", "output_styles.json") # This is a json structure parsed by a ppt template. The name is shape and the type remains unchanged. Please change the name is text of type Text, and the value of text will be introduced by yourself # Note: Only output json #User Example apply_styles_to_ppt("", "output_styles.json", "new_ppt.pptx")
The above is the detailed content of the code analysis of using Python to automatically generate PPT and combining LLM to generate content. For more information about Python to automatically generate PPT, please pay attention to my other related articles!