SoFunction
Updated on 2024-11-14

Batch converting Word, Excel, and PPT files to PDF with Python

I. Background

In daily office and document processing, sometimes we need to convert multiple Word documents, Excel tables or PPT presentations to PDF files. The benefit of converting documents to PDF format is that it preserves the layout and formatting of the document and allows for easy viewing and sharing across different platforms.

This blog post shows how to use Python to batch convert multiple Word, Excel, and PPT files to PDF. We will drive the installed Microsoft Office applications from Python through a third-party library and have each application save its documents in PDF format.

For the implementation, we first install the required Python library and related software: Microsoft Office must be installed, and the win32com.client module comes from the pywin32 package. The code mainly uses three modules: os, win32com.client, and gc.

1) os: os is a module built into Python for interacting with the operating system. It provides many functions for working with files and directories, such as creating, deleting, and renaming files or directories, getting file attributes, traversing directories, and more. By using the os module, we can perform a variety of OS-related tasks in Python programs.

2) win32com.client: win32com.client is a Python library for interacting with COM components on the Windows platform. COM (Component Object Model) is an object-oriented component technology that allows communication and interaction between different applications. The library provides a convenient way to call and manipulate COM components such as the Microsoft Office applications (Word, Excel, PowerPoint, etc.). With this library, we can automate Office tasks such as reading and writing documents, manipulating Excel tables, and creating PPT presentations.

3) gc: gc is Python's built-in garbage collection module. Garbage collection means automatically detecting and reclaiming memory that is no longer in use during program execution, which improves memory utilization and program performance. The gc module lets us trigger garbage collection manually, get and set the collection thresholds, and so on. Although Python collects garbage automatically, in some cases we may want to control this behavior ourselves.
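
To make item 3 more concrete, here is a minimal sketch of triggering garbage collection manually. The Node class and the collect_after_cycle function are hypothetical names introduced only for this illustration; they are not part of the conversion script.

```python
import gc


class Node:
    """Hypothetical object that refers to itself, forming a reference cycle."""

    def __init__(self):
        self.me = self


def collect_after_cycle():
    """Create one unreachable cycle and return how many objects gc.collect() found."""
    gc.collect()   # start from a clean state
    gc.disable()   # keep the automatic collector out of the way for the demo
    try:
        node = Node()
        del node   # the reference count never reaches zero because of the self-reference
        return gc.collect()
    finally:
        gc.enable()


found = collect_after_cycle()
print("unreachable objects found:", found)
```

The conversion script calls gc.collect() in the same way once each Office application has been closed, so that leftover references to COM objects are released promptly.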

We will then write Python code to iterate through all the documents in the specified folder and convert each document one by one.
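
The folder scan described above could be sketched as follows. This is only one possible approach: the EXTENSIONS table and the classify_documents function are assumptions made for this example, and the sketch additionally skips the temporary lock files that Office creates with a "~$" prefix while a document is open.

```python
import os
import tempfile

# Assumed grouping of extensions; the keys are illustrative names.
EXTENSIONS = {
    "word": (".doc", ".docx"),
    "excel": (".xls", ".xlsx"),
    "ppt": (".ppt", ".pptx"),
}


def classify_documents(folder):
    """Group the files directly inside folder by Office document type."""
    groups = {kind: [] for kind in EXTENSIONS}
    for name in sorted(os.listdir(folder)):
        if name.startswith("~$"):
            continue  # Office lock file, not a real document
        if not os.path.isfile(os.path.join(folder, name)):
            continue  # sub-folders are skipped
        lower = name.lower()  # compare extensions case-insensitively
        for kind, suffixes in EXTENSIONS.items():
            if lower.endswith(suffixes):
                groups[kind].append(name)
    return groups


# Small demonstration on a throwaway folder
with tempfile.TemporaryDirectory() as demo:
    for name in ("a.docx", "b.XLS", "c.pptx", "notes.txt", "~$a.docx"):
        open(os.path.join(demo, name), "w").close()
    demo_groups = classify_documents(demo)
    print(demo_groups)
```

Grouping the files first means each Office application only has to be started once, no matter how many documents of that type the folder contains.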

Finally, we will save the converted PDF files to the specified directory.
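
As a rough sketch of that last step, the target path of each PDF can be derived from the source file name. The pdf_target function and its subdir parameter are hypothetical; the script in this post uses a folder named pdf inside the source folder.

```python
import os
import tempfile


def pdf_target(folder, file_name, subdir="pdf"):
    """Return the PDF path for file_name, creating the output folder if needed."""
    out_dir = os.path.join(folder, subdir)
    os.makedirs(out_dir, exist_ok=True)    # no error if the folder already exists
    stem = os.path.splitext(file_name)[0]  # strips only the last extension
    return os.path.join(out_dir, stem + ".pdf")


# Small demonstration on a throwaway folder
with tempfile.TemporaryDirectory() as demo:
    target = pdf_target(demo, "report.v2.docx")
    created = os.path.isdir(os.path.join(demo, "pdf"))
    relative = os.path.relpath(target, demo)
    print(relative)
```

Using os.makedirs with exist_ok=True avoids a separate existence check, and os.path.splitext copes with file names that contain more than one dot.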

By reading this blog post, you will learn how to batch convert multiple Word, Excel, and PPT files to PDF using Python. This gives you an automated way to handle document conversion tasks, saving time and effort and increasing productivity.

Whether you are an office worker, a student, or an individual user with a large number of documents to process, this tutorial will help you batch convert Word, Excel, and PPT files to PDF with Python. Let's start this convenient and practical document processing journey together!

II. Code Practice

To run this code, you only need to change the path to your own. For example, replace the path used in the code, D:\Pycharmproject2023\code_test_project\shan_test\data, with the folder on your machine.

import gc
import os

import win32com.client  # provided by the pywin32 package
# Word
def word2Pdf(filePath, words):
    # If there are no files, report it and return
    if len(words) < 1:
        print("\n [No Word files]\n")
        return
    # Start the conversion
    print("\n [Start Word -> PDF conversion]")
    word = None
    try:
        print("Starting the Word process...")
        word = win32com.client.Dispatch("Word.Application")
        word.Visible = 0
        word.DisplayAlerts = False
        for i in range(len(words)):
            print(i)
            fileName = words[i]  # Source file name
            fromFile = os.path.join(filePath, fileName)  # Source file path
            toFileName = changeSufix2Pdf(fileName)  # Name of the generated file
            toFile = toFileJoin(filePath, toFileName)  # Path of the generated file
            print("Converting: " + fileName + " ...")
            # An error in one file does not stop the conversion of the others
            doc = None
            try:
                doc = word.Documents.Open(fromFile)
                doc.SaveAs(toFile, 17)  # 17 = wdFormatPDF; all PDFs go into the pdf folder
                print("Converted to: " + toFileName)
            except Exception as e:
                print(e)
            finally:
                # Close each document so that it does not stay open in Word
                if doc is not None:
                    doc.Close()
        print("All Word files have been converted.")
        print("Closing the Word process...\n")
    except Exception as e:
        print(e)
    finally:
        # Quit Word even if something went wrong, then release leftover COM references
        if word is not None:
            word.Quit()
        gc.collect()
# Excel
def excel2Pdf(filePath, excels):
    # If there are no files, report it and return
    if len(excels) < 1:
        print("\n [No Excel files]\n")
        return
    # Start the conversion
    print("\n [Start Excel -> PDF conversion]")
    excel = None
    try:
        print("Starting the Excel process...")
        excel = win32com.client.Dispatch("Excel.Application")
        excel.Visible = 0
        excel.DisplayAlerts = False
        for i in range(len(excels)):
            print(i)
            fileName = excels[i]  # Source file name
            fromFile = os.path.join(filePath, fileName)  # Source file path
            print("Converting: " + fileName + " ...")
            # An error in one file does not stop the conversion of the others
            wb = None
            try:
                wb = excel.Workbooks.Open(fromFile)
                for j in range(wb.Worksheets.Count):  # A workbook may contain several worksheets
                    toFileName = addWorksheetsOrder(fileName, j + 1)  # Name of the generated file
                    toFile = toFileJoin(filePath, toFileName)  # Path of the generated file
                    ws = wb.Worksheets(j + 1)  # COM collections are 1-based; index 0 is out of range
                    ws.ExportAsFixedFormat(0, toFile)  # 0 = xlTypePDF; each worksheet is exported separately
                    print("Converted to: " + toFileName)
            except Exception as e:
                print(e)
            finally:
                # Close each workbook without saving changes
                if wb is not None:
                    wb.Close(False)
        print("All Excel files have been converted.")
        print("Closing the Excel process...\n")
    except Exception as e:
        print(e)
    finally:
        # Quit Excel even if something went wrong, then release leftover COM references
        if excel is not None:
            excel.Quit()
        gc.collect()
# PPT
def ppt2Pdf(filePath, ppts):
    # If there are no files, report it and return
    if len(ppts) < 1:
        print("\n [No PPT files]\n")
        return
    # Start the conversion
    print("\n [Start PPT -> PDF conversion]")
    powerpoint = None
    try:
        print("Starting the PowerPoint process...")
        powerpoint = win32com.client.Dispatch("PowerPoint.Application")
        for i in range(len(ppts)):
            print(i)
            fileName = ppts[i]  # Source file name
            fromFile = os.path.join(filePath, fileName)  # Source file path
            toFileName = changeSufix2Pdf(fileName)  # Name of the generated file
            toFile = toFileJoin(filePath, toFileName)  # Path of the generated file
            print("Converting: " + fileName + " ...")
            # An error in one file does not stop the conversion of the others
            ppt = None
            try:
                ppt = powerpoint.Presentations.Open(fromFile, WithWindow=False)
                if ppt.Slides.Count > 0:
                    ppt.SaveAs(toFile, 32)  # 32 = ppSaveAsPDF; saving an empty presentation pops up a dialog box (no solution found yet)
                    print("Converted to: " + toFileName)
                else:
                    print("(Error: this file is empty, skipping it)")
            except Exception as e:
                print(e)
            finally:
                # Close each presentation so that it does not stay open in PowerPoint
                if ppt is not None:
                    ppt.Close()
        print("All PPT files have been converted.")
        print("Closing the PowerPoint process...\n")
    except Exception as e:
        print(e)
    finally:
        # Quit PowerPoint even if something went wrong, then release leftover COM references
        if powerpoint is not None:
            powerpoint.Quit()
        gc.collect()
# Replace the file extension with .pdf
def changeSufix2Pdf(file):
    return file[:file.rfind('.')] + ".pdf"
# Append the worksheet number to the file name
def addWorksheetsOrder(file, i):
    return file[:file.rfind('.')] + "_worksheet" + str(i) + ".pdf"
# Build the output path inside the pdf folder
def toFileJoin(filePath, file):
    return os.path.join(filePath, 'pdf', file[:file.rfind('.')] + ".pdf")
# Start the program
print("====================Program start====================")
print(
    "[Program function] Generates a PDF file for every PPT, Excel, and Word file in the target path and stores it in a newly created pdf folder (Office must be installed; sub-folders are not processed).")
print(
    "Note: An empty PPT or Excel file causes an error and is skipped. If a PPT conversion takes too long, check whether an error window is waiting for confirmation; the PPT window problem cannot be fully avoided yet (the empty-file error has been handled). Closing the Office processes may take around ten seconds, so please be patient.")
# Path to the files to be converted
# filePath = input("Enter the target path (to use the current path " + os.getcwd() + ", just press Enter):\n")
filePath = r"D:\Pycharmproject2023\code_test_project\shan_test\data"
# Use the current path if no path was entered
if filePath == "":
    filePath = os.getcwd()
# Sort the files in the target folder by type so that only one process is started per application
words = []
ppts = []
excels = []
for fn in os.listdir(filePath):
    if fn.endswith(('.doc', '.docx')):
        words.append(fn)
    if fn.endswith(('.ppt', '.pptx')):
        ppts.append(fn)
    if fn.endswith(('.xls', '.xlsx')):
        excels.append(fn)
# Run the conversions
print("====================Start conversion====================")
# Output location: a new pdf folder that holds all generated PDF files
folder = filePath + '\\pdf\\'
if not os.path.exists(folder):
    os.makedirs(folder)
word2Pdf(filePath, words)
excel2Pdf(filePath, excels)
ppt2Pdf(filePath, ppts)
print("====================End of conversion====================")
print("\n====================End of program====================")
# os.system("pause")
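
The numbers 17, 0, and 32 passed to the save and export calls above are Office format constants. A small sketch, assuming Windows with Word and the pywin32 package installed, shows how they could be given names and how a single document can be converted with guaranteed cleanup. The constant names and the word_to_pdf function are hypothetical helpers for illustration only, not part of the script above.

```python
# Values of the corresponding Office enumerations
WD_FORMAT_PDF = 17   # Word: WdSaveFormat.wdFormatPDF
XL_TYPE_PDF = 0      # Excel: XlFixedFormatType.xlTypePDF
PP_SAVE_AS_PDF = 32  # PowerPoint: PpSaveAsFileType.ppSaveAsPDF


def word_to_pdf(source, target):
    """Convert one Word document to PDF (requires Windows, Word, and pywin32)."""
    import win32com.client  # imported here so this module loads without pywin32

    word = win32com.client.Dispatch("Word.Application")
    word.Visible = False
    try:
        doc = word.Documents.Open(source)  # source should be an absolute path
        try:
            doc.SaveAs(target, WD_FORMAT_PDF)
        finally:
            doc.Close(0)  # 0 = wdDoNotSaveChanges
    finally:
        word.Quit()  # always shut Word down, even after an error
```

A call such as word_to_pdf(r"C:\docs\a.docx", r"C:\docs\pdf\a.pdf") would then convert one file; both paths here are placeholders. Starting Word for every file is slower than the batch approach used in this post, so this form is better suited to one-off conversions.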

III. Practical effects

That concludes the details of batch converting Word, Excel, and PPT files to PDF with Python.