I. Background
In everyday office work, we sometimes need to convert multiple Word documents, Excel workbooks or PPT presentations to PDF files. PDF preserves the layout and formatting of a document and is easy to view and share across platforms.
This blog post shows how to use Python to batch convert multiple Word, Excel and PPT files to PDF. The script does not parse the files itself: it drives the installed Microsoft Office applications from Python, and each application opens a document and saves or exports it as PDF.
For the implementation, we first install the required Python libraries and related software (Microsoft Office must be installed on the machine). The script mainly uses three libraries: os, win32com.client and gc.
1) os: os is a module built into Python for interacting with the operating system. It provides many functions for working with files and directories, such as creating, deleting and renaming files or directories, getting file attributes, and traversing directories. With the os module, we can perform a variety of OS-related tasks in Python programs.
2) win32com.client: win32com.client is a Python library for interacting with COM components on the Windows platform; it is provided by the pywin32 package (pip install pywin32). COM (Component Object Model) is an object-oriented component technology that allows different applications to communicate and interact. The library provides a convenient way to call and manipulate COM components such as the Microsoft Office applications (Word, Excel, PowerPoint, etc.). With it, we can automate Office tasks such as reading and writing documents, manipulating Excel tables, and creating PPT presentations.
3) gc: gc is Python's built-in garbage collection module. Garbage collection means automatically detecting and reclaiming memory that is no longer in use during program execution, which improves memory utilization and program performance. The gc module lets us trigger garbage collection manually, get and set the collection thresholds, and so on. Although Python collects garbage automatically, in some cases we may want to control this behavior ourselves.
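To make the gc description concrete, here is a minimal sketch of the two features mentioned above: reading the thresholds and triggering a collection by hand. The Node class and both helper functions are hypothetical names used only for illustration; they are not part of the conversion script.

```python
import gc


class Node:
    """Hypothetical object used only to build a reference cycle."""

    def __init__(self):
        self.partner = None


def make_cycle():
    """Create two objects that refer to each other, then drop both names."""
    a, b = Node(), Node()
    a.partner, b.partner = b, a  # reference counting alone cannot free these


def collect_after_cycle():
    """Build a cycle and return how many unreachable objects gc found."""
    gc.collect()                  # clear garbage left over from earlier code
    was_enabled = gc.isenabled()
    gc.disable()                  # pause automatic collection for the demo
    try:
        make_cycle()
        return gc.collect()       # full collection, returns the number found
    finally:
        if was_enabled:
            gc.enable()


print("thresholds:", gc.get_threshold())
print("unreachable objects found:", collect_after_cycle())
```

In the conversion script the same idea applies to COM objects: the references are set to None and gc.collect() is called so that the Office processes are released promptly.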
We will then write Python code to iterate through all the documents in the specified folder and convert each document one by one.
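The folder scan can be sketched as follows. This is an illustrative variant, not the script's exact logic: the function name classify_office_files and the EXTENSIONS table are assumptions, and it additionally skips the "~$" lock files that Office creates while a document is open, as well as subfolders. The demo builds a throwaway folder with empty files so it runs without Office.

```python
import os
import tempfile

# Extensions handled by each Office application (lowercase, with the dot).
EXTENSIONS = {
    "word": (".doc", ".docx"),
    "excel": (".xls", ".xlsx"),
    "ppt": (".ppt", ".pptx"),
}


def classify_office_files(folder):
    """Group the file names in `folder` by the application that opens them."""
    groups = {kind: [] for kind in EXTENSIONS}
    for name in sorted(os.listdir(folder)):
        if name.startswith("~$"):
            continue  # temporary lock file of a document that is open
        if not os.path.isfile(os.path.join(folder, name)):
            continue  # subfolders are not processed
        lowered = name.lower()  # match .DOCX as well as .docx
        for kind, suffixes in EXTENSIONS.items():
            if lowered.endswith(suffixes):
                groups[kind].append(name)
                break
    return groups


# Demo on a temporary folder containing empty placeholder files.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("report.docx", "Budget.XLSX", "slides.ppt",
                 "~$report.docx", "notes.txt"):
        open(os.path.join(demo_dir, name), "w").close()
    os.mkdir(os.path.join(demo_dir, "archive.doc"))  # a folder, not a file
    demo_groups = classify_office_files(demo_dir)

print(demo_groups)
```

Grouping first means each Office application only has to be started once, which is much faster than launching it per file.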
Finally, we will save the converted PDF files to the specified directory.
By reading this blog post, you will learn how to batch convert multiple Word, Excel and PPT files to PDF using Python. This gives you an automated way to handle document conversion tasks, saving time and effort.
Whether you are an office worker, a student, or an individual user with a large number of documents to process, this tutorial will help you batch convert Word, Excel and PPT files to PDF with Python. Let's get started!
II. Code Practice
To run this code, you only need to change the path to your own. For example, replace the path used in the code, D:\Pycharmproject2023\code_test_project\shan_test\data, with a folder on your machine.
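One thing to watch when pasting a Windows path into Python source: in a normal string literal, a backslash followed by certain letters is an escape sequence. The sketch below shows the pitfall with a made-up folder name (D:\new_docs\table is hypothetical) and then writes the tutorial's path as a raw string. PureWindowsPath is used only so the example behaves the same on any operating system.

```python
from pathlib import PureWindowsPath

# In a normal literal, \n and \t are turned into a newline and a tab.
plain = "D:\new_docs\table"
# In a raw string (r"..."), backslashes are kept literally.
raw = r"D:\new_docs\table"

print("\n" in plain)  # True: the path was silently corrupted
print("\n" in raw)    # False

# The tutorial's path written safely as a raw string.
data_dir = PureWindowsPath(r"D:\Pycharmproject2023\code_test_project\shan_test\data")
pdf_dir = data_dir / "pdf"  # output folder used by the script
print(pdf_dir)
```

Using a raw string (or forward slashes) avoids having to check every folder name for accidental escape sequences.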
import os
import gc
import win32com.client  # provided by the pywin32 package (Windows only)


# Word
def word2Pdf(filePath, words):
    # If there are no files, print a notice and return
    if len(words) < 1:
        print("\n[No Word files]\n")
        return
    # Start the conversion
    print("\n[Start Word -> PDF conversion]")
    word = None
    try:
        print("Starting the Word process...")
        word = win32com.client.Dispatch("Word.Application")
        word.Visible = 0
        word.DisplayAlerts = False
        for i in range(len(words)):
            print(i)
            fileName = words[i]  # File name
            fromFile = os.path.join(filePath, fileName)  # Source file path
            toFileName = changeSufix2Pdf(fileName)  # Name of the generated file
            toFile = toFileJoin(filePath, toFileName)  # Path of the generated file
            print("Converting: " + fileName + " ...")
            # An error in one file does not affect the other files
            doc = None
            try:
                doc = word.Documents.Open(fromFile)
                doc.SaveAs(toFile, 17)  # 17 = wdFormatPDF; all PDFs go into the pdf folder
                print("Converted to: " + toFileName)
            except Exception as e:
                print(e)
            finally:
                # Close each document so it does not stay open in Word
                if doc is not None:
                    doc.Close()
                    doc = None
        print("All Word documents have been converted.")
    except Exception as e:
        print(e)
    finally:
        # Close the Word process even if something went wrong
        print("Closing the Word process...\n")
        if word is not None:
            word.Quit()
            word = None
        gc.collect()


# Excel
def excel2Pdf(filePath, excels):
    # If there are no files, print a notice and return
    if len(excels) < 1:
        print("\n[No Excel files]\n")
        return
    # Start the conversion
    print("\n[Start Excel -> PDF conversion]")
    excel = None
    try:
        print("Starting the Excel process...")
        excel = win32com.client.Dispatch("Excel.Application")
        excel.Visible = 0
        excel.DisplayAlerts = False
        for i in range(len(excels)):
            print(i)
            fileName = excels[i]  # File name
            fromFile = os.path.join(filePath, fileName)  # Source file path
            print("Converting: " + fileName + " ...")
            # An error in one file does not affect the other files
            wb = None
            ws = None
            try:
                wb = excel.Workbooks.Open(fromFile)
                # A workbook may contain several worksheets
                for j in range(wb.Worksheets.Count):
                    toFileName = addWorksheetsOrder(fileName, j + 1)  # Name of the generated file
                    toFile = toFileJoin(filePath, toFileName)  # Path of the generated file
                    ws = wb.Worksheets(j + 1)  # COM collections are 1-based; index 0 is out of range
                    ws.ExportAsFixedFormat(0, toFile)  # 0 = xlTypePDF; each sheet is exported separately
                    print("Converted to: " + toFileName)
            except Exception as e:
                print(e)
            finally:
                ws = None
                if wb is not None:
                    wb.Close()
                    wb = None
        print("All Excel files have been converted.")
    except Exception as e:
        print(e)
    finally:
        # Close the Excel process even if something went wrong
        print("Closing the Excel process...\n")
        if excel is not None:
            excel.Quit()
            excel = None
        gc.collect()


# PPT
def ppt2Pdf(filePath, ppts):
    # If there are no files, print a notice and return
    if len(ppts) < 1:
        print("\n[No PPT files]\n")
        return
    # Start the conversion
    print("\n[Start PPT -> PDF conversion]")
    powerpoint = None
    try:
        print("Starting the PowerPoint process...")
        powerpoint = win32com.client.Dispatch("PowerPoint.Application")
        for i in range(len(ppts)):
            print(i)
            fileName = ppts[i]  # File name
            fromFile = os.path.join(filePath, fileName)  # Source file path
            toFileName = changeSufix2Pdf(fileName)  # Name of the generated file
            toFile = toFileJoin(filePath, toFileName)  # Path of the generated file
            print("Converting: " + fileName + " ...")
            # An error in one file does not affect the other files
            ppt = None
            try:
                ppt = powerpoint.Presentations.Open(fromFile, WithWindow=False)
                if ppt.Slides.Count > 0:
                    # 32 = ppSaveAsPDF; saving an empty presentation pops up a dialog (no fix found yet)
                    ppt.SaveAs(toFile, 32)
                    print("Converted to: " + toFileName)
                else:
                    print("(Error: this file is empty, skipping it)")
            except Exception as e:
                print(e)
            finally:
                if ppt is not None:
                    ppt.Close()
                    ppt = None
        print("All PPT files have been converted.")
    except Exception as e:
        print(e)
    finally:
        # Close the PowerPoint process even if something went wrong
        print("Closing the PowerPoint process...\n")
        if powerpoint is not None:
            powerpoint.Quit()
            powerpoint = None
        gc.collect()


# Change the suffix to .pdf
def changeSufix2Pdf(file):
    return file[:file.rfind('.')] + ".pdf"


# Append the worksheet number to the file name
def addWorksheetsOrder(file, i):
    return file[:file.rfind('.')] + "_worksheet" + str(i) + ".pdf"


# Build the output path inside the pdf folder
def toFileJoin(filePath, file):
    return os.path.join(filePath, 'pdf', file[:file.rfind('.')] + ".pdf")


# Start the program
print("====================Program start====================")
print("[Program function] Generates a PDF for every ppt, excel and word file in the target path "
      "and stores it in a newly created pdf folder (Office must be installed; subfolders are not processed)")
print("Note: an empty PPT or Excel file raises an error and is skipped. If a PPT conversion takes too long, "
      "check whether an error window is waiting for confirmation; the PPT window problem is not fully solved yet "
      "(the empty-file error has been handled). Closing the Office processes may take about ten seconds, "
      "please be patient.")

# Path to the files to be converted
# filePath = input("Enter the target path (press Enter to use the current path: " + os.getcwd() + ")\n")
filePath = r"D:\Pycharmproject2023\code_test_project\shan_test\data"

# Use the current path if no path was entered
if filePath == "":
    filePath = os.getcwd()

# Sort all files in the target folder by type so that each Office process is opened only once
words = []
ppts = []
excels = []
for fn in os.listdir(filePath):
    if fn.endswith(('.doc', '.docx')):
        words.append(fn)
    if fn.endswith(('.ppt', '.pptx')):
        ppts.append(fn)
    if fn.endswith(('.xls', '.xlsx')):
        excels.append(fn)

# Run the conversions
print("====================Start conversion====================")

# Output path: all generated PDF files are placed in a new pdf folder
folder = os.path.join(filePath, 'pdf')
if not os.path.exists(folder):
    os.makedirs(folder)

word2Pdf(filePath, words)
excel2Pdf(filePath, excels)
ppt2Pdf(filePath, ppts)

print("====================End of conversion====================")
print("\n====================End of program====================")
# os.system("pause")
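The helper functions that build the output names slice the file name at its last dot. That is fine here because only names with an Office extension reach them, but a name without any dot would lose its last character. As a hedged alternative, the naming scheme can be sketched with splitext; the function names pdf_name, sheet_pdf_name and pdf_target and the folder D:\data are hypothetical, and ntpath (the Windows flavour of os.path) is used so the example prints Windows-style paths on any platform.

```python
import ntpath  # Windows implementation of os.path, importable everywhere


def pdf_name(file_name):
    """Replace the extension with .pdf (adds .pdf if there is no extension)."""
    stem, _ext = ntpath.splitext(file_name)
    return stem + ".pdf"


def sheet_pdf_name(file_name, sheet_number):
    """Name for one worksheet of a workbook, numbered from 1."""
    stem, _ext = ntpath.splitext(file_name)
    return f"{stem}_worksheet{sheet_number}.pdf"


def pdf_target(folder, file_name):
    """Full output path inside the 'pdf' subfolder of `folder`."""
    return ntpath.join(folder, "pdf", pdf_name(file_name))


print(pdf_name("report.v2.docx"))
print(sheet_pdf_name("budget.xlsx", 2))
print(pdf_target(r"D:\data", "slides.pptx"))
```

Because splitext only removes the final extension, a name such as report.v2.docx keeps the ".v2" part in the PDF name.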
III. Practical effects
That concludes this walkthrough of batch converting Word, Excel and PPT files to PDF with Python.