SoFunction
Updated on 2024-11-13

Example code for processing PDFs and generating multi-layer PDFs in Python

Python has many libraries for working with PDF files. This article, written for a Python 3 environment, tries two of them for generating PDFs. PyPDF2 is better at reading PDFs, but I did not find a way to generate a multi-layer PDF with it. ReportLab looks more mature, and its Canvas makes generating a multi-layer PDF very convenient, so the content of a scanned image can also be found by text search.

Reportlab

Generate double-layer PDF

A double-layer PDF relies on the Canvas concept in PDF: draw the text first, then draw the image last, and the result is a two-layer PDF.

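import os
import time
from reportlab import platypus
from reportlab.lib.pagesizes import letter
from reportlab.lib.units import inch
from reportlab.platypus import SimpleDocTemplate, Image
from reportlab.pdfgen import canvas

# path to the scanned image
image_file = "./"

# Use Canvas to generate pdf
c = canvas.Canvas('reportlab_canvas.pdf', pagesize=letter)
width, height = letter

# choose the fill color for the text
c.setFillColorRGB(0, 0.77, 0.77)
# first layer: draw the text
c.drawString(3*inch, 3*inch, "Hello World")
# second layer: draw the image on top of the text
c.drawImage(image_file, 0, 0)
# finish the page and write the file
c.showPage()
c.save()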
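For the text layer to be useful for searching a scanned page, each word has to be drawn at the position where it appears in the image. The sketch below shows one way to convert a pixel box into PDF coordinates. The `WordBox` type, the `pixel_box_to_pdf` helper, and the idea that the boxes come from an OCR step are assumptions made for illustration; none of them appear in the original article. It relies only on two facts: PDF user space measures in points (1/72 inch) from the bottom-left corner, while image pixels are counted from the top-left corner.

```python
from typing import NamedTuple, Tuple

POINTS_PER_INCH = 72  # PDF user space: 1 point = 1/72 inch


class WordBox(NamedTuple):
    """Hypothetical OCR result: a word and its pixel box (origin at top-left)."""
    text: str
    left: int
    top: int
    width: int
    height: int


def pixel_box_to_pdf(box: WordBox, image_height_px: int, dpi: int) -> Tuple[float, float, float]:
    """Return (x, y, font_size) in points for the bottom-left corner of the box."""
    # horizontal position only needs scaling from pixels to points
    x = box.left * POINTS_PER_INCH / dpi
    # flip the y axis: PDF measures upward from the bottom of the page
    bottom_px = image_height_px - (box.top + box.height)
    y = bottom_px * POINTS_PER_INCH / dpi
    # assumption: use the box height as a rough font size
    font_size = box.height * POINTS_PER_INCH / dpi
    return x, y, font_size


# a letter-size page scanned at 300 dpi is 2550 x 3300 pixels
word = WordBox("Hello", left=300, top=600, width=150, height=30)
x, y, font_size = pixel_box_to_pdf(word, image_height_px=3300, dpi=300)
```

With a ReportLab canvas `c`, the returned values could be passed to `c.setFont("Helvetica", font_size)` and `c.drawString(x, y, word.text)` before the image is drawn on top.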

PyPDF2

Read PDF

from PyPDF2 import PdfFileWriter, PdfFileReader

output = PdfFileWriter()
input1 = PdfFileReader(open("", "rb"))

# print document info
print(input1.getDocumentInfo())

# print how many pages input1 has:
print("pdf_document.pdf has %d pages." % input1.getNumPages())

# print page content
page_content = input1.getPage(0).extractText()
print(page_content)

# add page 1 from input1 to output document, unchanged
output.addPage(input1.getPage(0))

# add page 2 from input1, but rotated clockwise 90 degrees
output.addPage(input1.getPage(1).rotateClockwise(90))

# finally, write "output" to the output file
outputStream = open("", "wb")
output.write(outputStream)
outputStream.close()

However, PyPDF2 has many problems when extracting the text content of a PDF; you can see this question. The limitation is also described in the documentation of extractText:

extractText(self)

##
# Locate all text drawing commands, in the order they are provided in the
# content stream, and extract the text. This works well for some PDF
# files, but poorly for others, depending on the generator used. This will
# be refined in the future. Do not rely on the order of text coming out of
# this function, as it will change if this function is made more
# sophisticated.
#
# Stability: Added in v1.7, will exist for all future releases. May
# be overhauled to provide more ordered text in the future.
# @return a unicode string object
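Since extraction works well for some files and poorly for others, it can help to check the result before relying on it. The following is a minimal sketch of such a check, not part of PyPDF2: the function names `printable_ratio` and `looks_usable` and the default thresholds are assumptions chosen for illustration and would need tuning for real documents.

```python
import string


def printable_ratio(text: str) -> float:
    """Share of characters that are letters, digits, whitespace or ASCII punctuation."""
    if not text:
        return 0.0
    ok = sum(
        1
        for ch in text
        if ch.isalnum() or ch.isspace() or ch in string.punctuation
    )
    return ok / len(text)


def looks_usable(text: str, min_ratio: float = 0.9, min_length: int = 20) -> bool:
    """Heuristic: enough text, and most of it made of ordinary characters.

    Both thresholds are arbitrary assumptions, not values from any library.
    """
    return len(text.strip()) >= min_length and printable_ratio(text) >= min_ratio
```

The string returned by `extractText()` could be passed to `looks_usable` to decide whether to fall back to another approach, such as running OCR on the page image.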
