Python has many libraries for working with PDFs. In this article, working in Python 3, I try two of them to generate PDFs. PyPDF handles reading PDFs well, but I did not find a way to generate a multi-layer PDF with it. Reportlab looks more mature, and its Canvas makes generating a multi-layer PDF very convenient. That meets the goal: the content of a scanned image can also be found by text search.
Reportlab
Generate double-layer PDF
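A double-layer PDF relies on the Canvas concept in PDF: draw the text first, then paint the image last, and the result is a two-layer PDF. The image is what the reader sees, and the text underneath is what search finds.

```python
from reportlab.lib.pagesizes import letter
from reportlab.lib.units import inch
from reportlab.pdfgen import canvas

image_file = "./"  # path to the scanned image (elided in the original)

# Use Canvas to generate pdf
c = canvas.Canvas('reportlab_canvas.pdf', pagesize=letter)
width, height = letter

# Bottom layer: the text, drawn first
c.setFillColorRGB(0, 0.77, 0.77)
c.drawString(3 * inch, 3 * inch, "Hello World")

# Top layer: the image, painted last so it covers the text.
# Stretching it to the full page keeps a large scan from overflowing the page.
c.drawImage(image_file, 0, 0, width=width, height=height)

c.showPage()
c.save()
```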
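Why does drawing order give layers? A PDF generator ultimately writes drawing operators into each page's content stream, and later operators paint over earlier ones. The following is a minimal standard-library sketch, not Reportlab's actual output, that builds a one-page PDF by hand. It draws the text first with `Tj`, then paints an opaque full-page rectangle over it as a stand-in for the scanned image; embedding a real image is left out to keep it short. The rectangle hides the text visually, but the string stays in the content stream, where search and extraction tools can still find it. The function name `build_two_layer_pdf` is hypothetical, and the sketch assumes ASCII text.

```python
def build_two_layer_pdf(text: str, x: int = 216, y: int = 216) -> bytes:
    """Return a one-page Letter-size PDF: a text layer covered by an opaque layer."""
    # Escape the characters that are special inside a PDF literal string.
    safe = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    content = (
        # Bottom layer: teal text at (x, y) points (216 pt = 3 inches), drawn first.
        f"0 0.77 0.77 rg\nBT /F1 12 Tf {x} {y} Td ({safe}) Tj ET\n"
        # Top layer: a light-gray full-page rectangle, painted last.
        # It stands in for the scanned image and hides the text.
        "0.9 0.9 0.9 rg\n0 0 612 792 re f"
    ).encode("latin-1")  # assumes the text is ASCII/Latin-1

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []  # byte offset of each "N 0 obj" line, needed for the xref table
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    xref_start = len(out)
    out += b"xref\n0 %d\n0000000000 65535 f \n" % (len(objects) + 1)
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset  # each xref entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n" % (
        len(objects) + 1,
        xref_start,
    )
    return bytes(out)


pdf_bytes = build_two_layer_pdf("Hello World")
```

Writing `pdf_bytes` to a file should give a page that shows only a gray sheet, while the viewer's search still finds "Hello World". Reportlab does the same job for real images and fonts, which is why the article uses it.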
PyPDF2
Read PDF
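```python
from PyPDF2 import PdfFileWriter, PdfFileReader

# PyPDF2 1.x API. PyPDF2 3.0+ and pypdf renamed these to PdfReader, PdfWriter,
# metadata, len(reader.pages), reader.pages[i], extract_text(), add_page(), rotate().

output = PdfFileWriter()

# The input file must stay open until output.write() has copied its pages.
with open("", "rb") as input_file:  # input path elided in the original
    input1 = PdfFileReader(input_file)

    # print document info
    print(input1.getDocumentInfo())

    # print how many pages input1 has:
    print("pdf_document.pdf has %d pages." % input1.getNumPages())

    # print page content
    page_content = input1.getPage(0).extractText()
    print(page_content)

    # add page 1 from input1 to output document, unchanged
    output.addPage(input1.getPage(0))

    # add page 2 from input1, but rotated clockwise 90 degrees
    output.addPage(input1.getPage(1).rotateClockwise(90))

    # finally, write "output" to
    with open("", "wb") as outputStream:  # output path elided in the original
        output.write(outputStream)
```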
However, PyPDF2 often has trouble extracting text from PDFs; see this question. The limitation is also described in the `extractText` docstring:
```
extractText(self)
    ##
    # Locate all text drawing commands, in the order they are provided in the
    # content stream, and extract the text.  This works well for some PDF
    # files, but poorly for others, depending on the generator used.  This will
    # be refined in the future.  Do not rely on the order of text coming out of
    # this function, as it will change if this function is made more
    # sophisticated.
    #
    # Stability: Added in v1.7, will exist for all future releases.  May
    # be overhauled to provide more ordered text in the future.
    # @return a unicode string object
```
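The docstring explains why results vary: text comes back in the order the drawing operators appear in the content stream, not in reading order on the page. The following is a simplified sketch of that idea, not PyPDF2's code. It only handles uncompressed streams with literal strings shown by `Tj` (no `TJ` arrays, hex strings, compression, or font encodings). The function name `naive_extract_text` and the two sample streams are hypothetical. Both streams place "Title" above "Body" on the page, but they list the operators in different orders.

```python
import re

# A PDF literal string followed by the Tj operator, e.g. "(Hello) Tj".
# Handles backslash escapes but not nested parentheses.
TJ_PATTERN = re.compile(rb"\(((?:\\.|[^\\()])*)\)\s*Tj")


def naive_extract_text(content_stream: bytes) -> str:
    """Return the Tj strings in content-stream order, joined by newlines."""
    pieces = []
    for match in TJ_PATTERN.finditer(content_stream):
        # Undo the simple escapes \\, \( and \)
        text = re.sub(rb"\\([\\()])", rb"\1", match.group(1))
        pieces.append(text.decode("latin-1"))
    return "\n".join(pieces)


# Same visual page (Title at y=720 above Body at y=700), different operator order.
top_first = b"BT /F1 12 Tf 72 720 Td (Title) Tj ET BT /F1 12 Tf 72 700 Td (Body) Tj ET"
bottom_first = b"BT /F1 12 Tf 72 700 Td (Body) Tj ET BT /F1 12 Tf 72 720 Td (Title) Tj ET"

print(naive_extract_text(top_first))     # Title, then Body
print(naive_extract_text(bottom_first))  # Body, then Title
```

The page looks identical either way, yet the extracted order flips. That is why the docstring warns against relying on the order of the returned text.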