1. System Overview
This system implements the following functions:
- Live image capture from a camera
- Edge detection and contour search
- Document outline recognition
- Perspective transformation to correct the document
- Binarization to enhance readability
2. Core code analysis
1. Import the necessary libraries
import numpy as np
import cv2
We mainly use NumPy for numerical calculations and OpenCV for image processing.
2. Helper function definition
First, a simple image display function is defined for easy debugging:
def cv_show(name, img):
    cv2.imshow(name, img)
    cv2.waitKey(10)
3. Coordinate point sorting function
The order_points function arranges the four corners of the detected document in order (top-left, top-right, bottom-right, bottom-left):

def order_points(pts):
    rect = np.zeros((4, 2), dtype="float32")
    s = pts.sum(axis=1)
    rect[0] = pts[np.argmin(s)]      # top-left (smallest x + y)
    rect[2] = pts[np.argmax(s)]      # bottom-right (largest x + y)
    diff = np.diff(pts, axis=1)
    rect[1] = pts[np.argmin(diff)]   # top-right (smallest y - x)
    rect[3] = pts[np.argmax(diff)]   # bottom-left (largest y - x)
    return rect
This function sorts the given 4 two-dimensional coordinate points into top-left, top-right, bottom-right, bottom-left order. This matters in applications such as document scanning and image correction, because we need to know the exact location of each corner point in order to perform the perspective transformation correctly.
Detailed analysis of the function
(1) Sorting logic

- Top-left point (rect[0]): the point with the smallest x + y value
  - In the image coordinate system, both x and y are small at the top-left corner, so their sum is the smallest
- Bottom-right point (rect[2]): the point with the largest x + y value
  - Both x and y are large at the bottom-right corner, so their sum is the largest
- Top-right point (rect[1]): the point with the smallest y - x value
  - At the top-right corner, y is relatively small and x is relatively large, so y - x is the smallest
- Bottom-left point (rect[3]): the point with the largest y - x value
  - At the bottom-left corner, y is relatively large and x is relatively small, so y - x is the largest
(2) Example

Suppose there are 4 points:

A(10, 20)  # assume top-left
B(50, 20)  # top-right
C(50, 60)  # bottom-right
D(10, 60)  # bottom-left

Calculation process:

- x + y values: [30, 70, 110, 70]
  - Minimum 30 → A (top-left)
  - Maximum 110 → C (bottom-right)
- y - x values: [10, -30, 10, 50]
  - Minimum -30 → B (top-right)
  - Maximum 50 → D (bottom-left)

Final sorted result: [A, B, C, D], i.e. [top-left, top-right, bottom-right, bottom-left]
(3) Why this method works

This method exploits the geometric characteristics of the four corner points:

- In the image coordinate system, both x and y are small at the top-left corner
- Both x and y are large at the bottom-right corner
- At the top-right corner, x is larger and y is smaller
- At the bottom-left corner, x is smaller and y is larger

Simple addition and subtraction therefore distinguish each corner reliably, without complex geometric calculations.
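The sorting logic above can be checked directly. Below is a minimal, self-contained sketch (assuming only a NumPy environment) that reconstructs order_points from the analysis and applies it to the four example points, deliberately shuffled:

```python
import numpy as np

def order_points(pts):
    # Sort 4 points into top-left, top-right, bottom-right, bottom-left order.
    rect = np.zeros((4, 2), dtype="float32")
    s = pts.sum(axis=1)              # x + y for each point
    rect[0] = pts[np.argmin(s)]      # top-left: smallest x + y
    rect[2] = pts[np.argmax(s)]      # bottom-right: largest x + y
    diff = np.diff(pts, axis=1)      # y - x for each point
    rect[1] = pts[np.argmin(diff)]   # top-right: smallest y - x
    rect[3] = pts[np.argmax(diff)]   # bottom-left: largest y - x
    return rect

# The example points C, A, D, B in shuffled order
pts = np.array([[50, 60], [10, 20], [10, 60], [50, 20]], dtype="float32")
ordered = order_points(pts)
print(ordered)  # rows come out as A(10,20), B(50,20), C(50,60), D(10,60)
```

Regardless of input order, the output rows always follow the top-left, top-right, bottom-right, bottom-left convention that the transform below relies on.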
4. Perspective Transform Function
The four_point_transform function implements the core of document correction:

def four_point_transform(image, pts):
    rect = order_points(pts)
    (tl, tr, br, bl) = rect
    # Calculate the transformed width and height
    widthA = np.sqrt(((br[0] - bl[0]) ** 2) + ((br[1] - bl[1]) ** 2))
    widthB = np.sqrt(((tr[0] - tl[0]) ** 2) + ((tr[1] - tl[1]) ** 2))
    maxWidth = max(int(widthA), int(widthB))
    heightA = np.sqrt(((tr[0] - br[0]) ** 2) + ((tr[1] - br[1]) ** 2))
    heightB = np.sqrt(((tl[0] - bl[0]) ** 2) + ((tl[1] - bl[1]) ** 2))
    maxHeight = max(int(heightA), int(heightB))
    # Define the corner coordinates of the target image
    dst = np.array([[0, 0], [maxWidth - 1, 0],
                    [maxWidth - 1, maxHeight - 1], [0, maxHeight - 1]],
                   dtype="float32")
    # Calculate the perspective transformation matrix and apply it
    M = cv2.getPerspectiveTransform(rect, dst)
    warped = cv2.warpPerspective(image, M, (maxWidth, maxHeight))
    return warped
This function implements a perspective transformation, used to correct an arbitrary quadrilateral area of the image into a rectangle (the "de-perspective" effect).
Detailed analysis of the function
- Input parameters

def four_point_transform(image, pts):

  - image: the original image
  - pts: an array containing 4 points, representing the quadrilateral area to be converted
- Sort the coordinate points

rect = order_points(pts)
(tl, tr, br, bl) = rect  # unpack into top-left, top-right, bottom-right, bottom-left

The previously introduced order_points function sorts the 4 points into order.
- Calculate the width of the output image
widthA = np.sqrt(((br[0] - bl[0]) ** 2) + ((br[1] - bl[1]) ** 2))  # bottom edge length
widthB = np.sqrt(((tr[0] - tl[0]) ** 2) + ((tr[1] - tl[1]) ** 2))  # top edge length
maxWidth = max(int(widthA), int(widthB))  # take the larger value as the output width
Calculate the side lengths of the bottom and top of the quadrilateral and select the longer one as the output width
- Calculate the height of the output image
heightA = np.sqrt(((tr[0] - br[0]) ** 2) + ((tr[1] - br[1]) ** 2))  # right edge length
heightB = np.sqrt(((tl[0] - bl[0]) ** 2) + ((tl[1] - bl[1]) ** 2))  # left edge length
maxHeight = max(int(heightA), int(heightB))  # take the larger value as the output height
Calculate the lengths of the right and left sides of the quadrilateral and select the longer one as the output height
- Define the target rectangle coordinates
dst = np.array([[0, 0],                        # top-left
                [maxWidth - 1, 0],             # top-right
                [maxWidth - 1, maxHeight - 1], # bottom-right
                [0, maxHeight - 1]],           # bottom-left
               dtype="float32")
Define the corner coordinates of the transformed rectangle (rectangle starting from (0,0))
- Calculate the perspective transformation matrix and apply it
M = cv2.getPerspectiveTransform(rect, dst)                     # calculate the transformation matrix
warped = cv2.warpPerspective(image, M, (maxWidth, maxHeight))  # apply the transformation
  - cv2.getPerspectiveTransform: calculates the 3x3 transformation matrix from the original quadrilateral to the target rectangle
  - cv2.warpPerspective: applies this transformation matrix to the original image
- Return result
return warped
Returns the corrected rectangular image
- Perspective transformation schematic

  Quadrilateral in the original image      Transformed rectangle

       tl--------tr                  (0,0)---------(maxWidth-1, 0)
        \        /                     |                |
         \      /                      |                |
         bl----br            (0,maxHeight-1)---(maxWidth-1, maxHeight-1)
- Why calculate the width and height this way?

Reason for taking the maximum value:

- The original quadrilateral may be distorted by perspective, so opposite sides may have different lengths
- Choosing the larger value ensures all of the content fits in the output image

Reason for subtracting 1:

- Image coordinates start from 0, so the largest x coordinate in an image of width maxWidth is maxWidth - 1
5. Main program flow
The main program implements the complete process of real-time document detection and correction:
- Initialize the camera
cap = cv2.VideoCapture(0)
if not cap.isOpened():
    print("Cannot open camera")
    exit()
- Real-time processing loop
while True:
    flag = 0
    ret, image = cap.read()
    if not ret:
        print("Cannot read from the camera")
        break
    orig = image.copy()
- Image preprocessing
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
gray = cv2.GaussianBlur(gray, (5, 5), 0)  # Gaussian filtering for noise reduction
edged = cv2.Canny(gray, 75, 200)          # Canny edge detection
- Contour detection and filtering
cnts = cv2.findContours(edged, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]
cnts = sorted(cnts, key=cv2.contourArea, reverse=True)[:3]  # keep the 3 largest contours
for c in cnts:
    peri = cv2.arcLength(c, True)                    # contour perimeter
    approx = cv2.approxPolyDP(c, 0.05 * peri, True)  # polygonal approximation
    area = cv2.contourArea(approx)
    # Keep only quadrilaterals with a large enough area
    if area > 20000 and len(approx) == 4:
        screenCnt = approx
        flag = 1
        break
- Document correction and display
if flag == 1:
    # Draw the detected outline
    image_contours = cv2.drawContours(image, [screenCnt], 0, (0, 255, 0), 2)
    # Perspective transformation
    warped = four_point_transform(orig, screenCnt.reshape(4, 2))
    # Binarization
    warped = cv2.cvtColor(warped, cv2.COLOR_BGR2GRAY)
    ref = cv2.threshold(warped, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
3. Complete code
# Import toolkits
import numpy as np
import cv2

def cv_show(name, img):
    cv2.imshow(name, img)
    cv2.waitKey(10)

def order_points(pts):
    # There are 4 coordinate points in total
    rect = np.zeros((4, 2), dtype="float32")  # stores the sorted coordinates
    # Find the coordinates in order 0-3: top-left, top-right, bottom-right, bottom-left
    s = pts.sum(axis=1)          # sum each row of pts: (x + y)
    rect[0] = pts[np.argmin(s)]
    rect[2] = pts[np.argmax(s)]
    diff = np.diff(pts, axis=1)  # difference for each row of pts: (y - x)
    rect[1] = pts[np.argmin(diff)]
    rect[3] = pts[np.argmax(diff)]
    return rect

def four_point_transform(image, pts):
    # Sort the input coordinate points
    rect = order_points(pts)
    (tl, tr, br, bl) = rect
    # Calculate the output w and h values
    widthA = np.sqrt(((br[0] - bl[0]) ** 2) + ((br[1] - bl[1]) ** 2))
    widthB = np.sqrt(((tr[0] - tl[0]) ** 2) + ((tr[1] - tl[1]) ** 2))
    maxWidth = max(int(widthA), int(widthB))
    heightA = np.sqrt(((tr[0] - br[0]) ** 2) + ((tr[1] - br[1]) ** 2))
    heightB = np.sqrt(((tl[0] - bl[0]) ** 2) + ((tl[1] - bl[1]) ** 2))
    maxHeight = max(int(heightA), int(heightB))
    # Corner coordinates after transformation
    dst = np.array([[0, 0], [maxWidth - 1, 0],
                    [maxWidth - 1, maxHeight - 1], [0, maxHeight - 1]],
                   dtype="float32")
    M = cv2.getPerspectiveTransform(rect, dst)
    warped = cv2.warpPerspective(image, M, (maxWidth, maxHeight))
    # Return the transformed result
    return warped

# Read input
cap = cv2.VideoCapture(0)  # make sure the camera is enabled
if not cap.isOpened():     # open failed
    print("Cannot open camera")
    exit()

while True:
    flag = 0                 # marks whether a document is currently detected
    ret, image = cap.read()  # ret is True if the frame was read correctly
    if not ret:              # read failed, exit the loop
        print("Cannot read from the camera")
        break
    orig = image.copy()
    cv_show("image", image)
    # Preprocessing
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    gray = cv2.GaussianBlur(gray, (5, 5), 0)  # Gaussian filtering
    edged = cv2.Canny(gray, 75, 200)
    cv_show("1", edged)
    # Contour detection
    cnts = cv2.findContours(edged, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]
    cnts = sorted(cnts, key=cv2.contourArea, reverse=True)[:3]
    image_contours = cv2.drawContours(image, cnts, -1, (0, 255, 0), 2)
    cv_show("image_contours", image_contours)
    # Traverse the contours
    for c in cnts:
        # Contour approximation
        peri = cv2.arcLength(c, True)  # contour perimeter
        # c is the input point set
        # epsilon is the maximum distance from the original contour to the
        # approximated contour (an accuracy parameter); True means closed
        approx = cv2.approxPolyDP(c, 0.05 * peri, True)
        area = cv2.contourArea(approx)
        # Keep quadrilaterals (4 points) with a large enough area
        if area > 20000 and len(approx) == 4:
            screenCnt = approx
            flag = 1
            print(peri, area)
            print("Document detected")
            break
    if flag == 1:
        # Show the result
        image_contours = cv2.drawContours(image, [screenCnt], 0, (0, 255, 0), 2)
        cv_show("image", image_contours)
        # Perspective transformation
        warped = four_point_transform(orig, screenCnt.reshape(4, 2))
        cv_show("warped", warped)
        # Binarization
        warped = cv2.cvtColor(warped, cv2.COLOR_BGR2GRAY)
        # ref = cv2.threshold(warped, 220, 255, cv2.THRESH_BINARY)[1]
        ref = cv2.threshold(warped, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
        cv_show("ref", ref)

cap.release()            # release the capture
cv2.destroyAllWindows()  # close the image windows
4. Conclusion
This article introduced a real-time document scanning and correction system based on OpenCV. Through edge detection, contour analysis, and perspective transformation, it detects and corrects documents automatically. The system can easily be applied to everyday document digitization work to improve efficiency.

The complete code is given above, and readers can modify and extend it to suit their needs. OpenCV provides powerful image processing capabilities which, combined with Python's concise syntax, make developing such a practical system simple and efficient.

The above is the detailed content of using Python and OpenCV to implement a real-time document scanning and correction system. For more information about Python OpenCV document scanning and correction, please pay attention to my other related articles!