SoFunction
Updated on 2025-05-19

Real-time document scanning and correction system using Python and OpenCV

1. System Overview

This system mainly implements the following functions:

  1. Live image capture from the camera
  2. Edge detection and contour search
  3. Document outline recognition
  4. Perspective transformation to correct the document
  5. Binarization to enhance readability

2. Core code analysis

1. Import the necessary libraries

import numpy as np
import cv2

We mainly use NumPy for numerical calculations and OpenCV for image processing.

2. Helper function definition

First, a simple image display function is defined for easy debugging:

def cv_show(name, img):
    cv2.imshow(name, img)
    cv2.waitKey(10)

3. Coordinate point sorting function

The order_points function arranges the four corners of the detected document in a fixed order (top-left, top-right, bottom-right, bottom-left):

def order_points(pts):
    rect = np.zeros((4, 2), dtype="float32")
    s = pts.sum(axis=1)
    rect[0] = pts[np.argmin(s)]   # top-left point (smallest x+y)
    rect[2] = pts[np.argmax(s)]   # bottom-right point (largest x+y)
    diff = np.diff(pts, axis=1)
    rect[1] = pts[np.argmin(diff)]  # top-right point (smallest y-x)
    rect[3] = pts[np.argmax(diff)]  # bottom-left point (largest y-x)
    return rect

This function sorts the given four 2-D coordinate points into top-left, top-right, bottom-right, bottom-left order. This matters in applications such as document scanning and image correction, because the perspective transformation needs to know the exact location of each corner point to work correctly.

Detailed analysis of the function

(1) Sorting logic description

  1. Top-left point (rect[0]): the point with the smallest x+y value

    • Both x and y are small at the top-left corner, so their sum is the smallest
  2. Bottom-right point (rect[2]): the point with the largest x+y value

    • Both x and y are large at the bottom-right corner, so their sum is the largest
  3. Top-right point (rect[1]): the point with the smallest y-x value

    • At the top-right corner y is relatively small and x is relatively large, so y-x is the smallest
  4. Bottom-left point (rect[3]): the point with the largest y-x value

    • At the bottom-left corner y is relatively large and x is relatively small, so y-x is the largest

(2) Example

Suppose there are 4 points:

	A(10, 20)  # top-left
	B(50, 20)  # top-right
	C(50, 60)  # bottom-right
	D(10, 60)  # bottom-left

Calculation process:

  1. x+y value: [30, 70, 110, 70]

    • Minimum 30 → A (top left)
    • Maximum 110 → C (bottom right)
  2. y-x value: [10, -30, 10, 50]

    • Minimum -30 → B (top right)
    • Maximum 50 → D (bottom left)

Final sorting result: [A, B, C, D] i.e. [upper left, upper right, lower right, lower left]

(3) Why this method works

This method takes advantage of the geometric characteristics of two-dimensional coordinate points:

  • In image coordinates, both x and y are small at the top-left corner
  • Both x and y are large at the bottom-right corner
  • At the top-right corner x is large and y is small
  • At the bottom-left corner x is small and y is large

Through simple addition and subtraction operations, each corner can be reliably distinguished, without complex geometric calculations.
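These rules can be checked directly in NumPy. A minimal sketch, feeding the four sample points A, B, C, D from the example above in a deliberately scrambled order:

```python
import numpy as np

def order_points(pts):
    rect = np.zeros((4, 2), dtype="float32")
    s = pts.sum(axis=1)                 # x + y for each point
    rect[0] = pts[np.argmin(s)]         # top-left
    rect[2] = pts[np.argmax(s)]         # bottom-right
    diff = np.diff(pts, axis=1)         # y - x for each point
    rect[1] = pts[np.argmin(diff)]      # top-right
    rect[3] = pts[np.argmax(diff)]      # bottom-left
    return rect

# The corners A(10,20), B(50,20), C(50,60), D(10,60), scrambled:
pts = np.array([(50, 60), (10, 20), (10, 60), (50, 20)], dtype="float32")
print(order_points(pts))
# -> [[10. 20.]   top-left  (A)
#     [50. 20.]   top-right (B)
#     [50. 60.]   bottom-right (C)
#     [10. 60.]]  bottom-left (D)
```

Whatever order the contour detector emits the corners in, the output rows always land in the same positions.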

4. Perspective Transform Function

The four_point_transform function implements the core document-correction step:

def four_point_transform(image, pts):
    rect = order_points(pts)
    (tl, tr, br, bl) = rect

    # Calculate the transformed width and height
    widthA = np.sqrt(((br[0] - bl[0]) ** 2) + ((br[1] - bl[1]) ** 2))
    widthB = np.sqrt(((tr[0] - tl[0]) ** 2) + ((tr[1] - tl[1]) ** 2))
    maxWidth = max(int(widthA), int(widthB))

    heightA = np.sqrt(((tr[0] - br[0]) ** 2) + ((tr[1] - br[1]) ** 2))
    heightB = np.sqrt(((tl[0] - bl[0]) ** 2) + ((tl[1] - bl[1]) ** 2))
    maxHeight = max(int(heightA), int(heightB))

    # Define the corner coordinates of the target image
    dst = np.array([[0, 0], [maxWidth - 1, 0],
                    [maxWidth - 1, maxHeight - 1], [0, maxHeight - 1]], dtype="float32")

    # Compute the perspective transformation matrix and apply it
    M = cv2.getPerspectiveTransform(rect, dst)
    warped = cv2.warpPerspective(image, M, (maxWidth, maxHeight))

    return warped

This function implements a perspective transformation, used to correct an arbitrary quadrilateral region of the image into a rectangle (a "de-perspective" effect).

Detailed analysis of the function

  • Enter parameters
def four_point_transform(image, pts):
  • image: Original image
  • pts: An array containing 4 points, representing the quadrilateral area to be converted
  • Sort coordinate points
rect = order_points(pts)
(tl, tr, br, bl) = rect  # Decompose into top-left, top-right, bottom-right, bottom-left

The previously introduced order_points function sorts the 4 points into a consistent order

  • Calculate the width of the output image
widthA = np.sqrt(((br[0] - bl[0]) ** 2) + ((br[1] - bl[1]) ** 2))  # bottom edge length
widthB = np.sqrt(((tr[0] - tl[0]) ** 2) + ((tr[1] - tl[1]) ** 2))  # top edge length
maxWidth = max(int(widthA), int(widthB))  # take the larger value as the output width

Calculate the side lengths of the bottom and top of the quadrilateral and select the longer one as the output width

  • Calculate the height of the output image
heightA = np.sqrt(((tr[0] - br[0]) ** 2) + ((tr[1] - br[1]) ** 2))  # right edge length
heightB = np.sqrt(((tl[0] - bl[0]) ** 2) + ((tl[1] - bl[1]) ** 2))  # left edge length
maxHeight = max(int(heightA), int(heightB))  # take the larger value as the output height

Calculate the lengths of the right and left sides of the quadrilateral and select the longer one as the output height

  • Define the target rectangle coordinates
dst = np.array([
    [0, 0],                         # top-left
    [maxWidth - 1, 0],              # top-right
    [maxWidth - 1, maxHeight - 1],  # bottom-right
    [0, maxHeight - 1]              # bottom-left
], dtype="float32")

Define the corner coordinates of the transformed rectangle (rectangle starting from (0,0))

  • Calculate the perspective transformation matrix and apply it
M = cv2.getPerspectiveTransform(rect, dst)  # compute the transformation matrix
warped = cv2.warpPerspective(image, M, (maxWidth, maxHeight))  # apply the transformation
  • getPerspectiveTransform: Calculate the 3x3 transformation matrix from the original quadrilateral to the target rectangle
  • warpPerspective: Apply this transformation matrix to the original image
  • Return result
return warped

Returns the corrected rectangular image
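getPerspectiveTransform is a black box here, but the matrix it returns can be derived by hand: each of the 4 point correspondences contributes two linear equations in the 8 unknown entries of the 3x3 homography H (with H[2][2] fixed to 1). A rough NumPy sketch of that derivation; the perspective_matrix helper and the sample coordinates are illustrative, not part of OpenCV:

```python
import numpy as np

def perspective_matrix(src, dst):
    # Build the 8x8 linear system: each correspondence (x, y) -> (u, v)
    # gives two equations in the entries of H (H[2][2] fixed to 1).
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(A, dtype=float), np.array(b, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)

# A skewed quadrilateral (made-up coordinates) and its target rectangle
src = [(12, 8), (96, 15), (100, 70), (5, 66)]
dst = [(0, 0), (94, 0), (94, 57), (0, 57)]
H = perspective_matrix(src, dst)

# Applying H in homogeneous coordinates maps each source corner to its target
for (x, y), (u, v) in zip(src, dst):
    px, py, pw = H @ np.array([x, y, 1.0])
    print(round(px / pw), round(py / pw), "->", u, v)
```

warpPerspective then evaluates the inverse of this mapping at every output pixel to resample the image.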

  • Perspective transformation schematic diagram
 Quadrilateral in the original image        Transformed rectangle
    tl--------tr                  (0,0)----------(maxWidth-1, 0)
     \        /                     |                  |
      \      /                      |                  |
       bl----br            (0, maxHeight-1)---(maxWidth-1, maxHeight-1)
Why calculate the width and height like this?

Reason for taking the maximum value

  • The original quadrilateral may have perspective deformation, and the two opposing sides may be of different lengths.
  • Selecting a larger value ensures that everything can be included in the output image

Reason for subtracting 1

  • The image coordinates start from 0, so the maximum x coordinate of the image with a width of maxWidth is maxWidth-1
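A quick sketch of this bookkeeping on a hypothetical skewed quadrilateral (the corner coordinates are made up for illustration):

```python
import math

# Hypothetical corner coordinates of a skewed document, already ordered
tl, tr, br, bl = (12, 8), (96, 15), (100, 70), (5, 66)

widthA = math.hypot(br[0] - bl[0], br[1] - bl[1])   # bottom edge length
widthB = math.hypot(tr[0] - tl[0], tr[1] - tl[1])   # top edge length
maxWidth = max(int(widthA), int(widthB))            # the longer edge wins

heightA = math.hypot(tr[0] - br[0], tr[1] - br[1])  # right edge length
heightB = math.hypot(tl[0] - bl[0], tl[1] - bl[1])  # left edge length
maxHeight = max(int(heightA), int(heightB))

print(maxWidth, maxHeight)   # -> 95 58
# Valid x coordinates in the output run from 0 to maxWidth - 1, which is
# why the dst corners use maxWidth - 1 and maxHeight - 1.
```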

5. Main program flow

The main program implements the complete process of real-time document detection and correction:

  1. Initialize the camera
cap = cv2.VideoCapture(0)
if not cap.isOpened():
    print("Cannot open camera")
    exit()
  • Real-time processing loop
while True:
    flag = 0
    ret, image = cap.read()
    if not ret:
        print("Cannot read frame from camera")
        break
    orig = image.copy()
  • Image preprocessing
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
gray = cv2.GaussianBlur(gray, (5, 5), 0)  # Gaussian filtering for noise reduction
edged = cv2.Canny(gray, 75, 200)  # Canny edge detection
  • Contour detection and filtering
cnts = cv2.findContours(edged, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]
cnts = sorted(cnts, key=cv2.contourArea, reverse=True)[:3]  # keep the 3 largest contours
for c in cnts:
    peri = cv2.arcLength(c, True)  # contour perimeter
    approx = cv2.approxPolyDP(c, 0.05 * peri, True)  # polygonal approximation
    area = cv2.contourArea(approx)

    # Keep only quadrilaterals with a large enough area
    if area > 20000 and len(approx) == 4:
        screenCnt = approx
        flag = 1
        break
  • Document correction and display
if flag == 1:
    # Draw the outline
    image_contours = cv2.drawContours(image, [screenCnt], 0, (0, 255, 0), 2)

    # Perspective transformation
    warped = four_point_transform(orig, screenCnt.reshape(4, 2))

    # Binarization
    warped = cv2.cvtColor(warped, cv2.COLOR_BGR2GRAY)
    ref = cv2.threshold(warped, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
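THRESH_OTSU picks the threshold automatically by maximising the between-class variance of the grayscale histogram. A pure-NumPy sketch of the idea (otsu_threshold is an illustrative helper, not OpenCV's actual implementation):

```python
import numpy as np

def otsu_threshold(gray):
    # gray: 2-D uint8 array; returns the threshold t that maximises the
    # between-class variance w0*w1*(m0-m1)^2 of the histogram split at t.
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    total = hist.sum()
    sum_all = float(np.dot(np.arange(256), hist))
    best_t, best_var = 0, -1.0
    w0 = sum0 = 0.0
    for t in range(256):
        w0 += hist[t]            # pixel count of the class at or below t
        if w0 == 0:
            continue
        w1 = total - w0          # pixel count of the class above t
        if w1 == 0:
            break
        sum0 += t * hist[t]
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1   # class means
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t

# A toy bimodal "image": half dark (50), half bright (200)
img = np.full((10, 10), 200, dtype=np.uint8)
img[:5] = 50
print(otsu_threshold(img))   # -> 50
```

This is why passing a threshold of 0 together with THRESH_OTSU works: the value you supply is ignored and the computed optimum is used instead, which adapts well to varying lighting on the document.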

3. Complete code

# Import toolkit
import numpy as np
import cv2

def cv_show(name, img):
    cv2.imshow(name, img)
    cv2.waitKey(10)

def order_points(pts):
    # There are 4 coordinate points in total
    rect = np.zeros((4, 2), dtype="float32")  # stores the sorted coordinates
    # Find the coordinates in order 0,1,2,3: top-left, top-right, bottom-right, bottom-left
    s = pts.sum(axis=1)  # sum each row of pts, (x + y)
    rect[0] = pts[np.argmin(s)]
    rect[2] = pts[np.argmax(s)]
    diff = np.diff(pts, axis=1)  # difference within each row of pts, (y - x)
    rect[1] = pts[np.argmin(diff)]
    rect[3] = pts[np.argmax(diff)]
    return rect

def four_point_transform(image, pts):
    # Sort the input coordinate points
    rect = order_points(pts)
    (tl, tr, br, bl) = rect
    # Calculate the output w and h values
    widthA = np.sqrt(((br[0] - bl[0]) ** 2) + ((br[1] - bl[1]) ** 2))
    widthB = np.sqrt(((tr[0] - tl[0]) ** 2) + ((tr[1] - tl[1]) ** 2))
    maxWidth = max(int(widthA), int(widthB))
    heightA = np.sqrt(((tr[0] - br[0]) ** 2) + ((tr[1] - br[1]) ** 2))
    heightB = np.sqrt(((tl[0] - bl[0]) ** 2) + ((tl[1] - bl[1]) ** 2))
    maxHeight = max(int(heightA), int(heightB))
    # Corner coordinates after the transformation
    dst = np.array([[0, 0], [maxWidth - 1, 0],
                    [maxWidth - 1, maxHeight - 1], [0, maxHeight - 1]], dtype="float32")

    M = cv2.getPerspectiveTransform(rect, dst)
    warped = cv2.warpPerspective(image, M, (maxWidth, maxHeight))
    # Return the transformed result
    return warped


# Read input
cap = cv2.VideoCapture(0)  # make sure the camera is enabled
if not cap.isOpened():  # open failed
    print("Cannot open camera")
    exit()

while True:
    flag = 0  # marks whether a document is currently detected
    ret, image = cap.read()  # ret is True if the frame was read correctly
    if not ret:  # read failed, exit the loop
        print("Cannot read frame from camera")
        break
    orig = image.copy()
    cv_show("image", image)

    # Preprocessing
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    gray = cv2.GaussianBlur(gray, (5, 5), 0)  # Gaussian filtering
    edged = cv2.Canny(gray, 75, 200)
    cv_show('1', edged)

    # Contour detection
    cnts = cv2.findContours(edged, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]

    cnts = sorted(cnts, key=cv2.contourArea, reverse=True)[:3]
    image_contours = cv2.drawContours(image.copy(), cnts, -1, (0, 255, 0), 2)
    cv_show("image_contours", image_contours)
    # Traverse the contours
    for c in cnts:
        # Compute a contour approximation
        peri = cv2.arcLength(c, True)  # contour perimeter
        # c is the input point set
        # epsilon is the maximum distance from the original contour to the
        # approximation, i.e. an accuracy parameter
        # True means the contour is closed
        approx = cv2.approxPolyDP(c, 0.05 * peri, True)  # contour approximation
        area = cv2.contourArea(approx)
        # Keep it when exactly 4 points remain
        if area > 20000 and len(approx) == 4:
            screenCnt = approx
            flag = 1
            print(peri, area)
            print("Document detected")
            break
    if flag == 1:
        # Show the result
        # print("STEP 2: Get outline")
        image_contours = cv2.drawContours(image, [screenCnt], 0, (0, 255, 0), 2)
        cv_show("image", image_contours)
        # Perspective transformation
        warped = four_point_transform(orig, screenCnt.reshape(4, 2))
        cv_show("warped", warped)
        # Binarization
        # ref = cv2.threshold(warped, 220, 255, cv2.THRESH_BINARY)[1]
        warped = cv2.cvtColor(warped, cv2.COLOR_BGR2GRAY)
        ref = cv2.threshold(warped, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
        cv_show("ref", ref)
cap.release()  # release the capture
cv2.destroyAllWindows()  # close the image windows

4. Conclusion

This article introduced a real-time document scanning and correction system based on OpenCV. Through edge detection, contour analysis and perspective transformation, documents are detected and corrected automatically. The system can easily be applied to everyday document digitization work to improve efficiency.

The complete code is given above, and readers can modify and extend it to suit their needs. OpenCV provides powerful image-processing capabilities which, combined with Python's concise syntax, make developing such a practical system simple and efficient.

That concludes this detailed walkthrough of using Python and OpenCV to implement a real-time document scanning and correction system. For more on document scanning and correction with Python and OpenCV, please see my other related articles!