
OpenCV real-time parking lot space detection project practice

1. Foreword

Today I am writing up the third introductory OpenCV practice project. The previous two articles covered credit card digit recognition and document OCR scanning, which used most of the basic image preprocessing techniques in OpenCV, such as contour detection, edge detection, morphological operations, and perspective transformation. The project in this article not only needs some basic image preprocessing, but also requires building a model for recognition and prediction. Through this project we can string together the whole pipeline, from image preprocessing to modeling, and apply it to a practical scenario, which is quite interesting.

The task of real-time parking space detection is: given a video of a parking lot, accomplish two main things:

  • Detect how many cars are currently in the parking lot and how many spaces are free.
  • Mark the free spaces, so that when parking, users can drive straight to a free space and save a lot of time.

So this project is quite valuable in practice. It took me about a day and a half to finish. The reference is Mr. Tang's OpenCV Getting Started Tutorial video. However, the video handles this task rather roughly, so I made some optimizations based on my own understanding, with the following major changes:

  • In data processing, after the parking spaces were boxed out column by column, I manually adjusted the coordinates of each column box so that no parking space is missed or counted twice, and then fine-tuned the coordinates of each individual space to make the marking as accurate as possible.
  • For the model, the original video uses transfer learning based on Keras, fine-tuning a VGG network. My model here is written entirely in PyTorch, fine-tuning a pre-trained ResNet34. The validation accuracy reaches above 0.94, but the first version still made a small number of wrong predictions, so I did data augmentation on the existing frame images and added some extra data, which raised the accuracy to about 0.98.
  • The overall structure of the project was rewritten: I followed the ideas above but refactored everything based on my own understanding. The benefit is that it can be optimized later, doing data augmentation, data preprocessing, and training more advanced models as needed.

It turned out that even the small ResNet is powerful enough; the final prediction looks like this:

[Image: prediction result on one frame of the video]

This is one frame of the video. When the program actually runs, it reads in the video, splits it into frames, makes prediction marks like this for each frame, and displays the result in real time. This way, at every moment you dynamically know which spaces in the parking lot are empty.

Below is an organization of the key techniques used in this project. The project is a bit large and there is a lot of code, so not all of it can be shown here; instead I want to record my thought process, the motivation for each treatment, and how each treatment is carried out. I think that is what will be useful in the future.

2. Overall process organization

First of all, once you have the task, you have to work out the process in order to determine the course of action. We start with a video like the one above, so to accomplish the parking space detection and recognition task, two steps need to be considered:

  • Extract every parking space in the lot.
  • For each parking space, train a model to predict whether there is a car in it, mark the ones without a car, and count them.

It's really just two big steps at the macro level. The later questions are then: how to extract each parking space, and how to train a model to make predictions?

I've divided this into two main steps, data preprocessing and model training and prediction:

Data preprocessing

  • Take the image of a single frame of the video as the unit of processing.
  • Through binarization, grayscale conversion, edge detection, fixed-point calibration of the region, etc., remove the redundant parts of the image and keep only the parking lot area.
  • Use the Hough transform to detect the straight lines in the image; based on the line coordinates, first box out the parking spaces column by column, then manually fine-tune the boxes.
  • Within each column, lock the position of each parking space and number each one, saving this as a dictionary.
  • With the position of each parking space, the corresponding image patch can be cropped out and used as the dataset for training and validating the model later, although it still needs to be divided into classes manually.

Through the above steps, some data is accumulated, about 800 images. The next step is to train the model, but because the amount of data is small, training a model from scratch is usually ineffective, so transfer learning is used here: a pre-trained ResNet34 fine-tuned with these 800 images.

After training and saving the model, for each frame of the video, with the dictionary of parking space positions, it is straightforward to crop out each parking space, and then for each of these spaces the model predicts whether there is a car in it or not.

With such a process in place, it can be broken down and refined further; you can think big and start small. Below, the key details of each step are organized.

3. Data pre-processing

3.1 Background filtering

First, a frame is read in and the original image is as follows:

[Image: original frame of the parking lot]

The background is first filtered out by binarization to highlight the important information, and then converted to a grayscale image.

def select_rgb_white_yellow(image):
    # Filter the background
    lower = np.uint8([120, 120, 120])
    upper = np.uint8([255, 255, 255])
    # In each of the three channels, values below lower or above upper become 0,
    # and values between lower and upper become 255 -- equivalent to a mask that filters the background.
    # Pixel values between 120-255 are retained
    white_mask = cv2.inRange(image, lower, upper)
    masked_img = cv2.bitwise_and(image, image, mask=white_mask)
    return masked_img

masked_img = select_rgb_white_yellow(test_image)

Seeing cv2.inRange() here, and recalling the cv2.threshold() binarization used before, I wondered what the difference is, and why threshold isn't used here. Here are a few things I learned from exploring their usage:

  • cv2.threshold(src, thresh, maxval, type[, dst]): for single-channel images (grayscale), the binarization criterion with type=THRESH_BINARY is: if x > thresh, x = maxval, else x = 0; with type=THRESH_BINARY_INV it is the opposite. These two types are the most commonly used.
  • cv2.inRange(src, lowerb, upperb): works on single-channel or three-channel images and can also binarize; the criterion is: if x >= lower and x <= upper, x = 255, else x = 0.

Here's an experiment: if the image is first converted to grayscale with warped = cv2.cvtColor(test_image, cv2.COLOR_BGR2GRAY), then the following two lines of code give the same result (a quick check is sketched after the two lines below).

  • cv2.threshold(warped, 119, 255, cv2.THRESH_BINARY)[1]
  • cv2.inRange(warped, 120, 255)
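To double-check this equivalence, here is a minimal sketch (assuming test_image is the frame read earlier, and that cv2 and numpy are imported as usual):

import cv2
import numpy as np

# Convert to grayscale first, as in the experiment above
warped = cv2.cvtColor(test_image, cv2.COLOR_BGR2GRAY)

# threshold keeps pixels strictly greater than 119; inRange keeps pixels in [120, 255]
binary_threshold = cv2.threshold(warped, 119, 255, cv2.THRESH_BINARY)[1]
binary_inrange = cv2.inRange(warped, 120, 255)

# For an 8-bit image the two masks should be identical
print(np.array_equal(binary_threshold, binary_inrange))   # expected: True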

The image looks like this after processing:

[Image: frame after background filtering]

3.2 Canny Edge Detection

Next, the Canny edge detection algorithm is used to detect the edges:

low_threshold, high_threshold = 50, 200
edges_img = cv2.Canny(gray_img, low_threshold, high_threshold)

The results are as follows.

[Image: Canny edge detection result]

Next, we try to extract just the parking lot area and get rid of the rest of the useless content.

3.3 Parking lot area extraction

The idea here is to fix the outline of the parking lot with 6 calibration points, which have to be found manually. Once they are found, the OpenCV fill function can be used to create a mask matrix, and then the rest can be removed.

def select_region(image):
    """Manually select the region here."""
    rows, cols = image.shape[:2]
    
    # 6 calibration points are defined below. The order of the points must trace out a region;
    # if it is changed, the edges may cross, so don't move it.
    pt_1 = [cols*0.06, rows*0.90]    # Bottom left
    pt_2 = [cols*0.06, rows*0.70]    # Upper left
    pt_3 = [cols*0.32, rows*0.51]    # Center-left
    pt_4 = [cols*0.6, rows*0.1]      # Center-right
    pt_5 = [cols*0.90, rows*0.1]     # Upper right
    pt_6 = [cols*0.90, rows*0.90]    # Bottom right
    
    vertices = np.array([[pt_1, pt_2, pt_3, pt_4, pt_5, pt_6]], dtype=np.int32)
    point_img = image.copy()
    point_img = cv2.cvtColor(point_img, cv2.COLOR_GRAY2BGR)
    for point in vertices[0]:
        cv2.circle(point_img, (point[0], point[1]), 10, (0, 0, 255), 4)
    # cv_imshow('points_img', point_img)
    
    # Define the mask matrix, keeping only the area inside the points.
    mask = np.zeros_like(image)
    if len(mask.shape) == 2:
        cv2.fillPoly(mask, vertices, 255)   # Fill the boxed area with white
        # cv_imshow('mask', mask)
    roi_image = cv2.bitwise_and(image, mask)
    return roi_image

roi_image = select_region(edges_img)

The effect of the processing is as follows:

[Image: parking lot region after masking]

With that taken care of, we need to find the straight lines in here and then estimate the rough positions of the parking spaces from them.

3.4 Hough Transform Detection of Straight Lines

Here the Hough transform is used to detect straight lines. The function is cv2.HoughLinesP, which can detect the two endpoints of each line segment (x1, y1, x2, y2). Function prototype:

HoughLinesP(image, rho, theta, threshold[, lines[, minLineLength[, maxLineGap]]]) -> lines

  • image: the output image of edge detection; note that it must be the edge-detection output.
  • rho: resolution of the polar radius r in pixels, typically 1
  • theta: angular resolution in radians, typically 1 degree (np.pi/180)
  • threshold: the minimum number of curve intersections (votes) required to detect a line
  • minLineLength: the minimum length of a line; anything shorter is not considered a line.
  • maxLineGap: the maximum gap between two segments; if the gap is smaller than this value, they are treated as one line.

So, this function is used directly.

def hough_lines(image):
    # The input image needs to be the result of edge detection
    # minLineLength (the shortest length of a line, anything shorter is ignored) and maxLineGap
    # (the maximum gap between two segments; anything smaller is treated as one straight line)
    # rho: distance accuracy, theta: angle accuracy, threshold: votes needed to detect a segment
    return cv2.HoughLinesP(image, rho=0.1, theta=np.pi/10, threshold=15, minLineLength=9, maxLineGap=4)

list_of_lines = hough_lines(roi_image)  # (2338, 1, 4)

Surprisingly, 2338 lines were detected, many of which are surely unusable, so in the later processing the lines need to be filtered first. The filtering principle: the line must not be oblique, and its horizontal length must be neither too long nor too short. The specific code is shown below; here is the effect after filtering.

[Image: detected straight lines after filtering]

After filtering, there are 628 straight lines in total.

3.5 Dividing the parking spaces by column

The code below will be slightly more complex, so the ideas need to be covered in chunks.

First, we have the straight lines of the parking lot and their coordinates. With the filtering done, the next step is to sort the lines so that, column by column, they are ordered from top to bottom.

import operator

def identity_blocks(image, lines, make_copy=True):
    if make_copy:
        new_image = np.copy(image)
    
    # Filter out part of the lines
    stayed_lines = []
    for line in lines:
        for x1, y1, x2, y2 in line:
            # Filtering criterion: the line must not be oblique, and horizontally it must be neither too long nor too short
            if abs(y2-y1) <= 1 and abs(x2-x1) >= 25 and abs(x2-x1) <= 55:
                stayed_lines.append((x1, y1, x2, y2))
    
    # Sort the lines by x1, so that they are organized column by column from top to bottom:
    # first the horizontal lines of the first column going down, then the second column, ...
    list1 = sorted(stayed_lines, key=operator.itemgetter(0, 1))

After sorting, traverse all the lines and look at the distance between two neighboring lines. If they belong to the same column, their x1 values should be very close, since they sit on the same column; if the gap is too large, it means the next column has started. With this criterion, after the traversal the lines can be divided into columns. A dictionary is used here, where the key represents the column and the value is the list of lines in that column.

The code picks up:

    # Multiple columns are found; each column is equivalent to one row of cars
    clusters = collections.defaultdict(list)   # requires `import collections`
    dIndex = 0
    clus_dist = 10   # The distance threshold that separates columns
    for i in range(len(list1) - 1):
        # Look at the distance between two neighboring lines; if they are in the same column,
        # their x1 values should be very close
        # If this value is greater than clus_dist, it means the next column has started,
        # so move dIndex, which indicates the column index
        distance = abs(list1[i+1][0] - list1[i][0])
        if distance <= clus_dist:
            clusters[dIndex].append(list1[i])
            clusters[dIndex].append(list1[i+1])
        else:
            dIndex += 1

With the lines inside each column, the next step is to traverse each column: take all of its lines, then find the maximum and minimum of the vertical coordinates and of the horizontal coordinates. For the horizontal coordinates, however, the first and last columns hold a single row of spaces while the middle columns hold two rows each, so simply taking the largest and smallest x is not ideal; the average is used here instead. After this traversal, for each column we get a top-left point and a bottom-right point, i.e. a rectangular box.

The code picks up:

	# Get a rectangular box for each column of parking spaces
    rects = {}
    i = 0
    for key in clusters:
        all_list = clusters[key]
        cleaned = list(set(all_list))
        # Keep the column only if it contains enough lines (at least 5 parking spaces)
        if len(cleaned) > 5:
            cleaned = sorted(cleaned, key=lambda tup: tup[1])
            avg_y1 = cleaned[0][1]
            avg_y2 = cleaned[-1][1]
            if abs(avg_y2-avg_y1) < 15:
                continue
            avg_x1 = 0
            avg_x2 = 0
            for tup in cleaned:
                avg_x1 += tup[0]
                avg_x2 += tup[2]
            avg_x1 = avg_x1 / len(cleaned)
            avg_x2 = avg_x2 / len(cleaned)
            
            rects[i] = [avg_x1, avg_y1, avg_x2, avg_y2]
            i += 1
    print('Num Parking Lanes: ', len(rects))

Below, draw the rectangular box:

	# Draw the column rectangle
    buff = 7
    for key in rects:
        tup_topLeft = (int(rects[key][0] - buff), int(rects[key][1]))
        tup_botRight = (int(rects[key][2] + buff), int(rects[key][3]))
        cv2.rectangle(new_image, tup_topLeft, tup_botRight, (0, 255, 0), 3)
    return new_image, rects

The buff here is also fine-tuned by hand. This kind of value depends on the actual scene and is not fixed. The effect is as follows:

[Image: rough rectangular box for each column of parking spaces]

This gives a rough rectangular box for each column of parking spaces, but it is very rough. The original video proceeds directly from this. Here I fine-tune the box of each column, because this box is very important: if it is inaccurate, it affects the delineation of the individual parking spaces later on.

def rect_finetune(image, rects, copy_img=True):
    if copy_img:
        image_copy = image.copy()
    # Fine-tune the coordinates of the boxes above to make them more accurate
    # These boxes are important: they affect the parking space count later, so try not to miss any spaces
    for k in rects:
        if k == 0:
            rects[k][1] -= 10
        elif k == 1:
            rects[k][1] -= 10
            rects[k][3] -= 10
        elif k == 2 or k == 3 or k == 5:
            rects[k][1] -= 4
            rects[k][3] += 13
        elif k == 6 or k == 8:
            rects[k][1] -= 18
            rects[k][3] += 12
        elif k == 9:
            rects[k][1] += 10
            rects[k][3] += 10
        elif k == 10:
            rects[k][1] += 45
        elif k == 11:
            rects[k][3] += 45
    
    buff = 8
    for key in rects:
        tup_topLeft = (int(rects[key][0]-buff), int(rects[key][1]))
        tup_botRight = (int(rects[key][2]+buff), int(rects[key][3]))
        cv2.rectangle(image_copy, tup_topLeft, tup_botRight, (0, 255, 0), 3)
    
    return image_copy, rects

The result after fine-tuning is as follows:

[Image: column boxes after fine-tuning]

The principle: do not miss any space, and do not box anything that is not a space.

3.6 Locking each parking space

Here, for each rectangular box, the parking spaces inside are cut apart one by one with straight lines, and each space is identified by the coordinates (x1, y1, x2, y2) of its upper-left and lower-right corners. Each space is also numbered, eventually forming a dictionary whose keys are the positions and whose values are the serial numbers. One detail to keep in mind: the middle columns hold two rows of spaces while the first and last columns hold only one, and this has to be taken into account when dividing the spaces.

def draw_parking(image, rects, make_copy=True, save=True):
    if make_copy:
        new_image = np.copy(image)
    gap = 15.5
    spot_dict = {}  # One parking space corresponds to one position
    tot_spots = 0
    
    # Fine-tuning
    adj_x1 = {0: -8, 1:-15, 2:-15, 3:-15, 4:-15, 5:-15, 6:-15, 7:-15, 8:-10, 9:-10, 10:-10, 11:0}
    adj_x2 = {0: 0, 1: 15, 2:15, 3:15, 4:15, 5:15, 6:15, 7:15, 8:10, 9:10, 10:10, 11:0}
    fine_tune_y = {0: 4, 1: -2, 2: 3, 3: 1, 4: -3, 5: 1, 6: 5, 7: -3, 8: 0, 9: 5, 10: 4, 11: 0}
    
    for key in rects:
        tup = rects[key]
        x1 = int(tup[0] + adj_x1[key])
        x2 = int(tup[2] + adj_x2[key])
        y1 = int(tup[1])
        y2 = int(tup[3])
        cv2.rectangle(new_image, (x1, y1), (x2, y2), (0, 255, 0), 2)
        
        num_splits = int(abs(y2-y1)//gap)
        for i in range(0, num_splits+1):
            y = int(y1+i*gap) + fine_tune_y[key]
            cv2.line(new_image, (x1, y), (x2, y), (255, 0, 0), 2)
        if key > 0 and key < len(rects) - 1:
            # Vertical lines
            x = int((x1+x2) / 2)
            cv2.line(new_image, (x, y), (x, y2), (0, 0, 255), 2)
        
        # Calculate the number Except for the first and last columns, everything in between is in two columns
        if key == 0 or key == len(rects) - 1:
            tot_spots += num_splits + 1
        else:
            tot_spots += 2 * (num_splits + 1)
        
        # Dictionaries correspond well
        if key == 0 or key == len(rects) - 1:
            for i in range(0, num_splits+1):
                cur_len = len(spot_dict)
                y = int(y1 + i * gap) + fine_tune_y[key]
                spot_dict[(x1, y, x2, y+gap)] = cur_len + 1
        else:
            for i in range(0, num_splits+1):
                cur_len = len(spot_dict)
                y = int(y1 + i * gap) + fine_tune_y[key]
                x = int((x1+x2) / 2)
                spot_dict[(x1, y, x, y+gap)] = cur_len + 1
                spot_dict[(x, y, x2, y+gap)] = cur_len + 2
  
    return new_image, spot_dict

The fine_tune_y here is also something I added later, to make the division of spaces in each column as accurate as possible.

[Image: parking spaces divided within each column]

From this result, the parking spaces are basically divided one by one. After the division, you will find that some of the boxes are not actually parking spaces but are still drawn. This affects the space count as well as the parking information given later, so I checked them one by one and removed these invalid spaces.

# Remove excess parking spaces
invalid_spots = [10, 11, 33, 34, 37, 38, 61, 62, 93, 94, 95, 97, 98, 135, 137, 138, 187, 249, 
           250, 253, 254, 323, 324, 327, 328, 467, 468, 531, 532]
valid_spots_dict = {}
cur_idx = 1
for k, v in spot_dict.items():
    if v in invalid_spots:
        continue
    valid_spots_dict[k] = cur_idx
    cur_idx += 1

In this way, the processed parking space information can be visualized and fine-tuned again, but thanks to the earlier fine-tuning I did not make any further adjustments here; the result looks OK.

# Mark every valid parking space
tmp_img = test_image.copy()
for k, v in valid_spots_dict.items():
    cv2.rectangle(tmp_img, (int(k[0]), int(k[1])), (int(k[2]), int(k[3])), (0, 255, 0), 2)
cv_imshow('valid_pot', tmp_img)

The effect is as follows:

[Image: final valid parking spaces marked on the frame]

If you want the model to predict each parking space accurately, the division must be as precise and standardized as possible. Otherwise, if a rectangular box does not correspond to a real parking space, for example if it straddles two spaces, then when such a crop is shown to the model it is easy to misjudge.

Also, this final dictionary is important, because it holds the position of each parking space. With it, given a frame, each space can be cropped directly and handed to the model for prediction. For the same parking lot, the spaces are fixed, so the dictionary does not change and is shared by all frames of the video. This is what makes real-time processing possible.
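Since this dictionary is reused for every frame, it makes sense to persist it once and simply load it at prediction time. A minimal sketch, assuming we save valid_spots_dict with pickle (the file name spot_dict.pickle is just an example):

import pickle

# Save the parking-space dictionary once, after the preprocessing step
with open('spot_dict.pickle', 'wb') as f:
    pickle.dump(valid_spots_dict, f)

# At prediction time, load it back and share it across all frames of the video
with open('spot_dict.pickle', 'rb') as f:
    spot_dict = pickle.load(f)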

3.7 Generating Predictive Images for CNNs

With the position of each parking space, the next step is to crop out each space according to these coordinates, producing the dataset used later for training and validating the CNN.

def save_images_for_cnn(image, spot_dict, folder_name='../cnn_pred_data'):
    for spot in spot_dict.keys():
        (x1, y1, x2, y2) = spot
        (x1, y1, x2, y2) = (int(x1), int(y1), int(x2), int(y2))
        
        # Crop the parking space and enlarge it
        spot_img = image[y1:y2, x1:x2]
        spot_img = cv2.resize(spot_img, (0, 0), fx=2.0, fy=2.0)
        spot_id = spot_dict[spot]
        
        filename = 'spot_{}.jpg'.format(str(spot_id))
        
        # print(spot_img.shape, filename, (x1, x2, y1, y2))
        cv2.imwrite(os.path.join(folder_name, filename), spot_img)

save_images_for_cnn(test_image, valid_spots_dict)

In this way, the training dataset for the model is prepared. It is organized in directories like this:

[Image: directory structure of the training data]

Inside each directory there are small images of parking spaces, manually divided into two classes: with a car and without a car. So the later model is actually doing a binary classification task: given such a small parking space image, predict whether it is empty or not.
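The manual train/test split itself is not shown in the article; as a hedged sketch (the folder names empty/occupied and the 8:2 split ratio are my own assumptions), the hand-labeled spot images could be sorted into train and test folders with glob and shutil roughly like this:

import glob
import os
import random
import shutil

def split_dataset(src_dir, dst_root, train_ratio=0.8):
    # Hypothetical helper: randomly split hand-labeled spot images into train/test folders.
    # Assumes the crops have already been sorted by hand into src_dir/empty and src_dir/occupied
    for cls in ['empty', 'occupied']:
        images = glob.glob(os.path.join(src_dir, cls, '*.jpg'))
        random.shuffle(images)
        n_train = int(len(images) * train_ratio)
        for i, img_path in enumerate(images):
            split = 'train' if i < n_train else 'test'
            dst_dir = os.path.join(dst_root, split, cls)
            os.makedirs(dst_dir, exist_ok=True)
            shutil.copy(img_path, dst_dir)

split_dataset('../cnn_data_labeled', 'train_data')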

Let's get down to the details of the model.

4. Model training and prediction

Since the current number of samples is very small and not enough to train a large model to convergence, a pre-trained model is used here with the transfer learning technique.

Where my model differs from the video is that I wrote the model training and testing code entirely in PyTorch. One reason is that I have recently been reproducing classic CV networks in PyTorch, and this project is a good chance to practice. Another reason is that I find Keras not flexible enough, and its data preprocessing is not as convenient as the transforms in torchvision. For these two reasons I use PyTorch directly, with a pre-trained ResNet34 model; I chose it because I had just reproduced ResNet in the last couple of days and was a bit familiar with it, so I could put that to use. There is no other preference.

Since there is a lot of code here, I won't list all of it; I'll just explain the logic. If you are interested, you can look at the specific project.

The first step is to train the model.

4.1 Model training

The overall logic looks like this:

def train_model():
    # Get the dataloaders
    data_root = os.getcwd()
    image_path = os.path.join(data_root, "train_data")
    train_data_path = os.path.join(image_path, "train")
    val_data_path = os.path.join(image_path, "test")
    train_loader, validat_loader, train_num, val_num = get_dataloader(train_data_path, val_data_path,
                                                                      data_transform_pretrain, batch_size=8)

    # Create the model. Note that the number of classes is not specified here; the default is 1000 classes.
    net = resnet34()
    model_weight_path = 'saved_model_weight/resnet34_pretrain_ori_low_torch_version.pth'

    # Use the pre-trained parameters, then finetune
    net.load_state_dict(torch.load(model_weight_path, map_location='cpu'))

    # Change the fc layer structure: change the fc output dimension to 2
    in_channel = net.fc.in_features
    net.fc = nn.Linear(in_channel, 2)
    net.to(device)

    # Model training configuration
    loss_function = nn.CrossEntropyLoss()
    optimizer = optim.Adam(net.parameters(), lr=0.0001)

    epochs = 30
    save_path = "saved_model_weight/resnet34_pretrain.pth"
    best_acc = 0.
    train_steps = len(train_loader)

    model_train(net, train_loader, validat_loader, epochs, device, optimizer, loss_function, train_steps, val_num,
                save_path, best_acc)

Because I used some function wrappers here, the logic should be fairly clear. For PyTorch model training, the data first has to be wrapped into a DataLoader, and during training the data is read from this class. I won't go into the principles of DataLoader and Dataset here; I organized them in detail earlier in my PyTorch basics notes.
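The get_dataloader wrapper itself is not shown in this article; a minimal sketch of what it might look like, assuming the data lives in ImageFolder-style class directories (the function name and return values follow the call in train_model above, and num_workers=0 is just an example):

import torch
from torchvision import datasets

def get_dataloader(train_data_path, val_data_path, data_transform, batch_size=8):
    # Hypothetical wrapper: build train/validation DataLoaders from ImageFolder directories
    train_dataset = datasets.ImageFolder(root=train_data_path, transform=data_transform["train"])
    val_dataset = datasets.ImageFolder(root=val_data_path, transform=data_transform["val"])

    train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size,
                                               shuffle=True, num_workers=0)
    validat_loader = torch.utils.data.DataLoader(val_dataset, batch_size=batch_size,
                                                 shuffle=False, num_workers=0)
    return train_loader, validat_loader, len(train_dataset), len(val_dataset)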

One detail here, though, is data_transform_pretrain, the data preprocessing operations:

data_transform_pretrain = {
    "train": transforms.Compose([
        transforms.RandomResizedCrop(224),  # Randomly crop the image; only for the training set, not the validation set
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        # The normalization parameters here must be the officially given ones: the per-channel mean and
        # std of the ImageNet images. They cannot be specified arbitrarily!
        transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))
    ]),
    "val": transforms.Compose([
        # The validation pipeline is slightly different
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))
    ]),
    "test": transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))
    ])
}

Since we use the officially trained ResNet network, the normalization here has to use the officially given parameters: the pre-training was done on the large ImageNet dataset, so the per-channel mean and std should not be specified arbitrarily. Use the official values.

With the dataloader in place, we create the model. Here ResNet34 is used directly, and the pre-trained parameters are imported into it. When importing, you will notice that my parameter file name has a low_torch_version suffix, because of the error reported during an earlier import:

"... is a zip archive (did you mean to use torch.jit.load()?)"

The reason for this error is that the official pre-trained model parameters were saved with PyTorch version 1.6 or higher, and PyTorch 1.6 switched torch.save to a new zipfile-based file format.

torch.load still retains the ability to load files in the old format. If you want torch.save to use the old format, pass the kwarg _use_new_zipfile_serialization=False.

The PyTorch version on my laptop is 1.0, so importing model parameters saved with version 1.6 or higher reports this error. So how did I fix it? On my server, by running the following code:

model_weight_path = "saved_models/resnet34_pretrain_ori.pth"
state_dict = torch.load(model_weight_path)
torch.save(state_dict, 'saved_models/resnet34_pretrain_ori_low_torch_version.pth',
           _use_new_zipfile_serialization=False)

The PyTorch version on my server is 1.10, so it can import these parameters; after importing, I just re-save them with _use_new_zipfile_serialization=False.

After this problem is solved, back to the pre-trained model. After importing the parameters, we need to modify the last layer of the network: ResNet itself does 1000-way classification, so the last layer has 1000 neurons, while we need binary classification here, so it has to be changed to 2.

In addition, there are three common ways of doing transfer learning:

  • Load the weights and retrain all parameters -- needs good hardware.
  • Load the weights, then train only the last few layers and freeze the earlier ones, or lower the learning rate of the earlier layers and raise the learning rate of the fully connected layer, i.e. set learning rates per parameter group.
  • Load the weights, add another fully connected layer on top of the original network, and train only that last fully connected layer.

I use the train-everything approach here, but it is worth organizing what to do when you only want to train the later layers, or train the front and back layers with different learning rates:

# Create the model. Note that the number of classes is not specified here; the default is 1000 classes.
net = resnet34()
model_weight_path = 'saved_model_weight/resnet34_pretrain_ori_low_torch_version.pth'

# Use the pre-trained parameters, then finetune
net.load_state_dict(torch.load(model_weight_path, map_location='cpu'))

# Change the fc layer structure: change the fc output dimension to 2
in_channel = net.fc.in_features
net.fc = nn.Linear(in_channel, 2)
net.to(device)

# Model training configuration
loss_function = nn.CrossEntropyLoss()
# During training you can also freeze the parameters of the convolutional layers,
# and you can specify different learning rates for the parameters of different layers.
res_params, conv_params, fc_params = [], [], []
# named_parameters() returns the name and the parameter of each layer
for name, param in net.named_parameters():
    # The 'layer' series are the residual layers
    if 'layer' in name:
        res_params.append(param)
    # Fully connected layer
    elif 'fc' in name:
        fc_params.append(param)
    else:
        param.requires_grad = False

params = [
    {'params': res_params, 'lr': 0.0001},
    {'params': fc_params, 'lr': 0.0002},
]

optimizer = optim.Adam(params)

Just change the parameters of the optimizer here.

After this is done, call the training function and train directly. That script is just routine operation, so I won't post the author's code here.
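For reference, a minimal sketch of what such a model_train routine usually looks like (the signature follows the call in train_model above; the loop body is my own assumption, not the exact code of this project):

import torch
from tqdm import tqdm

def model_train(net, train_loader, validat_loader, epochs, device, optimizer, loss_function,
                train_steps, val_num, save_path, best_acc):
    # Hypothetical training loop: train, validate each epoch, keep the best checkpoint
    for epoch in range(epochs):
        # Training phase
        net.train()
        running_loss = 0.0
        for images, labels in tqdm(train_loader):
            optimizer.zero_grad()
            logits = net(images.to(device))
            loss = loss_function(logits, labels.to(device))
            loss.backward()
            optimizer.step()
            running_loss += loss.item()

        # Validation phase
        net.eval()
        acc = 0
        with torch.no_grad():
            for val_images, val_labels in validat_loader:
                outputs = net(val_images.to(device))
                predict_y = torch.max(outputs, dim=1)[1]
                acc += torch.eq(predict_y, val_labels.to(device)).sum().item()

        val_acc = acc / val_num
        print('[epoch %d] train_loss: %.3f  val_accuracy: %.3f' %
              (epoch + 1, running_loss / train_steps, val_acc))

        # Save the weights whenever the validation accuracy improves
        if val_acc > best_acc:
            best_acc = val_acc
            torch.save(net.state_dict(), save_path)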

4.2 Model predictions

With the saved model, we take a frame, divide it into parking spaces according to the parking space dictionary, use the model to predict whether each space is empty, and if it is, mark it on the original image.

So here is the core prediction function of the entire program:

def predict_on_img(img, spot_dict, model, class_indict, make_copy=True, color=[0, 255, 0], alpha=0.5, save=True):
    # img is the panoramic view of the parking lot
    if make_copy:
        new_image = np.copy(img)
        overlay = np.copy(img)

    cnt_empty, all_spots = 0, 0
    for spot in tqdm(spot_dict.keys()):
        all_spots += 1
        (x1, y1, x2, y2) = spot
        (x1, y1, x2, y2) = (int(x1), int(y1), int(x2), int(y2))
        spot_img = img[y1:y2, x1:x2]
        spot_img_pil = Image.fromarray(spot_img)

        label = model_infer(spot_img_pil, model, class_indict)
        if label == 'empty':
            cv2.rectangle(overlay, (int(x1), int(y1)), (int(x2), int(y2)), color, -1)
            cnt_empty += 1

    cv2.addWeighted(overlay, alpha, new_image, 1 - alpha, 0, new_image)

    # Show the results
    cv2.putText(new_image, "Available: %d spots" % cnt_empty, (30, 95),
                cv2.FONT_HERSHEY_SIMPLEX,
                0.7, (255, 255, 255), 2)
    cv2.putText(new_image, "Total: %d spots" % all_spots, (30, 125),
                cv2.FONT_HERSHEY_SIMPLEX,
                0.7, (255, 255, 255), 2)

    if save:
        filename = 'with_marking_predict.jpg'
        cv2.imwrite(filename, new_image)
    # cv_imshow('new_image', new_image)
    return new_image

The core of the model prediction is the model_infer function, which is also a routine model inference operation, so I won't explain it much here.
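A minimal sketch of what such a model_infer helper typically looks like (it reuses the test transform and the class_indict mapping from earlier; the exact implementation is my assumption, not this project's code):

import torch

def model_infer(spot_img_pil, model, class_indict):
    # Hypothetical inference helper: classify one parking-space crop as 'empty' or 'occupied'
    # Apply the same preprocessing as the validation/test set
    img = data_transform_pretrain["test"](spot_img_pil)
    img = torch.unsqueeze(img, dim=0)   # add a batch dimension

    model.eval()
    with torch.no_grad():
        output = torch.squeeze(model(img.to(device))).cpu()
        predict = torch.softmax(output, dim=0)
        predict_cla = torch.argmax(predict).item()

    # class_indict maps a class index to its name, e.g. {'0': 'empty', '1': 'occupied'}
    return class_indict[str(predict_cla)]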

A video is nothing more than many frames; passing each frame through this function gives real-time prediction on the video:

def predict_on_video(video_path, spot_dict, model, class_indict, ret=True):
    cap = cv2.VideoCapture(video_path)
    count = 0
    while ret:
        ret, image = cap.read()
        count += 1

        # Only run prediction on every 5th frame to keep things fast
        if count == 5:
            count = 0
            new_image = predict_on_img(image, spot_dict, model, class_indict, save=False)

            cv2.imshow('frame', new_image)
            if cv2.waitKey(10) & 0xFF == ord('q'):
                break

    cap.release()
    cv2.destroyAllWindows()

This is the whole project.
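Putting the pieces together, the prediction entry point might look roughly like this (a sketch under my own assumptions: the weight file, spot_dict.pickle, class_indices.json, and the video file name are illustrative, not the exact files of this project):

import json
import pickle
import torch
from torch import nn

# Load the parking-space dictionary produced by the preprocessing step
with open('spot_dict.pickle', 'rb') as f:
    spot_dict = pickle.load(f)

# Rebuild the fine-tuned ResNet34 and load the trained weights
# (resnet34 comes from the project's model definition, same as in train_model)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
net = resnet34()
net.fc = nn.Linear(net.fc.in_features, 2)
net.load_state_dict(torch.load("saved_model_weight/resnet34_pretrain.pth", map_location=device))
net.to(device)

# Class index -> class name mapping, e.g. {"0": "empty", "1": "occupied"}
with open('class_indices.json', 'r') as f:
    class_indict = json.load(f)

# Run real-time prediction on the parking-lot video
predict_on_video('parking_video.mp4', spot_dict, net, class_indict)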

5. Summary

This was a small but complete project. It may be a bit simple, but it exercises the whole mechanism of image processing plus model training and prediction, which is quite friendly for a beginner like me. Through this project, on the image preprocessing side I learned binarization with inRange, Hough line detection, fixed-point calibration, mask matrices for locking a region, and region extraction via coordinates. On the modeling side I learned about ResNet and reviewed transfer learning in PyTorch. I also got to know a few new libraries such as glob, shutil, and PIL. So I gained a lot, and I feel that CV is getting more and more interesting, haha.

This project feels quite meaningful for real scenes. Imagine a future with widespread intelligent transportation: a smart parking lot detects the vacancy status of its spaces in real time through cameras and sends the positions of the free spaces to a driverless car, which then automatically plans the parking route, locks a space, and parks itself. This would avoid congestion in the lot (right now we may circle several times to find a space, and might even get stuck), and the availability of spaces could be shown at a glance on a big screen, saving users the time spent looking for a space and parking.

Well, it's just an early brainstorm, the future will give us the answer as to whether or not it works 😉

Code address for this project: /zhongqiangwu960812/OpenCVLearning

This concludes this article on the OpenCV real-time parking space detection project. For more related content on real-time parking detection with OpenCV, please search my previous articles or continue browsing the related articles below. I hope you will support me in the future!