In this tutorial, you will learn how to use the dlib library to effectively track multiple objects in live video.
We can certainly track multiple objects using dlib; however, for the best possible performance, we need to utilize multiprocessing and distribute the object tracker across multiple cores of the processor.
Correctly utilizing multiprocessing allowed us to improve dlib multi-object tracking frames per second (FPS) by more than 45%!
1. Multi-target tracking using dlib
In the first part of this guide, I will demonstrate how to implement a simple, plain dlib multi-object tracking script. The program will track multiple objects in a video; however, we will notice that the script runs a bit slowly. To improve our FPS, I will then show you a faster, more efficient implementation of the dlib multi-object tracker. Finally, I will discuss some improvements and suggestions to enhance our multi-object tracking implementation.
2. Project structure
You can view the structure of our project using the tree command:
The mobilenet_ssd/ directory contains our MobileNet + SSD Caffe model files, which allow us to detect people (and other objects). Today we will review two Python scripts:
- multi_object_tracking_slow.py: dlib's simple "plain" approach to multi-object tracking.
- multi_object_tracking_fast.py: an advanced, fast method for utilizing multiprocessing.
3. A simple "plain" approach to multi-object tracking
The first dlib multi-object tracking implementation we're going to introduce today is "plain" in the sense that it will:
1. Use a simple list of tracker objects.
2. Update each tracker sequentially using only a single core of our processor.
For some object tracking tasks, this implementation will be more than sufficient; however, to optimize our FPS, we should distribute the object tracker across multiple processes.
We'll start with the simple implementation in this section and then move to a faster approach in the next section. First, open the multi_object_tracking_slow.py script and insert the following code:
# import the necessary packages
from imutils.video import FPS
import numpy as np
import argparse
import imutils
import dlib
import cv2
Let's parse our command line arguments:
# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-p", "--prototxt", required=True,
    help="path to Caffe 'deploy' prototxt file")
ap.add_argument("-m", "--model", required=True,
    help="path to Caffe pre-trained model")
ap.add_argument("-v", "--video", required=True,
    help="path to input video file")
ap.add_argument("-o", "--output", type=str,
    help="path to optional output video file")
ap.add_argument("-c", "--confidence", type=float, default=0.2,
    help="minimum probability to filter weak detections")
args = vars(ap.parse_args())
Our script handles the following command line arguments at runtime:
- --prototxt : The path to the Caffe "deploy" prototxt file.
- --model : The path to the Caffe pre-trained model that accompanies the prototxt.
- --video : The path to the input video file. We will perform dlib multi-object tracking in this video.
- --output : An optional path to an output video file. If no path is specified, the video will not be written to disk. I recommend writing to .avi or .mp4 files.
- --confidence : The object detection confidence threshold (default 0.2), the minimum probability used to filter weak detections from the object detector.
Let's define the list of classes supported by this model and load our model from disk:
# initialize the list of class labels MobileNet SSD was trained to
# detect
CLASSES = ["background", "aeroplane", "bicycle", "bird", "boat",
    "bottle", "bus", "car", "cat", "chair", "cow", "diningtable",
    "dog", "horse", "motorbike", "person", "pottedplant", "sheep",
    "sofa", "train", "tvmonitor"]

# load our serialized model from disk
print("[INFO] loading model...")
net = cv2.dnn.readNetFromCaffe(args["prototxt"], args["model"])
We are only concerned with the person class in today's running example, but you can easily modify the code to track other classes. We then load our pre-trained MobileNet SSD object detector from disk. We will use the SSD to detect the presence of objects in the video and create a dlib object tracker for each detected object.
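For example, if you wanted to track more than one class of object, you could filter against a small set of labels instead of hard-coding "person". The CLASSES_TO_TRACK set and should_track helper below are illustrative assumptions, not part of the original script:

# a minimal sketch: filter detections against a set of class labels instead
# of hard-coding "person" -- CLASSES_TO_TRACK is a hypothetical set
CLASSES_TO_TRACK = {"person", "dog"}

def should_track(class_label):
    # return True only for labels we want to hand off to a dlib tracker
    return class_label in CLASSES_TO_TRACK

# example usage against the MobileNet SSD labels
print(should_track("person"))  # True
print(should_track("car"))     # False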
We still have some initialization to perform:
# initialize the video stream and output video writer
print("[INFO] starting video stream...")
vs = cv2.VideoCapture(args["video"])
writer = None

# initialize the list of object trackers and corresponding class
# labels
trackers = []
labels = []

# start the frames per second throughput estimator
fps = FPS().start()
We initialize our video stream, from which we will read frames one at a time. Our video writer is initialized to None; we will work with it further in the upcoming while loop. We then initialize our tracker and label lists and, finally, start our frames per second counter. We're all set to start processing the video:
# loop over frames from the video file stream
while True:
    # grab the next frame from the video file
    (grabbed, frame) = vs.read()

    # check to see if we have reached the end of the video file
    if frame is None:
        break

    # resize the frame for faster processing and then convert the
    # frame from BGR to RGB ordering (dlib needs RGB ordering)
    frame = imutils.resize(frame, width=600)
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

    # if we are supposed to be writing a video to disk, initialize
    # the writer
    if args["output"] is not None and writer is None:
        fourcc = cv2.VideoWriter_fourcc(*"MJPG")
        writer = cv2.VideoWriter(args["output"], fourcc, 30,
            (frame.shape[1], frame.shape[0]), True)
We resize each frame to 600 pixels wide while maintaining the aspect ratio. The frame is then converted to RGB channel ordering for dlib compatibility (OpenCV's default is BGR, while dlib expects RGB).
Let's start the object detection phase:
    # if there are no object trackers we first need to detect objects
    # and then create a tracker for each object
    if len(trackers) == 0:
        # grab the frame dimensions and convert the frame to a blob
        (h, w) = frame.shape[:2]
        blob = cv2.dnn.blobFromImage(frame, 0.007843, (w, h), 127.5)

        # pass the blob through the network and obtain the detections
        # and predictions
        net.setInput(blob)
        detections = net.forward()
In order to perform object tracking, we must first perform object detection, either:
- Manually, by stopping the video stream and hand-selecting the bounding box for each object (see the sketch after this list).
- Programmatically, using a trained object detector to detect the presence of objects (which is what we are doing here).
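As an illustration of the manual option (not used in the rest of this tutorial), OpenCV's cv2.selectROI function lets you pause on a frame and draw a bounding box by hand. The window name and the stand-in frame below are assumptions:

import cv2

# a minimal sketch of manual initialization, assuming `frame` is a BGR image
# that has already been read from the video stream
frame = cv2.imread("example_frame.jpg")  # hypothetical stand-in frame

# selectROI opens a window, blocks until you draw a box and press ENTER,
# and returns the box as (x, y, width, height)
(x, y, w, h) = cv2.selectROI("Select object", frame, fromCenter=False,
    showCrosshair=True)

# convert to the (startX, startY, endX, endY) format used in this tutorial
(startX, startY, endX, endY) = (x, y, x + w, y + h)
print(startX, startY, endX, endY)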
If there are no object trackers yet, then we know we haven't performed object detection.
We create a blob from the frame and pass it through the SSD network to detect objects.
Next, we loop over the detections to find objects belonging to the person class, since our input video is of a human foot race:
        # loop over the detections
        for i in np.arange(0, detections.shape[2]):
            # extract the confidence (i.e., probability) associated
            # with the prediction
            confidence = detections[0, 0, i, 2]

            # filter out weak detections by requiring a minimum
            # confidence
            if confidence > args["confidence"]:
                # extract the index of the class label from the
                # detections list
                idx = int(detections[0, 0, i, 1])
                label = CLASSES[idx]

                # if the class label is not a person, ignore it
                if CLASSES[idx] != "person":
                    continue
We begin looping over the detections, where we:
- Filter out weak detections.
- Ensure each detection is a person. Of course, you can remove this check or customize it to your own filtering needs.
Now that we have each person positioned in the frame, let's instantiate our tracker and draw our initial bounding box + class label:
                # compute the (x, y)-coordinates of the bounding box
                # for the object
                box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
                (startX, startY, endX, endY) = box.astype("int")

                # construct a dlib rectangle object from the bounding
                # box coordinates and start the correlation tracker
                t = dlib.correlation_tracker()
                rect = dlib.rectangle(startX, startY, endX, endY)
                t.start_track(rgb, rect)

                # update our set of trackers and corresponding class
                # labels
                labels.append(label)
                trackers.append(t)

                # grab the corresponding class label for the detection
                # and draw the bounding box
                cv2.rectangle(frame, (startX, startY), (endX, endY),
                    (0, 255, 0), 2)
                cv2.putText(frame, label, (startX, startY - 15),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.45, (0, 255, 0), 2)
To start tracking objects, we:
- Calculate the bounding box for each detected object.
- Construct a dlib rectangle object from the bounding box coordinates and pass it to the tracker's start_track method. The rectangle is especially important here: dlib needs it to begin tracking the object.
- Finally, we populate the trackers list with a single tracker.
Therefore, in the next code block we will deal with the case where the trackers have already been established and only need to be updated with each new frame. We perform two additional tasks in this initial detection step:
- Append the class label to the labels list. If you are tracking multiple types of objects (e.g., dog + person), you may want to know the type of each object.
- Draw each bounding box rectangle and class label around the object.
We know we're in the target tracking phase when the length of our trackers list is greater than zero:
    # otherwise, we've already performed detection so let's track
    # multiple objects
    else:
        # loop over each of the trackers
        for (t, l) in zip(trackers, labels):
            # update the tracker and grab the position of the tracked
            # object
            t.update(rgb)
            pos = t.get_position()

            # unpack the position object
            startX = int(pos.left())
            startY = int(pos.top())
            endX = int(pos.right())
            endY = int(pos.bottom())

            # draw the bounding box from the correlation object tracker
            cv2.rectangle(frame, (startX, startY), (endX, endY),
                (0, 255, 0), 2)
            cv2.putText(frame, l, (startX, startY - 15),
                cv2.FONT_HERSHEY_SIMPLEX, 0.45, (0, 255, 0), 2)
In the target tracking phase, we iterate over all the trackers and their corresponding labels, then update the position of each object by passing the rgb image to the tracker's update method.
After extracting the bounding box coordinates, we can draw a bounding box rectangle and label for each tracked object.
The remaining steps in the frame processing loop involve writing the output video (if necessary) and displaying the results:
    # check to see if we should write the frame to disk
    if writer is not None:
        writer.write(frame)

    # show the output frame
    cv2.imshow("Frame", frame)
    key = cv2.waitKey(1) & 0xFF

    # if the `q` key was pressed, break from the loop
    if key == ord("q"):
        break

    # update the FPS counter
    fps.update()
Here we:
- Write the frame to the output video, if necessary.
- Display the output frame and capture keypresses. If the q key is pressed (quit), we break out of the loop.
- Update our frames per second information for benchmarking.
The remaining steps are to print the FPS information in the terminal and release the pointers:
# stop the timer and display FPS information
fps.stop()
print("[INFO] elapsed time: {:.2f}".format(fps.elapsed()))
print("[INFO] approx. FPS: {:.2f}".format(fps.fps()))

# check to see if we need to release the video writer pointer
if writer is not None:
    writer.release()

# do a bit of cleanup
cv2.destroyAllWindows()
vs.release()
Let's evaluate accuracy and performance. Open a terminal and execute the following command:
$ python multi_object_tracking_slow.py --prototxt mobilenet_ssd/MobileNetSSD_deploy.prototxt \
    --model mobilenet_ssd/MobileNetSSD_deploy.caffemodel \
    --video race.mp4 --output race_output_slow.avi
[INFO] loading model...
[INFO] starting video stream...
[INFO] elapsed time: 24.51
[INFO] approx. FPS: 13.87
Looks like our multi-target tracker worked!
But as you can see, we're only getting about 13 fps.
For some applications, this FPS may be sufficient - however, if you need faster FPS, I suggest you take a look at the more efficient dlib multi-object tracker below. Second, understand that tracking accuracy is not perfect.
4. Fast and efficient dlib multi-object tracking implementation
If you run the dlib multi-object tracking script from the previous section and open your system's activity monitor at the same time, you'll notice that only one core of the processor is being used. To take advantage of the other cores, we will spawn a separate process for each object tracker.
Utilizing processes allows our operating systems to perform better process scheduling, mapping processes to specific processor cores on our machines (most modern operating systems are able to efficiently schedule processes that use a large number of CPUs in a parallel fashion).
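If you want to confirm how many cores your machine exposes to the scheduler, a quick check (not part of the tutorial scripts) is:

# quick check of how many CPU cores the operating system can schedule
# processes on
import multiprocessing

print("[INFO] available CPU cores: {}".format(multiprocessing.cpu_count()))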
Go ahead and open multi_object_tracking_fast.py and insert the following code:
# import the necessary packages
from imutils.video import FPS
import multiprocessing
import numpy as np
import argparse
import imutils
import dlib
import cv2
We'll use the Python Process class to spawn a new process - each new process is independent of the original.
To spawn a process, we need to provide a function that Python can call; Python will then create a brand-new process to execute it:
def start_tracker(box, label, rgb, inputQueue, outputQueue):
    # construct a dlib rectangle object from the bounding box
    # coordinates and then start the correlation tracker
    t = dlib.correlation_tracker()
    rect = dlib.rectangle(box[0], box[1], box[2], box[3])
    t.start_track(rgb, rect)
The first three arguments to start_tracker include:
- box : the coordinates of the bounding box of the object we want to track, presumably returned by some kind of object detector, whether manual or programmatic.
- label : The human-readable label of the object.
- rgb: the RGB image we will use to start the initial dlib object tracker.
Remember how Python multiprocessing works - Python will call this function and then create a brand new interpreter to execute the code in it. As a result, each spawned start_tracker process will be independent of its parent. In order to communicate with the Python driver script, we need to utilize Pipes and Queues. Both types of objects are thread/process safe and use locks and semaphores to accomplish this.
Essentially, we are creating a simple producer/consumer relationship:
- Our parent process will generate new frames and add them to the queue for the specific object tracker.
- The child process will then consume the frame, apply object tracking, and return the updated bounding box coordinates.
I decided to use the Queue object in this post; however, keep in mind that you can also use the Pipe if you wish!
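For reference, here is a minimal producer/consumer sketch using multiprocessing.Pipe instead of Queue; the worker function and the numeric payload are purely illustrative and not part of the tutorial scripts:

# a minimal producer/consumer sketch using multiprocessing.Pipe instead of
# Queue -- the worker function and payload here are purely illustrative
import multiprocessing

def worker(conn):
    # receive a value from the parent, process it, and send back a result
    value = conn.recv()
    conn.send(value * 2)
    conn.close()

if __name__ == "__main__":
    # Pipe() returns two connection objects, one for each end
    parentConn, childConn = multiprocessing.Pipe()
    p = multiprocessing.Process(target=worker, args=(childConn,))
    p.start()

    # send a payload and block until the result comes back
    parentConn.send(21)
    print("[INFO] result from child: {}".format(parentConn.recv()))
    p.join()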
Now let's start an infinite loop that will run in this process:
    # loop indefinitely -- this function will be called as a daemon
    # process so we don't need to worry about joining it
    while True:
        # attempt to grab the next frame from the input queue
        rgb = inputQueue.get()

        # if there was an entry in our queue, process it
        if rgb is not None:
            # update the tracker and grab the position of the tracked
            # object
            t.update(rgb)
            pos = t.get_position()

            # unpack the position object
            startX = int(pos.left())
            startY = int(pos.top())
            endX = int(pos.right())
            endY = int(pos.bottom())

            # add the label + bounding box coordinates to the output
            # queue
            outputQueue.put((label, (startX, startY, endX, endY)))
We're in an infinite loop here - this function will be called as a daemon, so we don't need to worry about joining it.
First, we will try to grab a new frame from the inputQueue. If the frame is not empty, we will grab the frame and then update the object tracker to give us the updated bounding box coordinates.
Finally, we write the labels and bounding boxes to the outputQueue so that the parent process can use them in the main loop of the script.
Going back to the parent process, we will parse the command line arguments:
# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-p", "--prototxt", required=True,
    help="path to Caffe 'deploy' prototxt file")
ap.add_argument("-m", "--model", required=True,
    help="path to Caffe pre-trained model")
ap.add_argument("-v", "--video", required=True,
    help="path to input video file")
ap.add_argument("-o", "--output", type=str,
    help="path to optional output video file")
ap.add_argument("-c", "--confidence", type=float, default=0.2,
    help="minimum probability to filter weak detections")
args = vars(ap.parse_args())
The command line parameters for this script are identical to our slower, non-multiprocessing script.
Let's initialize our input and output queues:
# initialize our lists of queues -- both input queue and output queue
# for *every* object that we will be tracking
inputQueues = []
outputQueues = []
These queues will hold the objects we are tracking. Two Queue objects are needed for each process that is spawned:
- One from which the process reads input frames.
- One to which the process writes results.
The next block of code is the same as our previous script:
# initialize the list of class labels MobileNet SSD was trained to
# detect
CLASSES = ["background", "aeroplane", "bicycle", "bird", "boat",
    "bottle", "bus", "car", "cat", "chair", "cow", "diningtable",
    "dog", "horse", "motorbike", "person", "pottedplant", "sheep",
    "sofa", "train", "tvmonitor"]

# load our serialized model from disk
print("[INFO] loading model...")
net = cv2.dnn.readNetFromCaffe(args["prototxt"], args["model"])

# initialize the video stream and output video writer
print("[INFO] starting video stream...")
vs = cv2.VideoCapture(args["video"])
writer = None

# start the frames per second throughput estimator
fps = FPS().start()
We define the CLASSES of the model and load the model itself.
Now let's start looping through the frames in the video stream:
# loop over frames from the video file stream
while True:
    # grab the next frame from the video file
    (grabbed, frame) = vs.read()

    # check to see if we have reached the end of the video file
    if frame is None:
        break

    # resize the frame for faster processing and then convert the
    # frame from BGR to RGB ordering (dlib needs RGB ordering)
    frame = imutils.resize(frame, width=600)
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

    # if we are supposed to be writing a video to disk, initialize
    # the writer
    if args["output"] is not None and writer is None:
        fourcc = cv2.VideoWriter_fourcc(*"MJPG")
        writer = cv2.VideoWriter(args["output"], fourcc, 30,
            (frame.shape[1], frame.shape[0]), True)
Now let's handle the case where there are no inputQueues:
    # if our list of queues is empty then we know we have yet to
    # create our first object tracker
    if len(inputQueues) == 0:
        # grab the frame dimensions and convert the frame to a blob
        (h, w) = frame.shape[:2]
        blob = cv2.dnn.blobFromImage(frame, 0.007843, (w, h), 127.5)

        # pass the blob through the network and obtain the detections
        # and predictions
        net.setInput(blob)
        detections = net.forward()

        # loop over the detections
        for i in np.arange(0, detections.shape[2]):
            # extract the confidence (i.e., probability) associated
            # with the prediction
            confidence = detections[0, 0, i, 2]

            # filter out weak detections by requiring a minimum
            # confidence
            if confidence > args["confidence"]:
                # extract the index of the class label from the
                # detections list
                idx = int(detections[0, 0, i, 1])
                label = CLASSES[idx]

                # if the class label is not a person, ignore it
                if CLASSES[idx] != "person":
                    continue
If there are no inputQueues, then we need to apply object detection before object tracking. We apply object detection and then loop over the results. We grab the confidence value and filter out weak detections. If our confidence meets the threshold established by our command line arguments, we consider the detection, but we further filter it by class label; in this case, we are only looking for person objects. Assuming we find a person, we create the queues and spawn the tracking process:
                # compute the (x, y)-coordinates of the bounding box
                # for the object
                box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
                (startX, startY, endX, endY) = box.astype("int")
                bb = (startX, startY, endX, endY)

                # create two brand new input and output queues,
                # respectively
                iq = multiprocessing.Queue()
                oq = multiprocessing.Queue()
                inputQueues.append(iq)
                outputQueues.append(oq)

                # spawn a daemon process for a new object tracker
                p = multiprocessing.Process(
                    target=start_tracker,
                    args=(bb, label, rgb, iq, oq))
                p.daemon = True
                p.start()

                # grab the corresponding class label for the detection
                # and draw the bounding box
                cv2.rectangle(frame, (startX, startY), (endX, endY),
                    (0, 255, 0), 2)
                cv2.putText(frame, label, (startX, startY - 15),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.45, (0, 255, 0), 2)
We start by computing the bounding box coordinates. From there, we create two new queues, iq and oq, and append them to inputQueues and outputQueues, respectively. We then spawn a new start_tracker process, passing in the bounding box, label, rgb image, and the iq + oq queues.
We also draw the detected object's bounding box rectangle and class label.
Otherwise, we have already performed object detection, so we need to apply each dlib object tracker to the frame:
    # otherwise, we've already performed detection so let's track
    # multiple objects
    else:
        # loop over each of our input queues and add the input RGB
        # frame to it, enabling us to update each of the respective
        # object trackers running in separate processes
        for iq in inputQueues:
            iq.put(rgb)

        # loop over each of the output queues
        for oq in outputQueues:
            # grab the updated bounding box coordinates for the
            # object -- the .get method is a blocking operation so
            # this will pause our execution until the respective
            # process finishes the tracking update
            (label, (startX, startY, endX, endY)) = oq.get()

            # draw the bounding box from the correlation object
            # tracker
            cv2.rectangle(frame, (startX, startY), (endX, endY),
                (0, 255, 0), 2)
            cv2.putText(frame, label, (startX, startY - 15),
                cv2.FONT_HERSHEY_SIMPLEX, 0.45, (0, 255, 0), 2)
Iterating over each of the inputQueues, we add the rgb image to them. Then we iterate over each of the outputQueues and get the bounding box coordinates from each individual object tracker. Finally, we draw the bounding box + associated class label.
    # check to see if we should write the frame to disk
    if writer is not None:
        writer.write(frame)

    # show the output frame
    cv2.imshow("Frame", frame)
    key = cv2.waitKey(1) & 0xFF

    # if the `q` key was pressed, break from the loop
    if key == ord("q"):
        break

    # update the FPS counter
    fps.update()

# stop the timer and display FPS information
fps.stop()
print("[INFO] elapsed time: {:.2f}".format(fps.elapsed()))
print("[INFO] approx. FPS: {:.2f}".format(fps.fps()))

# check to see if we need to release the video writer pointer
if writer is not None:
    writer.release()

# do a bit of cleanup
cv2.destroyAllWindows()
vs.release()
If necessary, we write the frame to the output video and display the frame on screen. If the q key is pressed, we quit, breaking out of the loop. If we continue processing frames, our FPS counter is updated and we start processing again at the top of the while loop. Otherwise, we're done processing frames, so we display the FPS information, release the pointers, and close the windows.
Open a terminal and execute the following command:
$ python multi_object_tracking_fast.py --prototxt mobilenet_ssd/MobileNetSSD_deploy.prototxt \
    --model mobilenet_ssd/MobileNetSSD_deploy.caffemodel \
    --video race.mp4 --output race_output_fast.avi
[INFO] loading model...
[INFO] starting video stream...
[INFO] elapsed time: 14.01
[INFO] approx. FPS: 24.26
As you can see, our faster, more efficient multi-object tracker runs at 24 FPS, an improvement of over 45% over our previous implementation! In addition, if you open the activity monitor while this script is running, you'll see more of your system's CPU being used. This speedup is obtained by allowing each dlib object tracker to run in a separate process, which in turn allows your operating system to perform more efficient scheduling of CPU resources.
5. Complete Code
multi_object_tracking_slow.py
# USAGE
# python multi_object_tracking_slow.py --prototxt mobilenet_ssd/MobileNetSSD_deploy.prototxt \
#     --model mobilenet_ssd/MobileNetSSD_deploy.caffemodel --video race.mp4

# import the necessary packages
from imutils.video import FPS
import numpy as np
import argparse
import imutils
import dlib
import cv2

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-p", "--prototxt", required=True,
    help="path to Caffe 'deploy' prototxt file")
ap.add_argument("-m", "--model", required=True,
    help="path to Caffe pre-trained model")
# ap.add_argument("-v", "--video", required=True,
#     help="path to input video file")
ap.add_argument("-v", "--video",
    help="path to input video file")
ap.add_argument("-o", "--output", type=str,
    help="path to optional output video file")
ap.add_argument("-c", "--confidence", type=float, default=0.2,
    help="minimum probability to filter weak detections")
args = vars(ap.parse_args())

# initialize the list of class labels MobileNet SSD was trained to
# detect
CLASSES = ["background", "aeroplane", "bicycle", "bird", "boat",
    "bottle", "bus", "car", "cat", "chair", "cow", "diningtable",
    "dog", "horse", "motorbike", "person", "pottedplant", "sheep",
    "sofa", "train", "tvmonitor"]

# load our serialized model from disk
print("[INFO] loading model...")
net = cv2.dnn.readNetFromCaffe(args["prototxt"], args["model"])

# initialize the video stream and output video writer
print("[INFO] starting video stream...")
# vs = cv2.VideoCapture(args["video"])
vs = cv2.VideoCapture(0)
writer = None

# initialize the list of object trackers and corresponding class
# labels
trackers = []
labels = []

# start the frames per second throughput estimator
fps = FPS().start()

# loop over frames from the video file stream
while True:
    # grab the next frame from the video file
    (grabbed, frame) = vs.read()

    # check to see if we have reached the end of the video file
    if frame is None:
        break

    # resize the frame for faster processing and then convert the
    # frame from BGR to RGB ordering (dlib needs RGB ordering)
    frame = imutils.resize(frame, width=600)
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

    # if we are supposed to be writing a video to disk, initialize
    # the writer
    if args["output"] is not None and writer is None:
        fourcc = cv2.VideoWriter_fourcc(*"MJPG")
        writer = cv2.VideoWriter(args["output"], fourcc, 30,
            (frame.shape[1], frame.shape[0]), True)

    # if there are no object trackers we first need to detect objects
    # and then create a tracker for each object
    if len(trackers) == 0:
        # grab the frame dimensions and convert the frame to a blob
        (h, w) = frame.shape[:2]
        blob = cv2.dnn.blobFromImage(frame, 0.007843, (w, h), 127.5)

        # pass the blob through the network and obtain the detections
        # and predictions
        net.setInput(blob)
        detections = net.forward()

        # loop over the detections
        for i in np.arange(0, detections.shape[2]):
            # extract the confidence (i.e., probability) associated
            # with the prediction
            confidence = detections[0, 0, i, 2]

            # filter out weak detections by requiring a minimum
            # confidence
            if confidence > args["confidence"]:
                # extract the index of the class label from the
                # detections list
                idx = int(detections[0, 0, i, 1])
                label = CLASSES[idx]

                # if the class label is not a person, ignore it
                if CLASSES[idx] != "person":
                    continue

                # compute the (x, y)-coordinates of the bounding box
                # for the object
                box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
                (startX, startY, endX, endY) = box.astype("int")

                # construct a dlib rectangle object from the bounding
                # box coordinates and start the correlation tracker
                t = dlib.correlation_tracker()
                rect = dlib.rectangle(startX, startY, endX, endY)
                t.start_track(rgb, rect)

                # update our set of trackers and corresponding class
                # labels
                labels.append(label)
                trackers.append(t)

                # grab the corresponding class label for the detection
                # and draw the bounding box
                cv2.rectangle(frame, (startX, startY), (endX, endY),
                    (0, 255, 0), 2)
                cv2.putText(frame, label, (startX, startY - 15),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.45, (0, 255, 0), 2)

    # otherwise, we've already performed detection so let's track
    # multiple objects
    else:
        # loop over each of the trackers
        for (t, l) in zip(trackers, labels):
            # update the tracker and grab the position of the tracked
            # object
            t.update(rgb)
            pos = t.get_position()

            # unpack the position object
            startX = int(pos.left())
            startY = int(pos.top())
            endX = int(pos.right())
            endY = int(pos.bottom())

            # draw the bounding box from the correlation object tracker
            cv2.rectangle(frame, (startX, startY), (endX, endY),
                (0, 255, 0), 2)
            cv2.putText(frame, l, (startX, startY - 15),
                cv2.FONT_HERSHEY_SIMPLEX, 0.45, (0, 255, 0), 2)

    # check to see if we should write the frame to disk
    if writer is not None:
        writer.write(frame)

    # show the output frame
    cv2.imshow("Frame", frame)
    key = cv2.waitKey(1) & 0xFF

    # if the `q` key was pressed, break from the loop
    if key == ord("q"):
        break

    # update the FPS counter
    fps.update()

# stop the timer and display FPS information
fps.stop()
print("[INFO] elapsed time: {:.2f}".format(fps.elapsed()))
print("[INFO] approx. FPS: {:.2f}".format(fps.fps()))

# check to see if we need to release the video writer pointer
if writer is not None:
    writer.release()

# do a bit of cleanup
cv2.destroyAllWindows()
vs.release()
multi_object_tracking_fast.py
# USAGE
# python multi_object_tracking_fast.py --prototxt mobilenet_ssd/MobileNetSSD_deploy.prototxt \
#     --model mobilenet_ssd/MobileNetSSD_deploy.caffemodel --video race.mp4

# import the necessary packages
from imutils.video import FPS
import multiprocessing
import numpy as np
import argparse
import imutils
import dlib
import cv2

def start_tracker(box, label, rgb, inputQueue, outputQueue):
    # construct a dlib rectangle object from the bounding box
    # coordinates and then start the correlation tracker
    t = dlib.correlation_tracker()
    rect = dlib.rectangle(box[0], box[1], box[2], box[3])
    t.start_track(rgb, rect)

    # loop indefinitely -- this function will be called as a daemon
    # process so we don't need to worry about joining it
    while True:
        # attempt to grab the next frame from the input queue
        rgb = inputQueue.get()

        # if there was an entry in our queue, process it
        if rgb is not None:
            # update the tracker and grab the position of the tracked
            # object
            t.update(rgb)
            pos = t.get_position()

            # unpack the position object
            startX = int(pos.left())
            startY = int(pos.top())
            endX = int(pos.right())
            endY = int(pos.bottom())

            # add the label + bounding box coordinates to the output
            # queue
            outputQueue.put((label, (startX, startY, endX, endY)))

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-p", "--prototxt", required=True,
    help="path to Caffe 'deploy' prototxt file")
ap.add_argument("-m", "--model", required=True,
    help="path to Caffe pre-trained model")
ap.add_argument("-v", "--video", required=True,
    help="path to input video file")
ap.add_argument("-o", "--output", type=str,
    help="path to optional output video file")
ap.add_argument("-c", "--confidence", type=float, default=0.2,
    help="minimum probability to filter weak detections")
args = vars(ap.parse_args())

# initialize our list of queues -- both input queue and output queue
# for *every* object that we will be tracking
inputQueues = []
outputQueues = []

# initialize the list of class labels MobileNet SSD was trained to
# detect
CLASSES = ["background", "aeroplane", "bicycle", "bird", "boat",
    "bottle", "bus", "car", "cat", "chair", "cow", "diningtable",
    "dog", "horse", "motorbike", "person", "pottedplant", "sheep",
    "sofa", "train", "tvmonitor"]

# load our serialized model from disk
print("[INFO] loading model...")
net = cv2.dnn.readNetFromCaffe(args["prototxt"], args["model"])

# initialize the video stream and output video writer
print("[INFO] starting video stream...")
vs = cv2.VideoCapture(args["video"])
writer = None

# start the frames per second throughput estimator
fps = FPS().start()

# loop over frames from the video file stream
while True:
    # grab the next frame from the video file
    (grabbed, frame) = vs.read()

    # check to see if we have reached the end of the video file
    if frame is None:
        break

    # resize the frame for faster processing and then convert the
    # frame from BGR to RGB ordering (dlib needs RGB ordering)
    frame = imutils.resize(frame, width=600)
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

    # if we are supposed to be writing a video to disk, initialize
    # the writer
    if args["output"] is not None and writer is None:
        fourcc = cv2.VideoWriter_fourcc(*"MJPG")
        writer = cv2.VideoWriter(args["output"], fourcc, 30,
            (frame.shape[1], frame.shape[0]), True)

    # if our list of queues is empty then we know we have yet to
    # create our first object tracker
    if len(inputQueues) == 0:
        # grab the frame dimensions and convert the frame to a blob
        (h, w) = frame.shape[:2]
        blob = cv2.dnn.blobFromImage(frame, 0.007843, (w, h), 127.5)

        # pass the blob through the network and obtain the detections
        # and predictions
        net.setInput(blob)
        detections = net.forward()

        # loop over the detections
        for i in np.arange(0, detections.shape[2]):
            # extract the confidence (i.e., probability) associated
            # with the prediction
            confidence = detections[0, 0, i, 2]

            # filter out weak detections by requiring a minimum
            # confidence
            if confidence > args["confidence"]:
                # extract the index of the class label from the
                # detections list
                idx = int(detections[0, 0, i, 1])
                label = CLASSES[idx]

                # if the class label is not a person, ignore it
                if CLASSES[idx] != "person":
                    continue

                # compute the (x, y)-coordinates of the bounding box
                # for the object
                box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
                (startX, startY, endX, endY) = box.astype("int")
                bb = (startX, startY, endX, endY)

                # create two brand new input and output queues,
                # respectively
                iq = multiprocessing.Queue()
                oq = multiprocessing.Queue()
                inputQueues.append(iq)
                outputQueues.append(oq)

                # spawn a daemon process for a new object tracker
                p = multiprocessing.Process(
                    target=start_tracker,
                    args=(bb, label, rgb, iq, oq))
                p.daemon = True
                p.start()

                # grab the corresponding class label for the detection
                # and draw the bounding box
                cv2.rectangle(frame, (startX, startY), (endX, endY),
                    (0, 255, 0), 2)
                cv2.putText(frame, label, (startX, startY - 15),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.45, (0, 255, 0), 2)

    # otherwise, we've already performed detection so let's track
    # multiple objects
    else:
        # loop over each of our input queues and add the input RGB
        # frame to it, enabling us to update each of the respective
        # object trackers running in separate processes
        for iq in inputQueues:
            iq.put(rgb)

        # loop over each of the output queues
        for oq in outputQueues:
            # grab the updated bounding box coordinates for the
            # object -- the .get method is a blocking operation so
            # this will pause our execution until the respective
            # process finishes the tracking update
            (label, (startX, startY, endX, endY)) = oq.get()

            # draw the bounding box from the correlation object
            # tracker
            cv2.rectangle(frame, (startX, startY), (endX, endY),
                (0, 255, 0), 2)
            cv2.putText(frame, label, (startX, startY - 15),
                cv2.FONT_HERSHEY_SIMPLEX, 0.45, (0, 255, 0), 2)

    # check to see if we should write the frame to disk
    if writer is not None:
        writer.write(frame)

    # show the output frame
    cv2.imshow("Frame", frame)
    key = cv2.waitKey(1) & 0xFF

    # if the `q` key was pressed, break from the loop
    if key == ord("q"):
        break

    # update the FPS counter
    fps.update()

# stop the timer and display FPS information
fps.stop()
print("[INFO] elapsed time: {:.2f}".format(fps.elapsed()))
print("[INFO] approx. FPS: {:.2f}".format(fps.fps()))

# check to see if we need to release the video writer pointer
if writer is not None:
    writer.release()

# do a bit of cleanup
cv2.destroyAllWindows()
vs.release()
6. Improvements and recommendations
The dlib multi-object tracking Python script I'm sharing with you today works well with shorter video streams; however, if you intend to use this implementation in long-running production environments (on the order of hours to days of video), there are a few improvements I suggest you make:
The first improvement is to utilize a process pool instead of spawning an entirely new process for each object to be tracked. The implementation presented here today builds a brand-new Queue and Process for each object we need to track.
That's fine for today's purposes, but consider if you wanted to track 50 objects in your video - that means you would spawn 50 processes, one for each object. At that point, the overhead of the system managing all of these processes would destroy any increase in FPS. Instead, you may want to utilize process pooling.
If your system has N processor cores, then you should create a pool of N - 1 processes, leaving one core to your operating system for system operations. Each of these processes should handle multiple object trackers, maintaining a list of them, similar to the first multi-object tracking implementation we covered today.
This improvement will allow you to utilize all the cores of the processor without incurring the overhead of many separate processes.
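A rough sketch of that layout is shown below. It is not a drop-in replacement for the script above: the worker function, the message format on the queues, and the round-robin assignment are all assumptions about how you could structure a pool where each worker owns a list of trackers.

# a rough sketch (not a drop-in replacement): each worker process owns a
# *list* of dlib trackers instead of a single one
import multiprocessing
import dlib

def tracker_worker(inputQueue, outputQueue):
    # each worker maintains its own list of (label, tracker) pairs
    trackers = []

    while True:
        # the parent sends either ("add", label, box, rgb) or ("track", rgb)
        message = inputQueue.get()

        if message[0] == "add":
            (_, label, box, rgb) = message
            t = dlib.correlation_tracker()
            t.start_track(rgb, dlib.rectangle(box[0], box[1], box[2], box[3]))
            trackers.append((label, t))

        elif message[0] == "track":
            (_, rgb) = message
            results = []
            for (label, t) in trackers:
                t.update(rgb)
                pos = t.get_position()
                results.append((label, (int(pos.left()), int(pos.top()),
                    int(pos.right()), int(pos.bottom()))))
            outputQueue.put(results)

if __name__ == "__main__":
    # leave one core for the operating system
    numWorkers = max(1, multiprocessing.cpu_count() - 1)
    inputQueues = [multiprocessing.Queue() for _ in range(numWorkers)]
    outputQueues = [multiprocessing.Queue() for _ in range(numWorkers)]

    for (iq, oq) in zip(inputQueues, outputQueues):
        p = multiprocessing.Process(target=tracker_worker, args=(iq, oq))
        p.daemon = True
        p.start()

    # new objects would then be assigned to workers round-robin, e.g.
    # inputQueues[objectID % numWorkers].put(("add", label, box, rgb))

With this layout, the per-process overhead is amortized across several trackers instead of being paid once per tracked object.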
The second improvement I want to make is to clean up processes and queues. If dlib reports an object as "lost" or "disappeared", we don't return from the start_tracker function, which means that the process will live on for the lifetime of the parent script and will only be terminated when the parent script exits.
Again, this is fine for our purposes today, but if you intend to use this code in a production environment, you should:
- Update the start_tracker function so that it returns once dlib reports the object as lost (see the sketch below).
- Delete the corresponding process's inputQueue and outputQueue at the same time.
Failure to perform this cleanup will result in unnecessary computational consumption and memory overhead for long-running jobs.
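Below is a hedged sketch of the first cleanup step. dlib's correlation_tracker.update method returns a tracking quality score (the peak-to-side-lobe ratio); the 7.0 threshold and the (label, None) "object lost" message are assumptions for illustration, not values from the original tutorial, and the parent script would also need to remove the corresponding queues when it receives that message.

import dlib

# heuristic quality threshold -- an assumption, tune it for your own video
QUALITY_THRESHOLD = 7.0

def start_tracker(box, label, rgb, inputQueue, outputQueue):
    # construct a dlib rectangle object from the bounding box
    # coordinates and then start the correlation tracker
    t = dlib.correlation_tracker()
    t.start_track(rgb, dlib.rectangle(box[0], box[1], box[2], box[3]))

    while True:
        # grab the next frame from the input queue
        rgb = inputQueue.get()
        if rgb is None:
            continue

        # update() returns a confidence score we can use to decide
        # whether the object has been lost
        quality = t.update(rgb)
        if quality < QUALITY_THRESHOLD:
            # tell the parent this object is gone, then end the process
            outputQueue.put((label, None))
            return

        # otherwise, report the updated bounding box as before
        pos = t.get_position()
        outputQueue.put((label, (int(pos.left()), int(pos.top()),
            int(pos.right()), int(pos.bottom()))))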
The third improvement is to increase tracking accuracy by running the object detector every N frames (instead of just once at the beginning).
I actually demonstrated this in my post on counting with OpenCV. It requires more logic and thought, but produces a much more accurate tracker. I've chosen to leave that improvement out of this script so that I can teach you the multiprocessing method concisely. Ideally, you would apply this third improvement in addition to multiprocessing; a minimal sketch follows.
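The sketch below shows the pattern; the N value, the helper functions, and the video path handling are placeholders for illustration and are not part of today's scripts.

import cv2

# assumption: re-run the (expensive) detector every N frames and rely on the
# cheaper dlib trackers for the frames in between
N = 30

def run_detector(frame):
    # placeholder for the MobileNet SSD detection + tracker re-initialization
    print("[INFO] running object detector")

def update_trackers(frame):
    # placeholder for updating the existing dlib correlation trackers
    pass

def process_video(videoPath):
    vs = cv2.VideoCapture(videoPath)
    totalFrames = 0

    while True:
        (grabbed, frame) = vs.read()
        if frame is None:
            break

        # expensive detection every N frames, cheap tracking in between
        if totalFrames % N == 0:
            run_detector(frame)
        else:
            update_trackers(frame)

        totalFrames += 1

    vs.release()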
That covers multi-object tracking with Python, OpenCV, and dlib. For more on OpenCV and dlib multi-object tracking, please see my other related articles!