A detailed guide to accelerating YOLOv4 performance with CUDA and OpenCV

YOLO stands for You-Only-Look-Once, and it is undoubtedly one of the best object detectors trained on the COCO dataset. YOLOv4 is the latest iteration at the time of writing, and its trade-off between accuracy and speed makes it one of the most advanced object detectors available. The typical way to use any object detector in an intelligent video analytics pipeline is through a library such as TensorFlow or PyTorch, which can run on NVIDIA GPUs to accelerate model inference.

OpenCV is used for image and video stream input, and for pre-processing and post-processing of frames. What if I told you that OpenCV can now take advantage of NVIDIA CUDA to run YOLOv4 natively through its DNN module? This article walks you through building OpenCV with CUDA and cuDNN to accelerate YOLOv4 inference with the DNN module.

Introduction

Most of the hobbyists I know have GPU-enabled devices. My goal is to make GPU acceleration mainstream. Who doesn't like projects to run faster? I've used OpenCV 4.5.1, CUDA 11.2 and cuDNN 8.1.0 to get started and make inference easier!

First you set up CUDA, then install cuDNN, and finally build OpenCV. This blog is broken up into sections so it's easier to follow!

CUDA 11.2 and cuDNN 8.1.0 Installation

The part most likely to keep your computer from booting. Just kidding! Getting everything right should be a breeze.

Install CUDA 11.2

First download the deb file from the CUDA repository depending on your platform.

CUDA repository:/cuda-downloads

After selecting the platform correctly, you will be provided with installation commands. If your platform is similar to mine, you can install it as follows:

wget /compute/cuda/repos/ubuntu2004/x86_64/
sudo mv  /etc/apt//cuda-repository-pin-600
wget /compute/cuda/11.2.1/local_installers/cuda-repo-ubuntu2004-11-2-local_11.2.1-460.32.03-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2004-11-2-local_11.2.1-460.32.03-1_amd64.deb
sudo apt-key add /var/cuda-repo-ubuntu2004-11-2-local/
sudo apt update
sudo apt -y install cuda
sudo reboot

If done correctly, running nvidia-smi after the reboot should list your GPU along with the driver and CUDA versions.

Finally, paste the following into .bashrc or .zshrc

# CUDA
export CUDA=11.2
export PATH=/usr/local/cuda-$CUDA/bin${PATH:+:${PATH}}
export CUDA_PATH=/usr/local/cuda-$CUDA
export CUDA_HOME=/usr/local/cuda-$CUDA
export LIBRARY_PATH=$CUDA_HOME/lib64:$LIBRARY_PATH
export LD_LIBRARY_PATH=/usr/local/cuda-$CUDA/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda/extras/CUPTI/lib64:$LD_LIBRARY_PATH
export NVCC=/usr/local/cuda-$CUDA/bin/nvcc
export CFLAGS="-I$CUDA_HOME/include $CFLAGS"

Don't forget to run source ~/.bashrc or source ~/.zshrc afterwards.
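If you want a quick sanity check of the variables above, something like the following could help. It is only a sketch: the function names are my own, and it assumes CUDA_HOME, PATH and LD_LIBRARY_PATH are set the way the snippet above sets them.

```python
import os
import shutil
from typing import List, Mapping


def cuda_env_problems(env: Mapping[str, str]) -> List[str]:
    """Return human-readable problems found in the CUDA-related variables."""
    problems = []
    cuda_home = env.get("CUDA_HOME", "")
    if not cuda_home:
        problems.append("CUDA_HOME is not set")
        return problems
    if not os.path.isdir(cuda_home):
        problems.append(f"CUDA_HOME points to a missing directory: {cuda_home}")
    # The toolkit binaries (nvcc) and libraries must be reachable.
    bin_dir = os.path.join(cuda_home, "bin")
    if bin_dir not in env.get("PATH", "").split(os.pathsep):
        problems.append(f"{bin_dir} is not on PATH")
    lib_dir = os.path.join(cuda_home, "lib64")
    if lib_dir not in env.get("LD_LIBRARY_PATH", "").split(os.pathsep):
        problems.append(f"{lib_dir} is not on LD_LIBRARY_PATH")
    return problems


def report() -> None:
    """Print any problems plus the resolved location of the CUDA tools."""
    for problem in cuda_env_problems(os.environ):
        print("WARNING:", problem)
    for tool in ("nvcc", "nvidia-smi"):
        print(f"{tool}: {shutil.which(tool) or 'not found on PATH'}")


if __name__ == "__main__":
    report()
```

Run it from a fresh terminal so the shell has actually picked up the new variables. No warnings and a resolved path for both tools suggests the setup is in good shape.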

Installing cuDNN 8.1.0

For this, you need to have an NVIDIA account, so be sure to register first. Once that's done, head over to the link below and download the cuDNN 8.1.0 runtime and developer deb packages for CUDA 11.2.

/rdp/cudnn-download

After downloading the deb files, run the following commands:

sudo dpkg -i libcudnn8_8.1.0.77-1+cuda11.2_amd64.deb
sudo dpkg -i libcudnn8-dev_8.1.0.77-1+cuda11.2_amd64.deb

This marks the completion of the NVIDIA CUDA and cuDNN installation!

Building OpenCV 4.5.1 from source code

The funny thing is that this excites me! This section will help you build OpenCV from source with CUDA, GStreamer and FFmpeg support! There's a long list of commands to execute, so let's get started.

First, install the Python development packages

sudo apt install python3-dev python3-pip python3-testresources

Next, let's install the dependencies needed to build OpenCV

sudo apt install build-essential cmake pkg-config unzip yasm git checkinstall
sudo apt install libjpeg-dev libpng-dev libtiff-dev
sudo apt install libavcodec-dev libavformat-dev libswscale-dev libavresample-dev
sudo apt install libgstreamer1.0-dev libgstreamer-plugins-base1.0-dev
sudo apt install libxvidcore-dev x264 libx264-dev libfaac-dev libmp3lame-dev libtheora-dev
sudo apt install libfaac-dev libmp3lame-dev libvorbis-dev
sudo apt install libopencore-amrnb-dev libopencore-amrwb-dev
sudo apt-get install libgtk-3-dev
sudo apt-get install libtbb-dev
sudo apt-get install libatlas-base-dev gfortran
sudo apt-get install libprotobuf-dev protobuf-compiler
sudo apt-get install libgoogle-glog-dev libgflags-dev
sudo apt-get install libgphoto2-dev libeigen3-dev libhdf5-dev doxygen

NumPy is a key Python package for this build. Install it using pip

pip3 install numpy

Now you should have everything ready for the build. Run the following commands to download and unzip the source code

mkdir opencvbuild && cd opencvbuild
wget -O  /opencv/opencv/archive/4.5.
wget -O opencv_contrib.zip /opencv/opencv_contrib/archive/4.5.
unzip 
unzip opencv_contrib.zip
mv opencv-4.5.1 opencv
mv opencv_contrib-4.5.1 opencv_contrib

Let's get ready to build!

cd opencv
mkdir build && cd build

Make sure to change CUDA_ARCH_BIN to match your GPU.
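CUDA_ARCH_BIN is the compute capability of your card. As a small illustration, here is a lookup for the two GPUs benchmarked later in this article (6.1 for the GTX 1050 Ti, 7.5 for the Tesla T4). The table and the function name are my own; for any other card, look up its value in NVIDIA's published compute capability list and add it.

```python
from typing import Dict

# Compute capabilities for the two GPUs benchmarked in this article.
# This table is illustrative only; extend it for other cards.
COMPUTE_CAPABILITY: Dict[str, str] = {
    "GeForce GTX 1050 Ti": "6.1",
    "Tesla T4": "7.5",
}


def cuda_arch_flag(gpu_name: str) -> str:
    """Return the CMake argument for a known GPU; raises KeyError if unknown."""
    capability = COMPUTE_CAPABILITY[gpu_name]
    return f"-D CUDA_ARCH_BIN={capability}"


print(cuda_arch_flag("Tesla T4"))  # -D CUDA_ARCH_BIN=7.5
```

The 7.5 used in the command that follows therefore matches a Tesla T4; on the GTX 1050 Ti you would pass 6.1 instead.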

cmake \
-D CMAKE_BUILD_TYPE=RELEASE -D CMAKE_C_COMPILER=/usr/bin/gcc-7 \
-D CMAKE_INSTALL_PREFIX=/usr/local -D INSTALL_PYTHON_EXAMPLES=ON \
-D INSTALL_C_EXAMPLES=ON -D WITH_TBB=ON -D WITH_CUDA=ON -D WITH_CUDNN=ON \
-D OPENCV_DNN_CUDA=ON -D CUDA_ARCH_BIN=7.5 -D BUILD_opencv_cudacodec=OFF \
-D ENABLE_FAST_MATH=1 -D CUDA_FAST_MATH=1 -D WITH_CUBLAS=1 \
-D WITH_V4L=ON -D WITH_QT=OFF -D WITH_OPENGL=ON -D WITH_GSTREAMER=ON \
-D WITH_FFMPEG=ON -D OPENCV_GENERATE_PKGCONFIG=ON \
-D OPENCV_PC_FILE_NAME= -D OPENCV_ENABLE_NONFREE=ON \
-D OPENCV_EXTRA_MODULES_PATH=../../opencv_contrib/modules \
-D PYTHON_DEFAULT_EXECUTABLE=$(which python3) -D BUILD_EXAMPLES=ON ..

When the configuration succeeds, CMake prints a summary of the build settings.

Make sure CUDA is detected and the build paths are accurate. If all is well, run the following commands to start the build

make -j$(nproc)
sudo make install

To check if OpenCV was successfully built, run this command

pkg-config --libs --cflags opencv4

On a successful installation, it prints the include path and linker flags for OpenCV.
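Since the sample application is written in Python, it may also be worth checking the build from the Python side. The sketch below reads the summary returned by cv2.getBuildInformation() and asks OpenCV how many CUDA devices it can see. The helper name build_flags is my own, and the exact layout of the "NVIDIA CUDA" and "cuDNN" lines in that summary is an assumption that can vary between OpenCV versions.

```python
import re
from typing import Dict


def build_flags(build_info: str) -> Dict[str, str]:
    """Extract the 'NVIDIA CUDA' and 'cuDNN' entries from OpenCV's build summary."""
    flags = {}
    for key in ("NVIDIA CUDA", "cuDNN"):
        pattern = rf"^[ \t]*{re.escape(key)}:[ \t]*(.+)$"
        match = re.search(pattern, build_info, re.MULTILINE)
        flags[key] = match.group(1).strip() if match else "not listed"
    return flags


def main() -> None:
    try:
        import cv2  # imported lazily so build_flags works without OpenCV
    except ImportError:
        print("cv2 is not importable from this Python interpreter")
        return
    print("OpenCV version:", cv2.__version__)
    for key, value in build_flags(cv2.getBuildInformation()).items():
        print(f"{key}: {value}")
    print("CUDA devices visible:", cv2.cuda.getCudaEnabledDeviceCount())


if __name__ == "__main__":
    main()
```

If either entry does not start with YES, or the device count is 0, the DNN module will most likely fall back to the CPU.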

Glad to see you made it this far! You should now be all set up to run the sample application.

Running the application

Go ahead and clone this repository and get the weights. Start by installing git-lfs

sudo apt install git git-lfs

Clone the repository along with the model files

# Using HTTPS
git clone /aj-ames/
# Using SSH
git clone git@:aj-ames/
cd YOLOv4-OpenCV-CUDA-DNN/
git lfs install
git lfs pull

You can run the application on an image, a video file, a camera, or an RTSP stream.

# Image
python3 dnn_inference.py --image images/ --use_gpu
# Video
python3 dnn_inference.py --stream video.mp4 --use_gpu
 
# RTSP
python3 dnn_inference.py --stream rtsp://192.168.1.1:554/stream --use_gpu
 
# Webcam
python3 dnn_inference.py --stream webcam --use_gpu

PS: Remove the --use_gpu flag to disable the GPU. Counterproductive, isn't it?
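To give an idea of what a flag like --use_gpu can translate to inside OpenCV, here is a minimal sketch. It is not the actual code from the repository: the function names are hypothetical, and the 416x416 input size and 1/255 scaling are assumed values. The cv2.dnn calls themselves are available in OpenCV 4.5.1.

```python
import argparse
from typing import Tuple


def build_parser() -> argparse.ArgumentParser:
    """Command-line flags matching the invocations shown above."""
    parser = argparse.ArgumentParser(description="YOLOv4 inference with OpenCV DNN")
    parser.add_argument("--image", help="path to a single image")
    parser.add_argument("--stream", help="video file, RTSP URL, or 'webcam'")
    parser.add_argument("--use_gpu", action="store_true", help="run on the CUDA backend")
    return parser


def backend_and_target(use_gpu: bool) -> Tuple[str, str]:
    """Names of the cv2.dnn constants to request for the chosen device."""
    if use_gpu:
        return "DNN_BACKEND_CUDA", "DNN_TARGET_CUDA"
    return "DNN_BACKEND_OPENCV", "DNN_TARGET_CPU"


def load_model(cfg_path: str, weights_path: str, use_gpu: bool):
    """Load a Darknet model and point it at the GPU or the CPU."""
    import cv2  # imported here so the helpers above work without OpenCV

    net = cv2.dnn.readNetFromDarknet(cfg_path, weights_path)
    backend, target = backend_and_target(use_gpu)
    net.setPreferableBackend(getattr(cv2.dnn, backend))
    net.setPreferableTarget(getattr(cv2.dnn, target))

    model = cv2.dnn_DetectionModel(net)
    # Assumed preprocessing: 416x416 input, pixel values scaled to [0, 1], BGR to RGB.
    model.setInputParams(size=(416, 416), scale=1 / 255.0, swapRB=True)
    return model


args = build_parser().parse_args(["--stream", "webcam", "--use_gpu"])
print(backend_and_target(args.use_gpu))
```

With a model loaded this way, model.detect(frame, confThreshold, nmsThreshold) returns class ids, confidences and boxes for a frame. Leaving out --use_gpu simply selects the default OpenCV backend on the CPU.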

Some benchmarks for geeks!

If the gains weren't huge, we wouldn't do it. Trust me, running on the GPU increased my FPS by roughly 10 to 12 times!

I tested two configurations

Intel Core i5 7300HQ + NVIDIA GeForce GTX 1050Ti

Intel Xeon E5-1650 v4 + NVIDIA Tesla T4

I'll let the numbers speak for themselves!

|      CPU        |  FPS (CPU)   |      GPU       |  FPS (GPU)   |
| :-------------: | :----------: | :------------: | :----------: |
| Core i5 7300HQ  |     2.1      |   GTX 1050 Ti  |     20.1     |
| Xeon E5-1650 v4 |     3.5      |   Tesla T4     |     42.3     |
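The speedup implied by the table is just the GPU FPS divided by the CPU FPS. A few lines of Python make the arithmetic explicit (the numbers come from the table, the function name is my own):

```python
def speedup(cpu_fps: float, gpu_fps: float) -> float:
    """How many times faster the GPU run is compared to the CPU run."""
    return gpu_fps / cpu_fps


# (CPU FPS, GPU FPS) pairs taken from the benchmark table.
results = {
    "Core i5 7300HQ vs GTX 1050 Ti": (2.1, 20.1),
    "Xeon E5-1650 v4 vs Tesla T4": (3.5, 42.3),
}

for label, (cpu_fps, gpu_fps) in results.items():
    print(f"{label}: {speedup(cpu_fps, gpu_fps):.1f}x")
# Core i5 7300HQ vs GTX 1050 Ti: 9.6x
# Xeon E5-1650 v4 vs Tesla T4: 12.1x
```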

Endnote

GPU acceleration is permeating multiple libraries and applications, enabling users to run heavier workloads at unprecedented speeds! Computer vision was once not a technology that was accessible to everyone, but with improvements in neural networks and increased hardware computing power, the gap has narrowed significantly. As AI evolves at a faster pace, so will our hardware evolve to be more flexible!
