Preamble
The YOLO family, as a leader among one-stage detectors, adopts an anchor-based approach to object detection: anchors of different scales directly regress the target box, and the box position and class confidence are output in a single pass.
Why use anchors for detection?
The early training of the original YOLOv1 was very unstable. While designing YOLOv2, the authors examined the ground-truth boxes of a large number of images and noticed that instances of the same category tend to have similar aspect ratios: for cars, the ground-truth boxes are short, wide rectangles; for pedestrians, they are thin, tall rectangles. Inspired by this, they pre-computed a few bounding boxes with the most common shapes in the dataset and used them as references (anchors) for prediction.
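Concretely, the head never predicts a box's width and height from scratch; it predicts offsets that rescale one of these pre-prepared anchors. Below is a minimal sketch of that decoding in the style used by recent YOLOv5 releases; `decode_box` and its argument names are my own, not the repo's code:

```python
import torch

# Sketch of anchor-based box decoding: raw outputs (tx, ty, tw, th) are
# squashed by a sigmoid and applied to a pre-defined anchor, which keeps
# the regression targets bounded and training stable.
def decode_box(t, anchor_wh, grid_xy, stride):
    t = t.sigmoid()
    xy = (t[:2] * 2.0 - 0.5 + grid_xy) * stride   # box center in pixels
    wh = (t[2:] * 2.0) ** 2 * anchor_wh           # width/height rescaled from the anchor
    return torch.cat([xy, wh])

# e.g. the cell at grid position (10, 15) on the stride-8 map with anchor (16, 30):
print(decode_box(torch.zeros(4), torch.tensor([16., 30.]), torch.tensor([10., 15.]), 8))
```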
The anchor detection process
First of all, the input image size for the COCO dataset used in YOLOv5 is 640x640, but the input size during training is not fixed, because v5 can use mosaic augmentation to compose parts of 4 images into a single input image of a given size. However, if you need to use pre-trained weights, it is better to resize the input image to the same size the authors used, and the input size must be a multiple of 32, which is related to the anchor detection stage described below.
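As a quick illustration of the multiple-of-32 constraint, here is a minimal version of the rounding that YOLOv5's size-checking utilities perform (this `make_divisible` is my own sketch, not imported from the repo):

```python
import math

def make_divisible(x, divisor=32):
    # round a requested size up to the nearest multiple of the maximum stride
    return math.ceil(x / divisor) * divisor

print(make_divisible(613))  # -> 640, so 640/8, 640/16 and 640/32 are all integers
```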
Above is my own drawing of the YOLOv5 v6.0 network structure. When the input size is 640x640, we get outputs at 3 different scales: 80x80 (640/8), 40x40 (640/16), and 20x20 (640/32), which are the outputs of the CSP2_3 modules in the diagram above.
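A quick sanity check of those numbers (plain Python, nothing YOLOv5-specific):

```python
img_size = 640
strides = (8, 16, 32)                     # P3, P4, P5
grids = [img_size // s for s in strides]
print(grids)                              # [80, 40, 20]

# with 3 anchors per cell on each scale, the raw prediction count is
print(sum(3 * g * g for g in grids))      # 25200 candidate boxes
```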
```yaml
anchors:
  - [10,13, 16,30, 33,23]        # P3/8
  - [30,61, 62,45, 59,119]       # P4/16
  - [116,90, 156,198, 373,326]   # P5/32
```
Among them, the 80x80 output corresponds to the shallow feature map (P3), which contains more low-level information and is suitable for detecting small targets, so the anchors assigned to it are small. Conversely, the 20x20 output corresponds to the deep feature map (P5), which contains more high-level information such as contours and structure and is suitable for detecting large targets, so its anchors are large. The remaining 40x40 feature map (P4) uses anchors between these two scales to detect medium-sized targets. This idea of assigning anchors of different scales to different feature maps is what gives YOLOv5 its ability to detect targets across scales efficiently and quickly.
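Internally, YOLOv5 stores each level's anchors in grid units, dividing them by that level's stride when the model is built. A small sketch of the pairing (the tensor layout here is illustrative, not the repo's exact code):

```python
import torch

anchors = torch.tensor([
    [[10, 13], [16, 30], [33, 23]],       # P3/8  -> small targets
    [[30, 61], [62, 45], [59, 119]],      # P4/16 -> medium targets
    [[116, 90], [156, 198], [373, 326]],  # P5/32 -> large targets
], dtype=torch.float32)
strides = torch.tensor([8., 16., 32.])

# divide each level's anchors by its stride, as the Detect head does at build time
anchors_in_grid_units = anchors / strides.view(-1, 1, 1)
print(anchors_in_grid_units[0, 0])  # tensor([1.2500, 1.6250])
```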
That covers the specifics of how anchors are set up in YOLOv5.
The anchor generation process
For most images, a resize is performed at the input stage because their size does not match the predefined input size, and this changes the sizes of the pre-labeled bounding boxes as well. Since anchors are computed from the bounding-box sizes as they enter the network, this resize step is accompanied by an anchor re-clustering process. Under yolov5/utils/ (in autoanchor.py) there is a function kmean_anchors, which computes the anchors via k-means, as follows:
```python
# From yolov5/utils/autoanchor.py; the module-level imports it relies on are:
import numpy as np
import torch
import yaml
from tqdm import tqdm

from utils.general import colorstr


def kmean_anchors(dataset='./data/coco128.yaml', n=9, img_size=640, thr=4.0, gen=1000, verbose=True):
    """ Creates kmeans-evolved anchors from training dataset

        Arguments:
            dataset: path to data.yaml, or a loaded dataset
            n: number of anchors
            img_size: image size used for training
            thr: anchor-label wh ratio threshold hyperparameter hyp['anchor_t'] used for training, default=4.0
            gen: generations to evolve anchors using genetic algorithm
            verbose: print all results

        Return:
            k: kmeans evolved anchors

        Usage:
            from utils.autoanchor import *; _ = kmean_anchors()
    """
    from scipy.cluster.vq import kmeans

    thr = 1. / thr
    prefix = colorstr('autoanchor: ')

    def metric(k, wh):  # compute metrics
        r = wh[:, None] / k[None]
        x = torch.min(r, 1. / r).min(2)[0]  # ratio metric
        # x = wh_iou(wh, torch.tensor(k))  # iou metric
        return x, x.max(1)[0]  # x, best_x

    def anchor_fitness(k):  # mutation fitness
        _, best = metric(torch.tensor(k, dtype=torch.float32), wh)
        return (best * (best > thr).float()).mean()  # fitness

    def print_results(k):
        k = k[np.argsort(k.prod(1))]  # sort small to large
        x, best = metric(k, wh0)
        bpr, aat = (best > thr).float().mean(), (x > thr).float().mean() * n  # best possible recall, anch > thr
        print(f'{prefix}thr={thr:.2f}: {bpr:.4f} best possible recall, {aat:.2f} anchors past thr')
        print(f'{prefix}n={n}, img_size={img_size}, metric_all={x.mean():.3f}/{best.mean():.3f}-mean/best, '
              f'past_thr={x[x > thr].mean():.3f}-mean: ', end='')
        for i, x in enumerate(k):
            print('%i,%i' % (round(x[0]), round(x[1])), end=', ' if i < len(k) - 1 else '\n')  # use in *.cfg
        return k

    if isinstance(dataset, str):  # *.yaml file
        with open(dataset, errors='ignore') as f:
            data_dict = yaml.safe_load(f)  # model dict
        from utils.datasets import LoadImagesAndLabels
        dataset = LoadImagesAndLabels(data_dict['train'], augment=True, rect=True)

    # Get label wh
    shapes = img_size * dataset.shapes / dataset.shapes.max(1, keepdims=True)
    wh0 = np.concatenate([l[:, 3:5] * s for s, l in zip(shapes, dataset.labels)])  # wh

    # Filter
    i = (wh0 < 3.0).any(1).sum()
    if i:
        print(f'{prefix}WARNING: Extremely small objects found. {i} of {len(wh0)} labels are < 3 pixels in size.')
    wh = wh0[(wh0 >= 2.0).any(1)]  # filter > 2 pixels
    # wh = wh * (np.random.rand(wh.shape[0], 1) * 0.9 + 0.1)  # multiply by random scale 0-1

    # Kmeans calculation
    print(f'{prefix}Running kmeans for {n} anchors on {len(wh)} points...')
    s = wh.std(0)  # sigmas for whitening
    k, dist = kmeans(wh / s, n, iter=30)  # points, mean distance
    assert len(k) == n, f'{prefix}ERROR: requested {n} points but returned only {len(k)}'
    k *= s
    wh = torch.tensor(wh, dtype=torch.float32)  # filtered
    wh0 = torch.tensor(wh0, dtype=torch.float32)  # unfiltered
    k = print_results(k)

    # Plot
    # k, d = [None] * 20, [None] * 20
    # for i in tqdm(range(1, 21)):
    #     k[i - 1], d[i - 1] = kmeans(wh / s, i)  # points, mean distance
    # fig, ax = plt.subplots(1, 2, figsize=(14, 7), tight_layout=True)
    # ax = ax.ravel()
    # ax[0].plot(np.arange(1, 21), np.array(d) ** 2, marker='.')
    # fig, ax = plt.subplots(1, 2, figsize=(14, 7))  # plot wh
    # ax[0].hist(wh[wh[:, 0] < 100, 0], 400)
    # ax[1].hist(wh[wh[:, 1] < 100, 1], 400)
    # fig.savefig('wh.png', dpi=200)

    # Evolve
    npr = np.random
    f, sh, mp, s = anchor_fitness(k), k.shape, 0.9, 0.1  # fitness, generations, mutation prob, sigma
    pbar = tqdm(range(gen), desc=f'{prefix}Evolving anchors with Genetic Algorithm:')  # progress bar
    for _ in pbar:
        v = np.ones(sh)
        while (v == 1).all():  # mutate until a change occurs (prevent duplicates)
            v = ((npr.random(sh) < mp) * npr.random() * npr.randn(*sh) * s + 1).clip(0.3, 3.0)
        kg = (k.copy() * v).clip(min=2.0)
        fg = anchor_fitness(kg)
        if fg > f:
            f, k = fg, kg.copy()
            pbar.desc = f'{prefix}Evolving anchors with Genetic Algorithm: fitness = {f:.4f}'
            if verbose:
                print_results(k)

    return print_results(k)
```
The comments in the code already explain the parameters and calling method quite clearly, so I'll just summarize them briefly here:
Arguments:

- dataset: path to the data yaml file (or an already loaded dataset)
- n: number of anchor clusters
- img_size: image size during training (a multiple of 32)
- thr: the anchor aspect-ratio threshold, which limits how far a label's width/height may deviate from an anchor's
- gen: number of generations used to evolve the anchors with the genetic algorithm (the clustering itself is plain k-means; read up on the k-means algorithm if it is unfamiliar)
- verbose: print all results

Usage: `from utils.autoanchor import *; _ = kmean_anchors()`
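For example, to re-cluster anchors for your own data (the coco128.yaml path below is the function's own default; swap in your dataset yaml and run from the yolov5 repo root):

```python
from utils.autoanchor import kmean_anchors

new_anchors = kmean_anchors(dataset='./data/coco128.yaml', n=9,
                            img_size=640, thr=4.0, gen=1000, verbose=True)
# new_anchors holds 9 (w, h) pairs sorted small to large; copy them into the
# model yaml's `anchors:` field in groups of three (P3, P4, P5).
```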
Summary
This concludes the article's introduction to anchor settings for YOLOv5 object detection. For more on YOLOv5 anchor settings, please search my previous posts or continue browsing the related articles below. I hope you will support me in the future!