Preamble
The YOLO family, as a leader among one-stage detectors, adopts an anchor-based approach to object detection: anchors of different scales directly regress the target box, and the box position and class confidence are output in a single pass.
Why use anchors for detection?
The early training of the original YOLOv1 was very unstable. While designing YOLOv2, the authors examined the ground-truth boxes of a large number of images and noticed that instances of the same category tend to have similar aspect ratios: for cars, the ground-truth boxes are short, wide rectangles; for pedestrians, they are thin, tall rectangles. Inspired by this, they pre-computed a few bounding boxes with the most common shapes in the dataset and used them as references (anchors) for prediction.
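Concretely, the head never predicts a box's width and height from scratch; it predicts offsets that rescale one of these pre-prepared anchors. Below is a minimal sketch of that decoding in the style used by recent YOLOv5 releases; `decode_box` and its argument names are my own, not the repo's code:

```python
import torch

# Sketch of anchor-based box decoding: raw outputs (tx, ty, tw, th) are
# squashed by a sigmoid and applied to a pre-defined anchor, which keeps
# the regression targets bounded and training stable.
def decode_box(t, anchor_wh, grid_xy, stride):
    t = t.sigmoid()
    xy = (t[:2] * 2.0 - 0.5 + grid_xy) * stride   # box center in pixels
    wh = (t[2:] * 2.0) ** 2 * anchor_wh           # width/height rescaled from the anchor
    return torch.cat([xy, wh])

# e.g. the cell at grid position (10, 15) on the stride-8 map with anchor (16, 30):
print(decode_box(torch.zeros(4), torch.tensor([16., 30.]), torch.tensor([10., 15.]), 8))
```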
The anchor detection process
First of all, the input image size for the COCO dataset used in YOLOv5 is 640x640, but the input size during training is not fixed, because v5 can use mosaic augmentation to compose parts of 4 images into a single input image of a given size. However, if you need to use pre-trained weights, it is better to resize the input image to the same size the authors used, and the input size must be a multiple of 32, which is related to the anchor detection stage described below.
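As a quick illustration of the multiple-of-32 constraint, here is a minimal version of the rounding that YOLOv5's size-checking utilities perform (this `make_divisible` is my own sketch, not imported from the repo):

```python
import math

def make_divisible(x, divisor=32):
    # round a requested size up to the nearest multiple of the maximum stride
    return math.ceil(x / divisor) * divisor

print(make_divisible(613))  # -> 640, so 640/8, 640/16 and 640/32 are all integers
```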
Above is my own drawing of the YOLOv5 v6.0 network structure. When the input size is 640x640, we get outputs at 3 different scales: 80x80 (640/8), 40x40 (640/16), and 20x20 (640/32), which are the outputs of the CSP2_3 modules in the diagram above.
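A quick sanity check of those numbers (plain Python, nothing YOLOv5-specific):

```python
img_size = 640
strides = (8, 16, 32)                     # P3, P4, P5
grids = [img_size // s for s in strides]
print(grids)                              # [80, 40, 20]

# with 3 anchors per cell on each scale, the raw prediction count is
print(sum(3 * g * g for g in grids))      # 25200 candidate boxes
```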
```yaml
anchors:
  - [10,13, 16,30, 33,23]        # P3/8
  - [30,61, 62,45, 59,119]       # P4/16
  - [116,90, 156,198, 373,326]   # P5/32
```
Among them, the 80x80 output corresponds to the shallow feature map (P3), which contains more low-level information and is suitable for detecting small targets, so the anchors assigned to it are small. Conversely, the 20x20 output corresponds to the deep feature map (P5), which contains more high-level information such as contours and structure and is suitable for detecting large targets, so its anchors are large. The remaining 40x40 feature map (P4) uses anchors between these two scales to detect medium-sized targets. This idea of assigning anchors of different scales to different feature maps is what gives YOLOv5 its ability to detect targets across scales efficiently and quickly.
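Internally, YOLOv5 stores each level's anchors in grid units, dividing them by that level's stride when the model is built. A small sketch of the pairing (the tensor layout here is illustrative, not the repo's exact code):

```python
import torch

anchors = torch.tensor([
    [[10, 13], [16, 30], [33, 23]],       # P3/8  -> small targets
    [[30, 61], [62, 45], [59, 119]],      # P4/16 -> medium targets
    [[116, 90], [156, 198], [373, 326]],  # P5/32 -> large targets
], dtype=torch.float32)
strides = torch.tensor([8., 16., 32.])

# divide each level's anchors by its stride, as the Detect head does at build time
anchors_in_grid_units = anchors / strides.view(-1, 1, 1)
print(anchors_in_grid_units[0, 0])  # tensor([1.2500, 1.6250])
```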
That covers the specifics of how anchors are set up in YOLOv5.
The anchor generation process
For most images, a resize is performed at the input stage because their size does not match the predefined input size, and this changes the sizes of the pre-labeled bounding boxes as well. Since anchors are computed from the bounding-box sizes as they enter the network, this resize step is accompanied by an anchor re-clustering process. Under yolov5/utils/ (in autoanchor.py) there is a function kmean_anchors, which computes the anchors via k-means, as follows:
```python
# From yolov5/utils/autoanchor.py; the module-level imports it relies on are:
import numpy as np
import torch
import yaml
from tqdm import tqdm

from utils.general import colorstr


def kmean_anchors(dataset='./data/coco128.yaml', n=9, img_size=640, thr=4.0, gen=1000, verbose=True):
    """ Creates kmeans-evolved anchors from training dataset

        Arguments:
            dataset: path to data.yaml, or a loaded dataset
            n: number of anchors
            img_size: image size used for training
            thr: anchor-label wh ratio threshold hyperparameter hyp['anchor_t'] used for training, default=4.0
            gen: generations to evolve anchors using genetic algorithm
            verbose: print all results

        Return:
            k: kmeans evolved anchors

        Usage:
            from utils.autoanchor import *; _ = kmean_anchors()
    """
    from scipy.cluster.vq import kmeans

    thr = 1. / thr
    prefix = colorstr('autoanchor: ')

    def metric(k, wh):  # compute metrics
        r = wh[:, None] / k[None]
        x = torch.min(r, 1. / r).min(2)[0]  # ratio metric
        # x = wh_iou(wh, torch.tensor(k))  # iou metric
        return x, x.max(1)[0]  # x, best_x

    def anchor_fitness(k):  # mutation fitness
        _, best = metric(torch.tensor(k, dtype=torch.float32), wh)
        return (best * (best > thr).float()).mean()  # fitness

    def print_results(k):
        k = k[np.argsort(k.prod(1))]  # sort small to large
        x, best = metric(k, wh0)
        bpr, aat = (best > thr).float().mean(), (x > thr).float().mean() * n  # best possible recall, anch > thr
        print(f'{prefix}thr={thr:.2f}: {bpr:.4f} best possible recall, {aat:.2f} anchors past thr')
        print(f'{prefix}n={n}, img_size={img_size}, metric_all={x.mean():.3f}/{best.mean():.3f}-mean/best, '
              f'past_thr={x[x > thr].mean():.3f}-mean: ', end='')
        for i, x in enumerate(k):
            print('%i,%i' % (round(x[0]), round(x[1])), end=', ' if i < len(k) - 1 else '\n')  # use in *.cfg
        return k

    if isinstance(dataset, str):  # *.yaml file
        with open(dataset, errors='ignore') as f:
            data_dict = yaml.safe_load(f)  # model dict
        from utils.datasets import LoadImagesAndLabels
        dataset = LoadImagesAndLabels(data_dict['train'], augment=True, rect=True)

    # Get label wh
    shapes = img_size * dataset.shapes / dataset.shapes.max(1, keepdims=True)
    wh0 = np.concatenate([l[:, 3:5] * s for s, l in zip(shapes, dataset.labels)])  # wh

    # Filter
    i = (wh0 < 3.0).any(1).sum()
    if i:
        print(f'{prefix}WARNING: Extremely small objects found. {i} of {len(wh0)} labels are < 3 pixels in size.')
    wh = wh0[(wh0 >= 2.0).any(1)]  # filter > 2 pixels
    # wh = wh * (np.random.rand(wh.shape[0], 1) * 0.9 + 0.1)  # multiply by random scale 0-1

    # Kmeans calculation
    print(f'{prefix}Running kmeans for {n} anchors on {len(wh)} points...')
    s = wh.std(0)  # sigmas for whitening
    k, dist = kmeans(wh / s, n, iter=30)  # points, mean distance
    assert len(k) == n, f'{prefix}ERROR: requested {n} points but returned only {len(k)}'
    k *= s
    wh = torch.tensor(wh, dtype=torch.float32)  # filtered
    wh0 = torch.tensor(wh0, dtype=torch.float32)  # unfiltered
    k = print_results(k)

    # Plot
    # k, d = [None] * 20, [None] * 20
    # for i in tqdm(range(1, 21)):
    #     k[i - 1], d[i - 1] = kmeans(wh / s, i)  # points, mean distance
    # fig, ax = plt.subplots(1, 2, figsize=(14, 7), tight_layout=True)
    # ax = ax.ravel()
    # ax[0].plot(np.arange(1, 21), np.array(d) ** 2, marker='.')
    # fig, ax = plt.subplots(1, 2, figsize=(14, 7))  # plot wh
    # ax[0].hist(wh[wh[:, 0] < 100, 0], 400)
    # ax[1].hist(wh[wh[:, 1] < 100, 1], 400)
    # fig.savefig('wh.png', dpi=200)

    # Evolve
    npr = np.random
    f, sh, mp, s = anchor_fitness(k), k.shape, 0.9, 0.1  # fitness, generations, mutation prob, sigma
    pbar = tqdm(range(gen), desc=f'{prefix}Evolving anchors with Genetic Algorithm:')  # progress bar
    for _ in pbar:
        v = np.ones(sh)
        while (v == 1).all():  # mutate until a change occurs (prevent duplicates)
            v = ((npr.random(sh) < mp) * npr.random() * npr.randn(*sh) * s + 1).clip(0.3, 3.0)
        kg = (k.copy() * v).clip(min=2.0)
        fg = anchor_fitness(kg)
        if fg > f:
            f, k = fg, kg.copy()
            pbar.desc = f'{prefix}Evolving anchors with Genetic Algorithm: fitness = {f:.4f}'
            if verbose:
                print_results(k)

    return print_results(k)
```
The comments in the code already explain the parameters and calling method quite clearly, so I'll just summarize them briefly here:
Arguments:

- dataset: path to the data yaml file (or an already loaded dataset)
- n: number of anchor clusters
- img_size: image size during training (a multiple of 32)
- thr: the anchor aspect-ratio threshold, which limits how far a label's width/height may deviate from an anchor's
- gen: number of generations used to evolve the anchors with the genetic algorithm (the clustering itself is plain k-means; read up on the k-means algorithm if it is unfamiliar)
- verbose: print all results

Usage: `from utils.autoanchor import *; _ = kmean_anchors()`
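For example, to re-cluster anchors for your own data (the coco128.yaml path below is the function's own default; swap in your dataset yaml and run from the yolov5 repo root):

```python
from utils.autoanchor import kmean_anchors

new_anchors = kmean_anchors(dataset='./data/coco128.yaml', n=9,
                            img_size=640, thr=4.0, gen=1000, verbose=True)
# new_anchors holds 9 (w, h) pairs sorted small to large; copy them into the
# model yaml's `anchors:` field in groups of three (P3, P4, P5).
```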
Summary
This concludes the article's introduction to anchor settings for YOLOv5 object detection. For more on YOLOv5 anchor settings, please search my previous posts or continue browsing the related articles below. I hope you will support me in the future!