What is label smoothing, and how do you use it in PyTorch?
Overfitting and poor probability calibration are two common problems when training deep learning models. On the one hand, regularization techniques such as weight decay, early stopping, and dropout can address overfitting. On the other hand, Platt scaling and isotonic regression can calibrate a model's predicted probabilities. But is there a single technique that combats both overfitting and model overconfidence?
Label smoothing might do the trick. It is a regularization technique that perturbs the target labels so that the model no longer trains toward hard, deterministic 0/1 targets. Label smoothing counts as regularization because it keeps the largest logit fed into the softmax from growing exceptionally large relative to the others, which both tempers overconfidence and tends to improve the classifier's generalization.
In this article, we define label smoothing, build it into a cross-entropy loss function, and use that loss to train a model.
What is label smoothing?
Suppose we have a multi-class classification problem. The target variable is usually a one-hot vector: the entry for the correct class is 1 and every other entry is 0.
Label smoothing changes the target vector so that its entries are no longer exactly 1 or 0: the correct class receives roughly 1 − ε, and the remaining mass ε is spread over the other classes. The cross-entropy loss with label smoothing is then given by the following equation:

    loss = (1 − ε) * ce(i) + ε * Σ_j ce(j) / N

Here ce(x) denotes the standard cross-entropy for class x, i.e. −log(p(x)), ε is a small positive number, i is the correct class, and N is the total number of classes.
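To make this concrete, here is a minimal sketch (the three-class vector and ε = 0.1 are purely illustrative) of how a one-hot target becomes a smoothed target under the common formulation where the mass ε is spread uniformly over all N classes, which is also what the implementation later in this article corresponds to.

    import torch

    # One-hot target for class 1 in a 3-class problem (illustrative values).
    one_hot = torch.tensor([0.0, 1.0, 0.0])
    epsilon, n_classes = 0.1, 3

    # Smoothed target: (1 - eps) * one_hot + eps * uniform distribution.
    smoothed = (1 - epsilon) * one_hot + epsilon / n_classes
    print(smoothed)  # tensor([0.0333, 0.9333, 0.0333])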
Intuitively, label smoothing keeps the logit of the correct class from drifting too far above the logits of the other classes. In that sense it acts both as a regularization technique and as a defense against model overconfidence.
Usage in PyTorch
The cross-entropy loss function with label smoothing is very simple to implement in PyTorch. First, let's use an auxiliary function to compute a linear combination between two values.
    def linear_combination(x, y, epsilon):
        return epsilon * x + (1 - epsilon) * y
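To see what the helper does, here is a quick sanity check with illustrative values: with epsilon = 0.1 it returns 10% of the first argument plus 90% of the second, which is exactly how we will blend the uniform loss with the standard negative log-likelihood below.

    # Quick check of the helper (illustrative values only).
    print(linear_combination(1.0, 0.0, 0.1))  # 0.1  -> 10% of x
    print(linear_combination(2.0, 4.0, 0.1))  # 3.8  -> 0.1*2.0 + 0.9*4.0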
Next, we implement a brand-new loss function in PyTorch.

    import torch.nn as nn
    import torch.nn.functional as F

    def reduce_loss(loss, reduction='mean'):
        # Collapse the per-example losses according to the requested reduction.
        return loss.mean() if reduction == 'mean' else loss.sum() if reduction == 'sum' else loss

    class LabelSmoothingCrossEntropy(nn.Module):
        def __init__(self, epsilon: float = 0.1, reduction='mean'):
            super().__init__()
            self.epsilon = epsilon
            self.reduction = reduction

        def forward(self, preds, target):
            n = preds.size()[-1]                      # number of classes N
            log_preds = F.log_softmax(preds, dim=-1)
            # Uniform part: cross-entropy summed over all classes.
            loss = reduce_loss(-log_preds.sum(dim=-1), self.reduction)
            # Standard part: negative log-likelihood of the correct class.
            nll = F.nll_loss(log_preds, target, reduction=self.reduction)
            # Blend the two, weighting the uniform part by epsilon.
            return linear_combination(loss / n, nll, self.epsilon)
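As a quick check, the class can be used on its own with random logits. Note also that recent PyTorch releases (1.10 and later) ship label smoothing built into nn.CrossEntropyLoss via the label_smoothing argument; the sketch below compares the two, and the tensor shapes are only illustrative.

    torch.manual_seed(0)
    preds = torch.randn(8, 5)                 # batch of 8 examples, 5 classes (illustrative)
    target = torch.randint(0, 5, (8,))

    criterion = LabelSmoothingCrossEntropy(epsilon=0.1)
    print(criterion(preds, target))

    # PyTorch >= 1.10 offers the same idea out of the box:
    builtin = nn.CrossEntropyLoss(label_smoothing=0.1)
    print(builtin(preds, target))             # should closely match the value above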
We can now plug this class into our training code. For this example, we use the standard pets dataset from fastai.
    from fastai.vision import *
    from fastai.metrics import error_rate

    # prepare the data
    path = untar_data(URLs.PETS)
    path_img = path/'images'
    fnames = get_image_files(path_img)
    bs = 64
    np.random.seed(2)
    pat = r'/([^/]+)_\d+.jpg$'
    data = ImageDataBunch.from_name_re(path_img, fnames, pat,
                                       ds_tfms=get_transforms(), size=224, bs=bs) \
                         .normalize(imagenet_stats)

    # train the model
    learn = cnn_learner(data, models.resnet34, metrics=error_rate)
    learn.loss_func = LabelSmoothingCrossEntropy()
    learn.fit_one_cycle(4)
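If you are not using fastai, the same loss drops into an ordinary PyTorch training loop. The sketch below assumes a `model`, a `train_loader`, and an optimizer already exist; those names are placeholders for illustration, not part of the original example.

    # Minimal sketch of using the loss in a plain PyTorch training loop.
    # `model` and `train_loader` are assumed to be defined elsewhere.
    criterion = LabelSmoothingCrossEntropy(epsilon=0.1)
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)

    for epoch in range(4):
        for inputs, targets in train_loader:
            optimizer.zero_grad()
            logits = model(inputs)                # raw, unnormalized scores
            loss = criterion(logits, targets)     # label-smoothed cross-entropy
            loss.backward()
            optimizer.step()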
Finally, the data is converted into a format the model can consume, a ResNet-34 architecture is chosen, and the cross-entropy loss with label smoothing is used as the optimization objective. After training for four epochs, the error rate is only 7.5%, which is perfectly acceptable for roughly ten lines of code in which most parameters are left at their default settings.
There is therefore plenty of room to tune the model further and improve its performance, for example by trying different optimizers, hyperparameters, or model architectures, as sketched below.
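As one illustration of such tuning (the specific values below are hypothetical choices, not recommendations from the article), you could increase model capacity, lower the smoothing factor, and train longer with an explicit learning rate:

    # Hypothetical tuning of the same fastai pipeline (illustrative values only).
    learn = cnn_learner(data, models.resnet50, metrics=error_rate)
    learn.loss_func = LabelSmoothingCrossEntropy(epsilon=0.05)
    learn.fit_one_cycle(8, max_lr=1e-3)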
Conclusion
In this post, we learned what label smoothing is and when to use it, and we saw how to implement it in PyTorch. We then trained an advanced computer vision model to recognize different breeds of cats and dogs in only about ten lines of code.
Model regularization and model calibration are two important concepts. If you want to become a well-rounded deep learning practitioner, you should have a good grasp of these tools for combating overfitting and model overconfidence.
Author bio: Dimitris Poulopoulos is a machine learning researcher at BigDataStack and holds a PhD from the University of Piraeus, Greece. He has designed AI-related software for clients such as the European Commission, Eurostat, the International Monetary Fund, and the European Central Bank.