- torch.save: saves a serialized object to disk. It uses Python's pickle for serialization; the object can be a model, a tensor, or a dictionary of any kind of object.
- torch.load: uses pickle's unpickling facilities to deserialize a pickled object file into memory.
- torch.nn.Module.load_state_dict: loads a model's parameter dictionary from a deserialized state_dict.
state_dict is a Python dictionary that maps each layer to its parameter tensors. Note that only layers with learnable parameters (convolutional layers, fully-connected layers, etc.), as well as registered buffers (batchnorm's running averages), have entries in the model's state_dict. Optimizer objects also have their own state_dict, which stores the optimizer's state and the hyperparameters used.
A simple example
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

# Define the model
class TheModelClass(nn.Module):
    def __init__(self):
        super(TheModelClass, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

# Initialize the model
model = TheModelClass()

# Initialize the optimizer
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

# Print the state_dict of the model
print("Model's state_dict:")
for param_tensor in model.state_dict():
    print(param_tensor, "\t", model.state_dict()[param_tensor].size())

# Print the state_dict of the optimizer
print("Optimizer's state_dict:")
for var_name in optimizer.state_dict():
    print(var_name, "\t", optimizer.state_dict()[var_name])
Output:
Model's state_dict:
conv1.weight     torch.Size([6, 3, 5, 5])
conv1.bias       torch.Size([6])
conv2.weight     torch.Size([16, 6, 5, 5])
conv2.bias       torch.Size([16])
fc1.weight       torch.Size([120, 400])
fc1.bias         torch.Size([120])
fc2.weight       torch.Size([84, 120])
fc2.bias         torch.Size([84])
fc3.weight       torch.Size([10, 84])
fc3.bias         torch.Size([10])

Optimizer's state_dict:
state {}
param_groups [{'lr': 0.001, 'momentum': 0.9, 'dampening': 0, 'weight_decay': 0, 'nesterov': False, 'params': [4675713712, 4675713784, 4675714000, 4675714072, 4675714216, 4675714288, 4675714432, 4675714504, 4675714648, 4675714720]}]
Save/Load state_dict (recommended)
Save:
torch.save(model.state_dict(), PATH)
Loading:
model = TheModelClass(*args, **kwargs)
model.load_state_dict(torch.load(PATH))
model.eval()
Be aware of one detail: if the model was trained with nn.DataParallel across multiple GPUs on a single machine, then the model must also be wrapped in nn.DataParallel before loading.
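As a minimal sketch of that detail (assuming the state_dict was saved from a model wrapped in nn.DataParallel):

# The saved keys carry a `module.` prefix, so wrap the model first
model = TheModelClass(*args, **kwargs)
model = torch.nn.DataParallel(model)
model.load_state_dict(torch.load(PATH))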
When saving a model for inference, only the model's trained parameters need to be saved, and saving just the state_dict with torch.save() makes reloading the model straightforward. This is therefore the recommended way to save a model.
Remember to always call model.eval() to put the dropout and batch normalization layers into evaluation mode; otherwise each inference run can produce different results.
Note that load_state_dict() takes a dictionary object, not the path to a saved file, so you must first deserialize the saved state_dict with torch.load() before passing it to load_state_dict().
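For example, a minimal sketch of the right and wrong way to pass the argument:

state_dict = torch.load(PATH)        # deserialize the file into a dict first
model.load_state_dict(state_dict)    # then load the dict into the model
# model.load_state_dict(PATH)        # wrong: a path string is not a state_dict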
Save/Load entire model
Save:
torch.save(model, PATH)
Loading:
# The model class must be defined somewhere
model = torch.load(PATH)
model.eval()
This way of saving/loading models uses the most intuitive syntax and the least code. It uses Python's pickle to save the entire module. The disadvantage is that the serialized data is bound to the specific classes and the exact directory structure used when the model was saved: pickle does not save the model class itself, but a path to the class, which is resolved at load time. As a result, loading the model can fail when the code is used in another project or after refactoring.
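To make the pitfall concrete, here is a hypothetical sketch (the error types shown are the ones pickle typically raises when a class can no longer be imported from its original location): if TheModelClass lived in a module that has since been renamed or moved, unpickling breaks:

import torch
try:
    model = torch.load(PATH)
except (ModuleNotFoundError, AttributeError) as e:
    # pickle tried to import the class by its original path and failed
    print("Model class definition moved since saving:", e)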
Typically, PyTorch models are saved with a .pt or .pth file extension.
Always remember to call model.eval() to put dropout and batch normalization layers into evaluation mode before running inference; otherwise the model will produce inconsistent results.
Save a general checkpoint for inference and/or resuming training
Save:
torch.save({
    'epoch': epoch,
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'loss': loss,
    ...
}, PATH)
Loading:
model = TheModelClass(*args, **kwargs)
optimizer = TheOptimizerClass(*args, **kwargs)

checkpoint = torch.load(PATH)
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
epoch = checkpoint['epoch']
loss = checkpoint['loss']

model.eval()
# - or -
model.train()
When saving a general checkpoint for inference or for resuming training, you must save more than just the model's state_dict. Saving the optimizer's state_dict is also important, because it contains the buffers and parameters the optimizer updates as the model trains. Beyond that, it is useful to save the epoch at which training stopped, the latest recorded training loss, any external layers, and so on.
To save multiple components, put them in a dictionary and serialize the dictionary with torch.save(). A common convention is to use the .tar file extension for these checkpoints.
To load the components, first initialize the model and optimizer, then load the saved dictionary with torch.load(); the saved components can then be retrieved by simply querying the dictionary.
Again, do not forget to call model.eval() when evaluating the model.
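If you are resuming training instead, a minimal sketch might look like the following (train_loader, criterion, and num_epochs are assumed to be defined elsewhere):

model.train()
for epoch in range(epoch + 1, num_epochs):   # continue from the saved epoch
    for inputs, targets in train_loader:     # assumed DataLoader
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()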
Save multiple models to one file
Save:
torch.save({
    'modelA_state_dict': modelA.state_dict(),
    'modelB_state_dict': modelB.state_dict(),
    'optimizerA_state_dict': optimizerA.state_dict(),
    'optimizerB_state_dict': optimizerB.state_dict(),
    ...
}, PATH)
Loading:
modelA = TheModelAClass(*args, **kwargs)
modelB = TheModelBClass(*args, **kwargs)
optimizerA = TheOptimizerAClass(*args, **kwargs)
optimizerB = TheOptimizerBClass(*args, **kwargs)

checkpoint = torch.load(PATH)
modelA.load_state_dict(checkpoint['modelA_state_dict'])
modelB.load_state_dict(checkpoint['modelB_state_dict'])
optimizerA.load_state_dict(checkpoint['optimizerA_state_dict'])
optimizerB.load_state_dict(checkpoint['optimizerB_state_dict'])

modelA.eval()
modelB.eval()
# - or -
modelA.train()
modelB.train()
When saving more than one model, such as a GAN, a sequence-to-sequence model, or an ensemble of models, use the same approach as for a general checkpoint: save each model's state_dict and the corresponding optimizer's state_dict in one dictionary. Anything else that would help you resume training can be added to this dictionary as well.
Use another model's parameters to warm-start the current model
Save:
torch.save(modelA.state_dict(), PATH)
Loading:
modelB = TheModelBClass(*args, **kwargs)
modelB.load_state_dict(torch.load(PATH), strict=False)
When doing transfer learning or training a new, complex model, it is common to load only part of a model. Reusing trained parameters, even just a few, helps warm-start training and lets the model converge faster.
When loading partial model parameters for warm-starting, you are likely to run into key mismatches (model weights are saved and loaded as key-value pairs). Whether keys are missing or unexpected, you can ignore the mismatched keys by setting strict=False in load_state_dict().
If you want to load the parameters of one layer into a different layer but some keys do not match, simply rename the parameter keys in the state_dict to match the keys of the model you are loading into, as sketched below.
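A hypothetical sketch (the key names features.0 and conv1 are made up for illustration):

state_dict = torch.load(PATH)
# Move the saved tensors to the keys the new model expects
state_dict['conv1.weight'] = state_dict.pop('features.0.weight')
state_dict['conv1.bias'] = state_dict.pop('features.0.bias')
modelB.load_state_dict(state_dict, strict=False)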
Save and Load Models Across Devices
Save on GPU, load on CPU
Save:
torch.save(model.state_dict(), PATH)
Loading:
device = torch.device('cpu')
model = TheModelClass(*args, **kwargs)
model.load_state_dict(torch.load(PATH, map_location=device))
When loading a model trained on a GPU onto a CPU, pass map_location=torch.device('cpu') to torch.load(); map_location then dynamically remaps the underlying storage of the tensors to the CPU device.
The above code only works if the model was trained on a single GPU. If the model was trained on multiple GPUs, loading it on the CPU will produce an error similar to the following:
KeyError: 'unexpected key "module." in state_dict'
The reason is that when a model is trained and saved on multiple GPUs, every parameter name gains a module. prefix, so you can strip this prefix from the keys when loading the model:
# The file was originally saved from an nn.DataParallel model
state_dict = torch.load(PATH)

# Create a new OrderedDict without the `module.` prefix
from collections import OrderedDict
new_state_dict = OrderedDict()
for k, v in state_dict.items():
    name = k[7:]  # remove `module.` (the first 7 characters)
    new_state_dict[name] = v

# Load the parameters
model.load_state_dict(new_state_dict)
Save on GPU, load on GPU
Save:
torch.save(model.state_dict(), PATH)
Loading:
device = ("cuda") model = TheModelClass(*args, **kwargs) model.load_state_dict((PATH)) (device) # Don't forget to call input = (device) on any tensor when feeding data into the model.
When loading a model trained on the GPU back onto the GPU, simply call model.to(torch.device('cuda')) to convert the initialized model into a CUDA-optimized model. Also make sure to call .to(torch.device('cuda')) on every input to the model. Note that my_tensor.to(device) returns a copy of my_tensor on the GPU; it does not overwrite the original tensor, so remember to reassign it manually: my_tensor = my_tensor.to(torch.device('cuda')).
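Putting the reassignment rule into a minimal inference sketch (images here is an assumed batch tensor that starts on the CPU):

device = torch.device("cuda")
images = images.to(device)      # .to() returns a copy; the reassignment matters
with torch.no_grad():
    outputs = model(images)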
Save on CPU, load on GPU
Save:
torch.save(model.state_dict(), PATH)
Loading:
device = ("cuda") model = TheModelClass(*args, **kwargs) model.load_state_dict((PATH, map_location="cuda:0")) # Select the GPU you wish to use (device)
Save torch.nn.DataParallel model
Save:
torch.save(model.module.state_dict(), PATH)
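Saving model.module.state_dict() stores the underlying model's parameters without the module. prefix, so, as a minimal sketch, the file can later be loaded into a plain, unwrapped model on any device:

# The saved keys have no `module.` prefix, so a plain model can load them
model = TheModelClass(*args, **kwargs)
model.load_state_dict(torch.load(PATH))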
Summary
That is all for this article on saving and loading PyTorch models. For more on the topic, please search my earlier articles or browse the related articles below. I hope you will continue to support me!