
PyTorch model onnx file export and call details

preamble

The Open Neural Network Exchange (ONNX) format is a standard for representing deep learning models that allows models to be transferred between different frameworks.

A model defined in PyTorch is a dynamic graph whose forward propagation is defined and implemented by the forward method of a torch.nn.Module subclass.

However, Python code is relatively inefficient; if the dynamic graph is converted to a static graph, the inference speed of the model should improve.

In the PyTorch framework, torch.onnx.export can export a model whose parent class is torch.nn.Module to an onnx file.

Its three most important parameters are:

  • model: a model whose parent class is torch.nn.Module
  • args: the variables passed into the model's forward method, packed as a tuple
  • f: the onnx file name, given as a string
import torch
from torchvision.models import resnet50

file = 'resnet50.onnx'
# Declare the model
resnet = resnet50(pretrained=False).eval()
image = torch.rand([1, 3, 224, 224])
# Export as onnx file
torch.onnx.export(resnet, (image,), file)
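
torch.onnx.export also accepts optional keyword arguments. As a sketch only (the node names and opset version below are illustrative choices, not values from this article), input_names, output_names and dynamic_axes can be used to name the graph nodes and allow a variable batch size:

# Optional export arguments (illustrative node names, assumed opset version)
torch.onnx.export(resnet, (image,), file,
                  input_names=['image'],    # illustrative input node name
                  output_names=['logits'],  # illustrative output node name
                  opset_version=11,         # assumed opset version
                  dynamic_axes={'image': {0: 'batch'},
                                'logits': {0: 'batch'}})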

The onnx file can be opened by Netron to view the model structure
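
Netron can be used as a desktop application or from Python. A minimal sketch, assuming the netron package has been installed with pip:

import netron
# Serve the model structure in the browser (blocks until interrupted)
netron.start('resnet50.onnx')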

basic usage

To run onnx models in Python, you need to install onnxruntime:

# One or the other will do
pip install onnxruntime        # CPU version
pip install onnxruntime-gpu    # GPU version
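
A quick way to check which build is installed and which execution providers are available (a minimal sketch using the same onnxruntime query functions the code below relies on):

import onnxruntime as ort

print(ort.get_device())                # 'GPU' or 'CPU'
print(ort.get_available_providers())   # e.g. ['CUDAExecutionProvider', 'CPUExecutionProvider']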

Inference is done with the help of onnxruntime's InferenceSession, whose more important instance methods are:

  • get_inputs(): get the list of input variables (variable attributes: name, shape, type)
  • get_outputs(): get the list of output variables (variable attributes: name, shape, type)
  • run(output_names, input_feed): the input variables are np.ndarray (note that dtype should be float32); runs model inference and returns the output

The basic usage of the onnx model can be derived:

import onnxruntime as ort
import numpy as np

file = 'resnet50.onnx'
# Find GPU / CPU
provider = ort.get_available_providers()[
    1 if ort.get_device() == 'GPU' else 0]
print('Device:', provider)
# Declare the onnx model
model = ort.InferenceSession(file, providers=[provider])
# Print the attributes of the input and output nodes
for node_list in model.get_inputs(), model.get_outputs():
    for node in node_list:
        attr = {'name': node.name,
                'shape': node.shape,
                'type': node.type}
        print(attr)
    print('-' * 60)

# Get the names of the input and output nodes
input_node_name = model.get_inputs()[0].name
output_node_names = [node.name for node in model.get_outputs()]
image = np.random.random([1, 3, 224, 224]).astype(np.float32)
print(model.run(output_names=output_node_names,
                input_feed={input_node_name: image}))

Advanced API

To simplify the usage steps, the session is wrapped in a class:

class Onnx_Module(ort.InferenceSession):
    ''' onnx inference model
        provider: prioritize GPUs '''
    provider = ort.get_available_providers()[
        1 if ort.get_device() == 'GPU' else 0]

    def __init__(self, file):
        super(Onnx_Module, self).__init__(file, providers=[self.provider])
        # Names of the input and output nodes
        self.input_names = [node_arg.name for node_arg in self.get_inputs()]
        self.output_names = [node_arg.name for node_arg in self.get_outputs()]

    def __call__(self, *arrays):
        input_feed = {name: x for name, x in zip(self.input_names, arrays)}
        return self.run(self.output_names, input_feed)

In PyTorch, for a convolutional neural network model and an input image image, inference is written as model(image); usage of the wrapped class is similar:

import numpy as np

file = 'resnet50.onnx'
model = Onnx_Module(file)
image = np.random.random([1, 3, 224, 224]).astype(np.float32)
print(model(image))

To make it easier to observe the speed difference between the Torch model and the onnx model, and to check whether the outputs of the two models are consistent, a test method is also added.

The parameters of the test method are the same as those of torch.onnx.export, and the basic procedure is as follows:

  • Get the output of the Torch model and print the inference time
  • Export the Torch model as an onnx file, converting the input variables from torch.Tensor to np.ndarray
  • Initialize the onnx model, get its output, and print the inference time
  • Calculate the mean of the absolute errors between the Torch model and onnx model outputs
  • Return the onnx model
class Timer:
    repeat = 3

    def __new__(cls, fun, *args, **kwargs):
        import time
        start = time.time()
        for _ in range(cls.repeat): fun(*args, **kwargs)
        cost = (time.time() - start) / cls.repeat
        return cost * 1e3  # ms


class Onnx_Module(ort.InferenceSession):
    ''' onnx inference model
        provider: prioritize GPUs '''
    provider = ort.get_available_providers()[
        1 if ort.get_device() == 'GPU' else 0]

    def __init__(self, file):
        super(Onnx_Module, self).__init__(file, providers=[self.provider])
        # Names of the input and output nodes
        self.input_names = [node_arg.name for node_arg in self.get_inputs()]
        self.output_names = [node_arg.name for node_arg in self.get_outputs()]

    def __call__(self, *arrays):
        input_feed = {name: x for name, x in zip(self.input_names, arrays)}
        return self.run(self.output_names, input_feed)

    @classmethod
    def test(cls, model, args, file, **export_kwargs):
        # Test Torch's runtime
        torch_output = model(*args).data.numpy()
        print(f'Torch: {Timer(model, *args):.2f} ms')
        # model: Torch -> onnx
        torch.onnx.export(model, args, file, **export_kwargs)
        # data: tensor -> array
        args = tuple(map(lambda tensor: tensor.data.numpy(), args))
        onnx_model = cls(file)
        # Test onnx runtime
        onnx_output = onnx_model(*args)
        print(f'Onnx: {Timer(onnx_model, *args):.2f} ms')
        # Calculate the mean absolute error between the Torch and onnx outputs
        abs_error = np.abs(torch_output - onnx_output).mean()
        print(f'Mean Error: {abs_error:.2f}')
        return onnx_model
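
A possible invocation of the test method, reusing the resnet model and image tensor defined in the export example above (the file name is assumed):

file = 'resnet50.onnx'
onnx_resnet = Onnx_Module.test(resnet, (image,), file)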

For ResNet50, the Torch model takes 172.67 ms to infer and the onnx model takes 36.56 ms, so the onnx model needs only 21.17% of the Torch model's time.

This concludes this article on exporting PyTorch models to onnx files and calling them. For more on PyTorch onnx export, please search my previous articles or continue to browse the related articles below. I hope you will support me more in the future!