
PyTorch Autograd Functions Explained, with a Comparison to Building Networks in TensorFlow

I. Defining a new autograd function

Under the hood, each primitive autograd operation is really two functions that operate on Tensors. The forward function computes output Tensors from input Tensors; the backward function receives the gradient of some scalar value with respect to the output Tensors and computes the gradient of that same scalar with respect to the input Tensors.
In PyTorch it is easy to define your own autograd operator by writing a subclass of torch.autograd.Function that implements the forward and backward functions. We can then use the new operator by calling it as a function, passing in Tensors containing the input data.
In the following example we define a custom autograd function that performs the ReLU nonlinearity, and call it to implement a two-layer network, as in the previous section.

import torch

class myrelu(torch.autograd.Function):
    # Custom autograd function created by subclassing torch.autograd.Function
    # and implementing the forward and backward passes on Tensors.
    @staticmethod
    def forward(ctx, x):
        # In the forward pass we receive a context object and a tensor containing the input;
        # we must return a tensor containing the output. The context object can be used
        # to cache objects for use in the backward pass.
        ctx.save_for_backward(x)
        return x.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        """
        In the backward pass we receive the context object and a tensor containing
        the gradient of the loss with respect to the output produced during the forward pass.
        We can retrieve the cached data from the context object, and must compute and
        return the gradient of the loss with respect to the input of the forward pass.
        """
        x, = ctx.saved_tensors
        grad_x = grad_output.clone()
        grad_x[x < 0] = 0
        return grad_x
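
As a quick sanity check (my addition, not part of the original article), torch.autograd.gradcheck can compare the analytic backward pass defined above against numerically computed gradients; it expects double-precision inputs:

x_test = torch.randn(4, 5, dtype=torch.double, requires_grad=True)
# gradcheck perturbs the input and verifies the backward defined above; prints True if the gradients match
print(torch.autograd.gradcheck(myrelu.apply, (x_test,)))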

Calling the custom class to implement a two-layer network

#%%
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# n is the batch size and d_in is the input dimension
# h is the hidden dimension and d_out is the output dimension
n, d_in, h, d_out = 64, 1000, 100, 10
# Create random input and output data; requires_grad defaults to False,
# meaning we do not need to compute gradients with respect to these tensors.
x = torch.randn(n, d_in, device=device)
y = torch.randn(n, d_out, device=device)
# Randomly initialize the weights; requires_grad=True indicates that we want to compute their gradients.
w1 = torch.randn(d_in, h, device=device, requires_grad=True)
w2 = torch.randn(h, d_out, device=device, requires_grad=True)

learning_rate = 1e-6
for i in range(500):
    # Forward pass: compute the predicted y using operations on tensors,
    # calling the custom function via myrelu.apply
    y_pred = myrelu.apply(x.mm(w1)).mm(w2)

    # Compute the loss using tensor operations; loss.item() gets the scalar value held in the loss tensor
    loss = (y_pred - y).pow(2).sum()
    print(i, loss.item())

    # Use autograd to compute the backward pass; this call computes the gradient of the loss
    # with respect to every tensor with requires_grad=True.
    # After the call, w1.grad and w2.grad hold the gradients of the loss with respect to w1 and w2.
    loss.backward()

    # Update the weights with gradient descent. We only want to change w1 and w2 in place
    # and do not want to build a computation graph for the update steps,
    # so we wrap the updates in torch.no_grad() to keep PyTorch from tracking them.
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad
        # Manually zero the gradients after the backward pass
        w1.grad.zero_()
        w2.grad.zero_()

Running result: (screenshots of the printed loss values omitted)
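
For reference, the manual weight update and gradient zeroing above are exactly what an optimizer does for you. Here is a minimal sketch (my addition, assuming the same x, y, w1, w2, learning_rate and myrelu as above) of the equivalent loop using torch.optim.SGD:

optimizer = torch.optim.SGD([w1, w2], lr=learning_rate)
for i in range(500):
    y_pred = myrelu.apply(x.mm(w1)).mm(w2)
    loss = (y_pred - y).pow(2).sum()
    optimizer.zero_grad()  # replaces the manual w1.grad.zero_() / w2.grad.zero_()
    loss.backward()        # compute gradients for w1 and w2
    optimizer.step()       # applies the in-place gradient-descent update under the hood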

II. Comparison of PyTorch and TensorFlow

  • PyTorch autograd looks a lot like TensorFlow: in both frameworks we define a computational graph and use automatic differentiation to compute gradients. The big difference between the two is that TensorFlow's computational graphs are static, while PyTorch uses dynamic computational graphs.
  • In TensorFlow, we define the computational graph once and then execute the same graph over and over again, possibly feeding it different input data. In PyTorch, each forward pass defines a new computational graph.
  • **The benefit of static graphs is that they can be optimized up front.** For example, a framework can fuse some graph operations for efficiency, or produce a strategy for distributing the graph across many GPUs or machines. If the same graph is reused again and again, this potentially costly up-front optimization is amortized over the repeated runs.
  • One place where static and dynamic graphs differ is control flow. For some models we may want to perform a different computation for each data point; for example, a recurrent neural network might be unrolled for a different number of time steps per data point, and this unrolling can be implemented as a loop. With a static graph the loop construct has to be part of the graph, so TensorFlow provides operators for embedding loops into the graph. With dynamic graphs the situation is simpler: the graph is built on the fly for each example, so we can use ordinary imperative control flow to perform computation that differs for each input (a short sketch follows this list).
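
To make the control-flow point concrete, here is a rough sketch (my addition, not from the article) of dynamic control flow in PyTorch: ordinary Python decides how many steps to run, and the graph is built on the fly as the operations execute.

import random
import torch

w_step = torch.randn(20, 20, requires_grad=True)
x_in = torch.randn(1, 20)
state = x_in
for _ in range(random.randint(1, 4)):      # a different number of "time steps" on each run
    state = torch.relu(state.mm(w_step))   # each executed op extends the dynamic graph
loss = state.sum()
loss.backward()                            # backpropagate through whatever graph was just built
print(w_step.grad.shape)                   # torch.Size([20, 20])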

Fit a simple two-layer network using TensorFlow (compare above):

#%% Using TensorFlow
import tensorflow.compat.v1 as tf   # use the v1 compat API so tf.placeholder is available
tf.disable_v2_behavior()
import numpy as np
#%%
# Build the computational graph

# n is the batch size and d_in is the input dimension
# h is the hidden dimension and d_out is the output dimension
n, d_in, h, d_out = 64, 1000, 100, 10

# Create placeholders for the input and target data; these will be filled with real data
# when the computational graph is executed
x = tf.placeholder(tf.float32, shape=(None, d_in))
y = tf.placeholder(tf.float32, shape=(None, d_out))

# Create Variables for the weights and initialize them with random data;
# a TensorFlow Variable keeps its value across executions of the graph
w1 = tf.Variable(tf.random_normal((d_in, h)))
w2 = tf.Variable(tf.random_normal((h, d_out)))
# Forward pass: compute the predicted y using TensorFlow tensor operations
# (this code does not perform any numeric computation, it only builds the graph to be executed later)
h = tf.matmul(x, w1)
h_relu = tf.maximum(h, tf.zeros(1))
y_pred = tf.matmul(h_relu, w2)
# Compute the loss using TensorFlow tensor operations
loss = tf.reduce_sum((y - y_pred) ** 2.0)
# Compute the gradients of the loss with respect to the weights w1 and w2
grad_w1, grad_w2 = tf.gradients(loss, [w1, w2])
# Update the weights with gradient descent; to actually update the weights we need to
# evaluate new_w1 and new_w2 when executing the graph.
# Note: in TensorFlow the act of updating the weight values is part of the computational graph,
# whereas in PyTorch it happens outside the computational graph.
learning_rate = 1e-6
new_w1 = w1.assign(w1 - learning_rate * grad_w1)
new_w2 = w2.assign(w2 - learning_rate * grad_w2)

# Now that the computational graph has been built, start a TensorFlow session to execute it
with tf.Session() as sess:
    # Run the graph once to initialize the Variables w1 and w2
    sess.run(tf.global_variables_initializer())
    # Create numpy arrays holding the actual data for the input x and target y
    x_value = np.random.randn(n, d_in)
    y_value = np.random.randn(n, d_out)
    for i in range(500):
        # Execute the graph many times; each execution uses feed_dict to bind
        # x_value to x and y_value to y, and evaluates loss, new_w1 and new_w2.
        # The values of these tensors are returned as numpy arrays.
        loss_value, _, _ = sess.run([loss, new_w1, new_w2],
                                    feed_dict={x: x_value, y: y_value})
        print(loss_value)

Running result: (screenshots of the printed loss values omitted)

Today's conclusion focuses on the difference between TensorFlow and PyTorch in automatic differentiation: computational graphs are static in the former and dynamic in the latter.
Bye for now; I may not post tomorrow because I have classes in the afternoon and evening, although I may not go to them (hahaha, don't learn from me). The next section will be about neural networks, see you there!

That concludes this article on PyTorch autograd functions and the comparison with building a network in TensorFlow. For more on PyTorch autograd functions, please search my earlier articles or continue browsing the related articles below, and I hope you will keep supporting me in the future!