1. Basic model
The main feature of an RNN is that the output of the hidden layer is stored and given back to the network as an extra input at the next time step.
As shown in the figure below, when the word "Taipei" is input, the previous word may have been "leave" or "arrive". If the hidden-layer content stored from the previous input "leave" is fed in together with "Taipei", the network can distinguish "leave Taipei" from "arrive Taipei".
If the stored hidden-layer content is fed back at the next time step, the model is called an Elman network.
If the final output is fed back for the next time step instead, it is called a Jordan network.
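To make this concrete, here is a minimal NumPy sketch of one Elman-style step; the word vectors, weights, and sizes are all made up for illustration, not learned values:

```python
# A minimal sketch of an Elman-style RNN step. All weights, sizes,
# and the toy word vectors are invented for demonstration only.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hid = 4, 3                       # toy input / hidden sizes
W_x = rng.normal(size=(d_hid, d_in))     # input-to-hidden weights
W_h = rng.normal(size=(d_hid, d_hid))    # hidden-to-hidden (the "memory" path)
b = np.zeros(d_hid)

def rnn_step(x, h_prev):
    """Elman step: the previous hidden state is fed back as extra input."""
    return np.tanh(W_x @ x + W_h @ h_prev + b)

# Toy one-hot word vectors (assumed, not learned)
leave  = np.array([1., 0., 0., 0.])
arrive = np.array([0., 1., 0., 0.])
taipei = np.array([0., 0., 1., 0.])

h0 = np.zeros(d_hid)
h_after_leave  = rnn_step(leave, h0)
h_after_arrive = rnn_step(arrive, h0)

# The same word "Taipei" produces different hidden states depending on
# what was stored from the previous word:
print(rnn_step(taipei, h_after_leave))   # context: "leave Taipei"
print(rnn_step(taipei, h_after_arrive))  # context: "arrive Taipei"
```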
Bidirectional RNN: the stored content obtained from reading the input forward and the stored content obtained from reading it in reverse are both fed to the model at the same time, so each output sees both past and future context.
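A rough sketch of the bidirectional idea, again with invented weights and sizes: one RNN reads the sequence forward, another reads it in reverse, and their states for the same position are combined:

```python
# A toy bidirectional RNN: forward and backward passes are run
# separately, then their hidden states are concatenated per position.
import numpy as np

rng = np.random.default_rng(1)
d_in, d_hid = 4, 3

def make_rnn():
    W_x = rng.normal(size=(d_hid, d_in))
    W_h = rng.normal(size=(d_hid, d_hid))
    def run(seq):
        h, outs = np.zeros(d_hid), []
        for x in seq:
            h = np.tanh(W_x @ x + W_h @ h)
            outs.append(h)
        return outs
    return run

forward, backward = make_rnn(), make_rnn()
seq = [rng.normal(size=d_in) for _ in range(5)]

f_states = forward(seq)                  # reads x1 ... x5
b_states = backward(seq[::-1])[::-1]     # reads x5 ... x1, then re-aligned

# Each position now sees both past and future context:
combined = [np.concatenate([f, b]) for f, b in zip(f_states, b_states)]
print(combined[2].shape)                 # (6,) = forward 3 + backward 3
```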
2. Long Short-term Memory (LSTM)
In practice, when people talk about using an RNN, they usually mean the LSTM. In addition to the data input, each LSTM unit has three other "gates" that control input, output, and storage. As shown in the figure below, each LSTM cell therefore has 4 inputs and 1 output.
The gate signals are vectors that are passed through a sigmoid function, so after an input and its gate signal are combined, the gate value lies between 0 and 1; multiplying by it controls how much is input, how much is output, and whether the stored value is kept. All of the gate parameters are learned by the network during training.
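As a quick toy illustration of why the sigmoid is used here: it squashes any gate signal into (0, 1), so multiplying by it acts as a soft switch. The values below are arbitrary:

```python
# A sigmoid gate as a soft switch on a candidate value.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

candidate = 2.5                      # some value trying to enter the cell
for gate_preact in (-6.0, 0.0, 6.0):
    gate = sigmoid(gate_preact)      # ~0.0, 0.5, ~1.0
    print(f"gate={gate:.3f}  passed={gate * candidate:.3f}")
# gate ~ 0 blocks the value; gate ~ 1 lets it through almost unchanged.
```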
3. Process structure
As shown below, the input z is first transformed by an activation function to give g(z), while the input-gate signal z_i is passed through the sigmoid f; multiplying the two gives g(z) * f(z_i), which is what actually enters the memory cell.
Similarly, when the forget gate (the gate controlling the stored value) outputs f(z_f) = 1, the previously stored value c is multiplied by 1 and added to the result above, giving the new cell value c' = g(z) * f(z_i) + c * f(z_f); this is how the last stored value is reused. When f(z_f) = 0, the previous value is wiped out and the memory is reset.
The cell value c' is then transformed to h(c'), but it cannot leave the cell unless the output gate allows it: after the output-gate signal z_o passes through the sigmoid, f(z_o) = 0 blocks everything, while f(z_o) = 1 lets h(c') through. The final output is h(c') * f(z_o).
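Putting the three steps together, here is a minimal NumPy sketch of a single cell update using the same symbols as above (f is the sigmoid; g and h are taken to be tanh here as a common choice; the numbers are arbitrary):

```python
# One LSTM-cell update: write gated input, keep/forget old memory,
# then gate the output.
import numpy as np

def f(z):                        # gate activation (sigmoid)
    return 1.0 / (1.0 + np.exp(-z))

def g(z):                        # input transformation (tanh assumed)
    return np.tanh(z)

def h(c):                        # output transformation (tanh assumed)
    return np.tanh(c)

def lstm_cell_step(z, z_i, z_f, z_o, c_prev):
    c_new = g(z) * f(z_i) + c_prev * f(z_f)   # c' = g(z)f(z_i) + c f(z_f)
    a = h(c_new) * f(z_o)                     # gated output
    return a, c_new

# Forget gate wide open (z_f large): the old memory survives.
print(lstm_cell_step(z=1.0, z_i=5.0, z_f=5.0, z_o=5.0, c_prev=2.0))
# Forget gate closed (z_f very negative): the old memory is erased.
print(lstm_cell_step(z=1.0, z_i=5.0, z_f=-5.0, z_o=5.0, c_prev=2.0))
```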
The difference between an LSTM network and a DNN is that the neurons are replaced with LSTM units, and the input data is multiplied by separate weights to drive the cell input and each of the three gates. The parameters therefore become four times the usual DNN parameters.
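A quick back-of-the-envelope check of the four-times claim, using arbitrary layer sizes: each of the four inputs to an LSTM unit (the cell input plus the three gates) needs its own copy of the weights a plain recurrent unit would have:

```python
# Parameter counts for one recurrent layer (sizes are arbitrary).
d_in, d_hid = 100, 128

per_transform = d_hid * (d_in + d_hid) + d_hid   # weights + bias
simple_rnn_params = per_transform                # one transform
lstm_params = 4 * per_transform                  # cell input + 3 gates

print(simple_rnn_params)   # 29312
print(lstm_params)         # 117248, exactly 4x
```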
A simplified representation of the entire process is as follows.
In practice there is not just one LSTM cell but several working together, and at each time step both the output and the stored cell state of each cell are also added to the next step's input. This is shown in the figure below:
The above is the overall structure of the LSTM.
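To tie it together, here is a minimal NumPy sketch of that loop: at each time step the cell sees the current input together with the previous output, and carries the stored cell state forward (all weights and sizes invented for illustration):

```python
# An LSTM unrolled over a short sequence: h and c flow step to step.
import numpy as np

rng = np.random.default_rng(2)
d_in, d_hid = 4, 3
# One weight matrix per transform: cell input z, input/forget/output gates.
W = {k: rng.normal(size=(d_hid, d_in + d_hid)) * 0.5 for k in "zifo"}

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev):
    v = np.concatenate([x, h_prev])      # previous output joins the input
    c = np.tanh(W["z"] @ v) * sigmoid(W["i"] @ v) + c_prev * sigmoid(W["f"] @ v)
    h = np.tanh(c) * sigmoid(W["o"] @ v)
    return h, c

h, c = np.zeros(d_hid), np.zeros(d_hid)
for x in [rng.normal(size=d_in) for _ in range(5)]:
    h, c = lstm_step(x, h, c)            # h and c flow to the next step
print(h)
```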
That concludes today's explanation of the flow structure of the Python deep-learning RNN model. For more on the RNN model's flow structure, please see my other related articles!