A detailed Python + SimpleRNN implementation of stock prediction

For the underlying principles, see the previous posts.

1. Data sources

The data is the daily K-line data of SH600519 (Guizhou Moutai), downloaded with the tushare module. Only the opening price in column C is used in this example (as shown in the figure):

Using the opening prices of 60 consecutive days, predict the opening price of the 61st day.

2. Code implementation

Following the six-step method: import the related modules -> read the Guizhou Moutai daily K-line data into the variable maotai. The opening prices of the first 2126 days in maotai serve as the training data, and the opening prices of the last 300 days serve as the test data. The opening prices are then normalized so that the data fed into the neural network lies in the range 0 to 1;
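The normalization step can be illustrated without scikit-learn. The following is a minimal NumPy sketch of the same idea as MinMaxScaler, not the post's actual code: the prices are made up, and min_max_fit and min_max_apply are hypothetical helper names. The minimum and maximum are taken from the training data only and then reused for the test data.

```python
import numpy as np

def min_max_fit(train):
    """Return the per-column (min, max) of the training data."""
    return train.min(axis=0), train.max(axis=0)

def min_max_apply(data, lo, hi):
    """Scale data using statistics that came from the training set."""
    return (data - lo) / (hi - lo)

# Made-up opening prices, shape (n_days, 1) like a single-column slice
train = np.array([[10.0], [12.0], [14.0], [20.0]])
test = np.array([[15.0], [22.0]])

lo, hi = min_max_fit(train)                    # statistics from training data only
train_scaled = min_max_apply(train, lo, hi)    # lies in [0, 1] by construction
test_scaled = min_max_apply(test, lo, hi)      # may fall outside [0, 1]

print(train_scaled.ravel())
print(test_scaled.ravel())
```

In this toy data the test price 22.0 is above the training maximum of 20.0, so it scales to 1.2. Test values are only guaranteed to lie in 0 to 1 when they stay within the range seen during training.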

Next, create empty lists to receive the training set input features, training set labels, test set input features, and test set labels;

Continue constructing the data. A for loop traverses the training data: every 60 consecutive days of data form an input feature x_train and the 61st day is the corresponding label y_train, giving 2126 - 60 = 2066 training samples in total. The training data is then shuffled, converted to array format, and reshaped to the dimensions the RNN input requires;

Similarly, a for loop traverses the test data and generates 300 - 60 = 240 test samples. The test set does not need to be shuffled, but it still has to be converted to array format and reshaped to the dimensions the RNN input requires.
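The window construction described above can be sketched on synthetic data. This is an illustrative sketch, not the post's exact code: make_windows is a hypothetical helper, the series is a stand-in for the scaled opening prices, and the shuffle uses a single shared permutation instead of reseeding before each shuffle.

```python
import numpy as np

WINDOW = 60  # days of history per sample, as in the post

def make_windows(series, window=WINDOW):
    """Split an (n, 1) array into (samples, window) inputs and (samples,) labels."""
    x, y = [], []
    for i in range(window, len(series)):
        x.append(series[i - window:i, 0])  # the 60 days before day i
        y.append(series[i, 0])             # day i itself is the label
    return np.array(x), np.array(y)

# Stand-in for the scaled training column: 2126 rows, one feature
series = np.arange(2126, dtype=float).reshape(-1, 1)
x_train, y_train = make_windows(series)

# Shuffle inputs and labels with the same permutation so pairs stay aligned
rng = np.random.default_rng(7)
perm = rng.permutation(len(x_train))
x_train, y_train = x_train[perm], y_train[perm]

# RNN input layout: [samples, time steps, features per time step]
x_train = x_train.reshape(x_train.shape[0], WINDOW, 1)
print(x_train.shape, y_train.shape)
```

Because the stand-in series simply counts upward, each label equals the last value of its window plus one, which makes it easy to check that shuffling has not separated inputs from their labels.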

Build the neural network with Sequential:

The first recurrent layer has 80 memory units and passes h_t to the next layer at every time step, followed by a Dropout of 0.2;

The second recurrent layer has 100 memory units and passes h_t to the next layer only at the last time step, followed by a Dropout of 0.2;
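To make the difference between the two layers concrete, here is a rough NumPy sketch of a tanh recurrent layer's forward pass for a single sample. It is an assumption-laden illustration rather than Keras internals: simple_rnn is a hypothetical function, the weights are random, there is no batch dimension, and Dropout is omitted.

```python
import numpy as np

def simple_rnn(x, w_x, w_h, b, return_sequences=False):
    """Forward pass of a tanh recurrent layer over x of shape (steps, features)."""
    h = np.zeros(w_h.shape[0])          # initial state h_0
    outputs = []
    for x_t in x:
        # h_t = tanh(x_t W_x + h_{t-1} W_h + b)
        h = np.tanh(x_t @ w_x + h @ w_h + b)
        outputs.append(h)
    # Either every h_t, or only the h_t of the last time step
    return np.stack(outputs) if return_sequences else h

rng = np.random.default_rng(0)
steps, features, units1, units2 = 60, 1, 80, 100
x = rng.normal(size=(steps, features))

# Layer 1: 80 units, emits h_t at every time step
seq = simple_rnn(x,
                 rng.normal(size=(features, units1)) * 0.1,
                 rng.normal(size=(units1, units1)) * 0.1,
                 np.zeros(units1),
                 return_sequences=True)

# Layer 2: 100 units, emits only the last h_t
last = simple_rnn(seq,
                  rng.normal(size=(units1, units2)) * 0.1,
                  rng.normal(size=(units2, units2)) * 0.1,
                  np.zeros(units2))

print(seq.shape, last.shape)
```

The first layer returns one 80-dimensional vector per time step, which is what the second recurrent layer needs as its input sequence. The second layer returns a single 100-dimensional vector, which a final one-unit fully connected layer can map to one predicted price.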

Since the output is a single number, the opening price on day 61, the fully connected Dense layer has 1 unit -> compile configures the training method: the adam optimizer with the mean squared error loss function. The stock prediction code only observes the loss, and only the loss is printed at each training iteration, so metrics does not need to be set here -> set up checkpointing so that training can be resumed, and fit runs the training process -> summary prints the network structure and parameter statistics.
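The parameter statistics for this architecture can be worked out by hand. The sketch below assumes the standard layout of a simple recurrent layer (input weights, recurrent weights, and one bias per unit) and one input feature per time step; the helper names are hypothetical. Dropout layers contribute no parameters.

```python
def simple_rnn_params(input_dim, units):
    """Input weights + recurrent weights + biases of a simple recurrent layer."""
    return input_dim * units + units * units + units

def dense_params(input_dim, units):
    """Weights + biases of a fully connected layer."""
    return input_dim * units + units

rnn1 = simple_rnn_params(1, 80)     # 80 + 6400 + 80
rnn2 = simple_rnn_params(80, 100)   # 8000 + 10000 + 100
dense = dense_params(100, 1)        # 100 + 1
total = rnn1 + rnn2 + dense

print(rnn1, rnn2, dense, total)  # → 6560 18100 101 24761
```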

Visualize the loss and write the parameters to a file.

Stock prediction. predict is used to predict the data in the test set; the predicted and true values are then converted from normalized values back to real prices, and finally the true value curve is drawn in red and the predicted value curve in blue.

To evaluate the quality of the model, three metrics are given: the mean squared error, the root mean squared error, and the mean absolute error. The smaller these errors are, the closer the predicted values are to the true values.
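As a small worked example of the three metrics, the sketch below computes them directly with NumPy on made-up prices. The numbers are purely illustrative and are not results from the model.

```python
import numpy as np

# Made-up true and predicted prices, for illustration only
real = np.array([100.0, 102.0, 105.0, 103.0])
pred = np.array([101.0, 100.0, 106.0, 107.0])

err = pred - real                 # errors: 1, -2, 1, 4
mse = np.mean(err ** 2)           # (1 + 4 + 1 + 16) / 4 = 5.5
rmse = np.sqrt(mse)               # square root of the MSE
mae = np.mean(np.abs(err))        # (1 + 2 + 1 + 4) / 4 = 2.0

print('Mean Squared Error: %.6f' % mse)
print('Root Mean Squared Error: %.6f' % rmse)
print('Mean Absolute Error: %.6f' % mae)
```

The single large miss of 4 dominates the MSE and RMSE because errors are squared before averaging, while the MAE weights all errors linearly.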

RNN stock prediction loss curve.

RNN Stock Prediction Curves.

RNN Stock Prediction Evaluation Metrics.

Model Summary:

3. Complete code

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dropout, Dense, SimpleRNN
import matplotlib.pyplot as plt
import os
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error, mean_absolute_error
import math
# Read the stock file
maotai = pd.read_csv('./')
# The opening prices of the first (2426 - 300 = 2126) days form the training set. Columns count from 0, and 2:3 selects the half-open range [2:3), i.e. column C, the opening price
training_set = maotai.iloc[0:2426 - 300, 2:3].values
# The opening prices of the last 300 days form the test set
test_set = maotai.iloc[2426 - 300:, 2:3].values

# Normalization
sc = MinMaxScaler(feature_range=(0, 1))  # Define normalization: scale to the range (0, 1)
training_set_scaled = sc.fit_transform(training_set)  # Compute the maximum and minimum of the training set (statistics of the training set only) and normalize the training set
test_set = sc.transform(test_set)  # Normalize the test set using the statistics of the training set

x_train = []
y_train = []

x_test = []
y_test = []

# Training set: the first 2426-300 = 2126 days of data in the csv table
# Use a for loop to traverse the entire training set: the opening prices of 60 consecutive days are the input feature x_train and the data of the 61st day is the label. The loop builds 2426-300-60 = 2066 samples in total.
for i in range(60, len(training_set_scaled)):
    x_train.append(training_set_scaled[i - 60:i, 0])
    y_train.append(training_set_scaled[i, 0])
# Shuffle the training set; reseeding with the same value before each shuffle keeps features and labels aligned
np.random.seed(7)
np.random.shuffle(x_train)
np.random.seed(7)
np.random.shuffle(y_train)
tf.random.set_seed(7)
# Convert the training set from list to array format
x_train, y_train = np.array(x_train), np.array(y_train)

# Make x_train meet the RNN input requirements: [number of samples, number of time steps the recurrent kernel is unrolled over, number of input features per time step].
# The whole dataset is fed in, so the number of samples is x_train.shape[0], i.e. 2066; 60 opening prices are used to predict the opening price of the 61st day, so the number of time steps is 60; each time step takes one day's opening price, a single value, so the number of input features per time step is 1
x_train = np.reshape(x_train, (x_train.shape[0], 60, 1))
# Test set: the last 300 days of data in the csv table
# Use a for loop to traverse the entire test set: the opening prices of 60 consecutive days are the input feature x_test and the data of the 61st day is the label y_test. The loop builds 300-60 = 240 samples in total.
for i in range(60, len(test_set)):
    x_test.append(test_set[i - 60:i, 0])
    y_test.append(test_set[i, 0])
# Convert the test set to arrays and reshape it to meet the RNN input requirements: [number of samples, number of time steps the recurrent kernel is unrolled over, number of input features per time step]
x_test, y_test = np.array(x_test), np.array(y_test)
x_test = np.reshape(x_test, (x_test.shape[0], 60, 1))

model = tf.keras.models.Sequential([
    SimpleRNN(80, return_sequences=True),  # First recurrent layer: 80 memory units, passes h_t to the next layer at every time step
    Dropout(0.2),  # Dropout of 0.2
    SimpleRNN(100),  # Second recurrent layer: 100 memory units, passes only the last h_t to the next layer
    Dropout(0.2),  # Dropout of 0.2
    Dense(1)  # One unit, since the output is a single number: the opening price on day 61
])

model.compile(optimizer=tf.keras.optimizers.Adam(0.001),
              loss='mean_squared_error')  # Mean squared error loss function
# This application only observes the loss, not accuracy, so the metrics option is omitted and only the loss is displayed at each epoch.

checkpoint_save_path = "./checkpoint/rnn_stock.ckpt"

if os.path.exists(checkpoint_save_path + '.index'):
    print('-------------load the model-----------------')
    model.load_weights(checkpoint_save_path)

cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_save_path,
                                                 save_weights_only=True,
                                                 save_best_only=True,
                                                 monitor='val_loss')

history = model.fit(x_train, y_train, batch_size=64, epochs=50, validation_data=(x_test, y_test), validation_freq=1,
                    callbacks=[cp_callback])

model.summary()

file = open('./', 'w')  # Parameter extraction
for v in model.trainable_variables:
    file.write(str(v.name) + '\n')
    file.write(str(v.shape) + '\n')
    file.write(str(v.numpy()) + '\n')
file.close()

loss = history.history['loss']
val_loss = history.history['val_loss']

plt.plot(loss, label='Training Loss')
plt.plot(val_loss, label='Validation Loss')
plt.title('Training and Validation Loss')
plt.legend()
plt.show()

################## predict ######################
# Feed the test set into the model for prediction
predicted_stock_price = model.predict(x_test)
# Restore the predicted data: inverse-normalize from (0, 1) to the original range
predicted_stock_price = sc.inverse_transform(predicted_stock_price)
# Restore the real data: inverse-normalize from (0, 1) to the original range
real_stock_price = sc.inverse_transform(test_set[60:])
# Draw a comparison curve between real and predicted data
plt.plot(real_stock_price, color='red', label='MaoTai Stock Price')
plt.plot(predicted_stock_price, color='blue', label='Predicted MaoTai Stock Price')
plt.title('MaoTai Stock Price Prediction')
plt.xlabel('Time')
plt.ylabel('MaoTai Stock Price')
plt.legend()
plt.show()

##########evaluate##############
# Calculate MSE, the mean squared error ---> E[(predicted value - true value)^2] (square the difference between predicted and true values, then take the mean)
mse = mean_squared_error(predicted_stock_price, real_stock_price)
# Calculate RMSE, the root mean squared error ---> sqrt[MSE] (take the square root of the mean squared error)
rmse = math.sqrt(mean_squared_error(predicted_stock_price, real_stock_price))
# calculate MAE Mean Absolute Error ----->E[|predicted value - true value|] (predicted value minus true value to find absolute value and then mean)
mae = mean_absolute_error(predicted_stock_price, real_stock_price)
print('Mean Square Error: %.6f' % mse)
print('Root Mean Square Error: %.6f' % rmse)
print('Mean absolute error: %.6f' % mae)
