
PyTorch+LSTM for univariate time series prediction

A time series is any quantifiable measure or event recorded over a period of time. Although this may sound trivial, almost anything can be treated as a time series: your average hourly heart rate over a month, the daily closing price of a stock over a year, or the weekly number of traffic accidents in a given city over a year.

In each of these examples there is a frequency at which the events are recorded (hourly, daily, weekly, etc.) and a span of time over which they are recorded (a month, a year, etc.).

In this tutorial, we will use PyTorch-LSTM for deep learning time series prediction.

Our goal is to receive a sequence of values and predict the next value in that sequence. The easiest way to do this is to use an autoregressive model, and we will focus on using an LSTM to solve this problem.

Data preparation

Let's look at a sample time series. The chart below shows some data on oil prices from 2013 to 2018.

This is just a plot of a single sequence of numbers on a date axis. The table below shows the first 10 entries in this time series. Price data is available for each day.

 date       dcoilwtico  
 2013-01-01 NaN  
 2013-01-02 93.14  
 2013-01-03 92.97  
 2013-01-04 93.12  
 2013-01-07 93.20  
 2013-01-08 93.21  
 2013-01-09 93.08  
 2013-01-10 93.81  
 2013-01-11 93.60  
 2013-01-14 94.27
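
As a quick sketch of how such a DataFrame might be loaded (the file name oil.csv and its column names are assumptions here, chosen to match the table above):

 import pandas as pd

 # Hypothetical file path -- adjust to wherever your oil price CSV lives.
 # parse_dates turns the date column into a DatetimeIndex, which is handy for resampling later.
 df = pd.read_csv('oil.csv', parse_dates=['date'], index_col='date')
 print(df.head(10))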

Many machine learning models perform much better on normalized data. The standard way to normalize data is to transform each column so that it has a mean of 0 and a standard deviation of 1. The following code uses scikit-learn's StandardScaler to perform this normalization:

 from sklearn.preprocessing import StandardScaler

 # Fit a scaler to each column
 scalers = {}
 for x in df.columns:
   scalers[x] = StandardScaler().fit(df[x].values.reshape(-1, 1))

 # Transform the data with the fitted scalers
 norm_df = df.copy()
 for i, key in enumerate(scalers.keys()):
   norm = scalers[key].transform(norm_df.iloc[:, i].values.reshape(-1, 1))
   norm_df.iloc[:, i] = norm.reshape(-1)
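
Since the model will be trained on normalized values, the fitted scalers can later map predictions back to the original price scale. A minimal sketch (not part of the original pipeline) using inverse_transform:

 # Sketch: invert the normalization to recover prices in the original units
 normed = norm_df['dcoilwtico'].dropna().values[:5] # a few normalized values
 prices = scalers['dcoilwtico'].inverse_transform(normed.reshape(-1, 1)).reshape(-1)
 print(prices) # should match the first few original oil prices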

We also want the data to have a uniform frequency. In this example there is one oil price for every day across the five years; if that is not the case for your data, Pandas provides several ways to resample it to a uniform frequency, as in the sketch below.
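
For example, a minimal resampling sketch (assuming df has a DatetimeIndex, as in the loading snippet above) could look like this:

 # Resample to a uniform daily frequency: missing calendar days become NaN,
 # which we forward-fill with the last observed price.
 df = df.resample('D').ffill().dropna()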

For the training data we need to slice the full time series into fixed-length sequences. Suppose we have the sequence: [1, 2, 3, 4, 5, 6].

By choosing a sequence of length 3, we can generate the following sequences and their associated targets:

[Sequence] Target
[1, 2, 3] → 4
[2, 3, 4] → 5
[3, 4, 5] → 6

In other words, we define how many steps to look back in order to predict the next value. We call this value the training window, and the number of values to predict the prediction window. In this example they are 3 and 1 respectively. The function below shows how this is done.

 import pandas as pd

 # Define a function that creates the sequences and targets as shown above
 def generate_sequences(df: pd.DataFrame, tw: int, pw: int, target_columns, drop_targets=False):
   '''
   df: Pandas DataFrame of the univariate time-series
   tw: Training Window - Integer defining how many steps to look back
   pw: Prediction Window - Integer defining how many steps forward to predict

   returns: dictionary of sequences and targets for all sequences
   '''
   data = dict() # Store results in a dictionary
   L = len(df)
   for i in range(L-tw):
     # Option to drop the target columns from the dataframe
     if drop_targets:
       df.drop(target_columns, axis=1, inplace=True)

     # Get the current sequence
     sequence = df[i:i+tw].values
     # Get the values right after the current sequence
     target = df[i+tw:i+tw+pw][target_columns].values
     data[i] = {'sequence': sequence, 'target': target}
   return data

We can then wrap this dictionary in a custom PyTorch Dataset class:

 import torch
 from torch.utils.data import Dataset

 class SequenceDataset(Dataset):

   def __init__(self, df):
     self.data = df

   def __getitem__(self, idx):
     sample = self.data[idx]
     return torch.Tensor(sample['sequence']), torch.Tensor(sample['target'])

   def __len__(self):
     return len(self.data)

We can then iterate over the data with the PyTorch DataLoader. The nice thing about the DataLoader is that it handles batching and shuffling internally, so we don't have to implement them ourselves. The code is as follows:

 from torch.utils.data import DataLoader, random_split

 # Here we define the properties for our model

 BATCH_SIZE = 16 # Training batch size
 split = 0.8 # Train/Test Split ratio

 # sequence_len and nout are the training and prediction windows defined in the
 # model section below (180 and 1 respectively)
 sequences = generate_sequences(norm_df.dcoilwtico.to_frame(), sequence_len, nout, 'dcoilwtico')
 dataset = SequenceDataset(sequences)  
   
 # Split the data according to the split ratio and load each subset into a separate DataLoader object
 train_len = int(len(dataset)*split)  
 lens = [train_len, len(dataset)-train_len]  
 train_ds, test_ds = random_split(dataset, lens)  
 trainloader = DataLoader(train_ds, batch_size=BATCH_SIZE, shuffle=True, drop_last=True)  
 testloader = DataLoader(test_ds, batch_size=BATCH_SIZE, shuffle=True, drop_last=True)

In each iteration, the DataLoader will produce 16 (batch size) sequences and their associated targets, which we pass into the model.
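
As a quick sanity check (purely illustrative), we can peek at one batch and confirm the shapes the model will receive:

 # One batch from the training loader:
 # inputs are (batch, sequence_len, n_features), targets are (batch, nout)
 x_batch, y_batch = next(iter(trainloader))
 print(x_batch.shape) # e.g. torch.Size([16, 180, 1])
 print(y_batch.shape) # e.g. torch.Size([16, 1])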

Model architecture

We will use a single LSTM layer, followed by several fully connected (linear) layers for the regression part of the model, with dropout layers in between. The model outputs a single value for each training input.

 import torch
 import torch.nn as nn

 class LSTMForecaster(nn.Module):

   def __init__(self, n_features, n_hidden, n_outputs, sequence_len, n_lstm_layers=1, n_deep_layers=10, use_cuda=False, dropout=0.2):
     '''
     n_features: number of input features (1 for univariate forecasting)
     n_hidden: number of neurons in each hidden layer
     n_outputs: number of outputs to predict for each training example
     n_deep_layers: number of hidden dense layers after the lstm layer
     sequence_len: number of steps to look back at for prediction
     dropout: float (0 < dropout < 1) dropout ratio between dense layers
     '''
     super().__init__()

     self.n_lstm_layers = n_lstm_layers
     self.nhid = n_hidden
     self.use_cuda = use_cuda # set option for device selection

     # LSTM Layer
     self.lstm = nn.LSTM(n_features,
                         n_hidden,
                         num_layers=n_lstm_layers,
                         batch_first=True) # our sequences are shaped (batch, sequence, features)

     # first dense layer after the lstm
     self.fc1 = nn.Linear(n_hidden * sequence_len, n_hidden)
     # Dropout layer
     self.dropout = nn.Dropout(p=dropout)

     # Create fully connected layers (n_hidden x n_deep_layers)
     dnn_layers = []
     for i in range(n_deep_layers):
       # Last layer (n_hidden x n_outputs)
       if i == n_deep_layers - 1:
         dnn_layers.append(nn.ReLU())
         dnn_layers.append(nn.Linear(n_hidden, n_outputs))
       # All other layers (n_hidden x n_hidden) with dropout option
       else:
         dnn_layers.append(nn.ReLU())
         dnn_layers.append(nn.Linear(n_hidden, n_hidden))
         if dropout:
           dnn_layers.append(nn.Dropout(p=dropout))
     # compile the DNN layers into a Sequential module
     self.dnn = nn.Sequential(*dnn_layers)

   def forward(self, x):

     # Initialize hidden and cell states
     hidden_state = torch.zeros(self.n_lstm_layers, x.shape[0], self.nhid)
     cell_state = torch.zeros(self.n_lstm_layers, x.shape[0], self.nhid)

     # move hidden state to device
     if self.use_cuda:
       hidden_state = hidden_state.to(device)
       cell_state = cell_state.to(device)

     self.hidden = (hidden_state, cell_state)

     # Forward Pass
     x, h = self.lstm(x, self.hidden) # LSTM
     x = self.dropout(x.contiguous().view(x.shape[0], -1)) # Flatten lstm out
     x = self.fc1(x) # First Dense
     return self.dnn(x) # Pass forward through the fully connected DNN

We expose two parameters, n_hidden and n_deep_layers, that can be tuned freely. Larger values mean a more complex model and longer training times, so these two parameters give us room to trade off capacity against cost.

The remaining parameters are as follows: sequence_len is the training window and nout defines how many steps to predict. Setting sequence_len to 180 and nout to 1 means the model looks back over the last 180 days (about six months) of data to predict what happens tomorrow.

 nhid = 50 # Number of nodes in the hidden layer  
 n_dnn_layers = 5 # Number of hidden fully connected layers  
 nout = 1 # Prediction Window  
 sequence_len = 180 # Training Window  
   
 # Number of features (since this is a univariate timeseries we'll set  
 # this to 1 -- multivariate analysis is coming in the future)  
 ninp = 1  
   
 # Device selection (CPU | GPU)  
 USE_CUDA = torch.cuda.is_available()
 device = 'cuda' if USE_CUDA else 'cpu'  
   
 # Initialize the model  
 model = LSTMForecaster(ninp, nhid, nout, sequence_len, n_deep_layers=n_dnn_layers, use_cuda=USE_CUDA).to(device)

Model training

After defining the model, we can choose the loss function and optimizer, set the learning rate and the number of epochs, and start our training loop. Since this is a regression problem (we are predicting a continuous value), the simplest and safest loss function is the mean squared error (MSE). It provides a robust measure of the error between the actual values and the values predicted by the model.

The optimizer and loss function are as follows:

 # Set learning rate and number of epochs to train over  
 lr = 4e-4  
 n_epochs = 20  
   
 # Initialize the loss function and optimizer  
 criterion = nn.MSELoss().to(device)
 optimizer = torch.optim.AdamW(model.parameters(), lr=lr) # AdamW is one reasonable choice of optimizer

Here is the code for the training loop. In each epoch we compute the loss on both the training and validation sets created earlier:

# Lists to store training and validation losses
t_losses, v_losses = [], []
# Loop over epochs
for epoch in range(n_epochs):
  train_loss, valid_loss = 0.0, 0.0

  # train step
  model.train()
  # Loop over the training dataset
  for x, y in trainloader:
    optimizer.zero_grad()
    # move inputs to device
    x = x.to(device)
    y = y.squeeze().to(device)
    # Forward Pass
    preds = model(x).squeeze()
    loss = criterion(preds, y) # compute batch loss
    train_loss += loss.item()
    loss.backward()
    optimizer.step()
  epoch_loss = train_loss / len(trainloader)
  t_losses.append(epoch_loss)

  # validation step
  model.eval()
  # Loop over the validation dataset
  for x, y in testloader:
    with torch.no_grad():
      x, y = x.to(device), y.squeeze().to(device)
      preds = model(x).squeeze()
      error = criterion(preds, y)
    valid_loss += error.item()
  valid_loss = valid_loss / len(testloader)
  v_losses.append(valid_loss)

  print(f'{epoch} - train: {epoch_loss}, valid: {valid_loss}')
plot_losses(t_losses, v_losses)
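
The plot_losses helper called at the end is not defined in the article; a minimal matplotlib sketch (which would need to be defined before running the loop) might look like this:

import matplotlib.pyplot as plt

def plot_losses(train_losses, valid_losses):
  # Plot the per-epoch training and validation loss curves
  plt.plot(train_losses, label='training loss')
  plt.plot(valid_losses, label='validation loss')
  plt.xlabel('epoch')
  plt.ylabel('MSE loss')
  plt.legend()
  plt.show()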

With that, the model is trained and we can evaluate its predictions.

Inference

We call the trained model on an unshuffled copy of the data and compare the predictions with the true observations.

def make_predictions_from_dataloader(model, unshuffled_dataloader):
  model.eval()
  predictions, actuals = [], []
  for x, y in unshuffled_dataloader:
    with torch.no_grad():
      # move the batch to the model's device, then bring the predictions back to CPU
      p = model(x.to(device)).cpu()
      predictions.append(p)
      actuals.append(y.squeeze())
  predictions = torch.cat(predictions).numpy()
  actuals = torch.cat(actuals).numpy()
  return predictions.squeeze(), actuals
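
A minimal usage sketch (the unshuffled loader and the plotting code here are assumptions, not shown in the original):

# Build an unshuffled loader over the full dataset so predictions line up in time
all_loader = DataLoader(dataset, batch_size=BATCH_SIZE, shuffle=False, drop_last=True)
preds, actual = make_predictions_from_dataloader(model, all_loader)

# Plot the predictions against the true (normalized) prices
import matplotlib.pyplot as plt
plt.plot(actual, label='actual')
plt.plot(preds, label='prediction')
plt.legend()
plt.show()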

Normalized Forecasts and Actual Prices in Oil History

Our predictions look pretty good! They track the actual series closely, which suggests we haven't overfitted the model. Now let's see whether we can use it to forecast the future.

Forecasting

If we define the history as the sequence of values preceding the moment to be predicted, the algorithm is simple:

1. Get the latest valid sequence from the history (of training window length).

2. Feed that sequence into the model and predict the next value.

3. Append the predicted value to the history.

4. Repeat from step 1.

It is important to note that, depending on the parameters chosen when training the model, the farther out you forecast, the more the model exhibits its own bias and starts drifting toward predicting the mean. Therefore we should not forecast further ahead than necessary, since doing so degrades accuracy.

This is implemented in the following functions:

import numpy as np

def one_step_forecast(model, history):
  '''
  model: PyTorch model object
  history: a sequence of values representing the latest values of the time
  series, requirement -> len(history.shape) == 2

  outputs a single value which is the prediction of the next value in the
  sequence.
  '''
  model.cpu()
  model.use_cuda = False # keep the initial hidden state on the CPU as well
  model.eval()
  with torch.no_grad():
    pre = torch.Tensor(history).unsqueeze(0)
    pred = model(pre)
  return pred.detach().numpy().reshape(-1)

def n_step_forecast(data: pd.DataFrame, target: str, tw: int, n: int, forecast_from: int=None, plot=False):
  '''
  n: integer defining how many steps to forecast
  forecast_from: integer defining which index to forecast from. None if
  you want to forecast from the end.
  plot: True if you want to output a plot of the forecast, False if not.
  '''
  history = data[target].copy().to_frame()

  # Create the initial input sequence based on where in the series to
  # forecast from.
  if forecast_from:
    pre = list(history[forecast_from - tw : forecast_from][target].values)
  else:
    pre = list(history[target])[-tw:]

  # Call one_step_forecast n times and append each prediction to the history
  for i, step in enumerate(range(n)):
    pre_ = np.array(pre[-tw:]).reshape(-1, 1)
    forecast = one_step_forecast(model, pre_).squeeze()
    pre.append(forecast)

  # The rest of this just places the forecast at the correct position of
  # the history series
  res = history.copy()
  ls = [np.nan for i in range(len(history))]

  # Note: the edge case where the start index + n runs past the end of
  # the dataset is not handled here.
  if forecast_from:
    ls[forecast_from : forecast_from + n] = list(np.array(pre[-n:]))
    res['forecast'] = ls
    res.columns = ['actual', 'forecast']
  else:
    fc = ls + list(np.array(pre[-n:]))
    ls = ls + [np.nan for i in range(len(pre[-n:]))]
    ls[:len(history)] = history[target].values
    res = pd.DataFrame([ls, fc], index=['actual', 'forecast']).T
  return res

Let's take a look at the actual results

We make predictions from different points in the middle of the time series so we can compare them with what actually happened. The forecasting routine can forecast from anywhere, for any reasonable number of steps, with the forecast indicated by the red line. (These plots show normalized prices on the y-axis.)
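
As a usage sketch, the forecasts below were produced by calls along these lines (the forecast_from indices here are illustrative assumptions, not the exact values used for the plots):

# Hypothetical calls -- each forecasts 200 steps, starting from a different point in the series
f_mid = n_step_forecast(norm_df, 'dcoilwtico', sequence_len, 200, forecast_from=250)
f_late = n_step_forecast(norm_df, 'dcoilwtico', sequence_len, 200, forecast_from=800)
f_future = n_step_forecast(norm_df, 'dcoilwtico', sequence_len, 200) # from the last day of data

# Each result has 'actual' and 'forecast' columns that can be plotted directly
f_future.plot()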

Forecast 200 days after the third quarter of 2013

Forecast for the last 200 days of 2014/15

200 days forecast from Q1 2016

200-day projection from the last day of data

Summary

The model's performance here is fairly average! Still, this example walks through the complete time series forecasting workflow, and the model can be made better and more accurate by tweaking the architecture and hyperparameters.

This article covers only univariate time series, where there is a single series of values. There are also methods that use several series for forecasting; this is called multivariate time series forecasting, and I will cover it in a future article.

That concludes this article on univariate time series prediction with PyTorch and LSTM. For more on LSTM time series forecasting with PyTorch, please see my previous articles, and I hope you will continue to support me!