Recurrent Neural Network


By Prof. Seungchul Lee
http://iai.postech.ac.kr/
Industrial AI Lab at POSTECH

0. Video Lectures

In [2]:
%%html
<center><iframe src="https://www.youtube.com/embed/pxPu2IdnEiE?rel=0" 
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
In [3]:
%%html
<center><iframe src="https://www.youtube.com/embed/SwkS73r4J5c?rel=0" 
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
In [4]:
%%html
<center><iframe src="https://www.youtube.com/embed/DkWE9FonbO0?rel=0" 
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>

1. Recurrent Neural Network (RNN)

  • RNNs are a family of neural networks for processing sequential data

1.1. Feedforward Network and Sequential Data



  • Separate parameters for each value of the time index
  • Cannot share statistical strength across different time indices
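
To make the cost of separate per-time parameters concrete, the sketch below counts the weights of a single fully connected layer applied to a sequence, once with its own weights at every time index and once with one set of weights shared across time. The function names and the sizes (25 steps, 100 inputs, 100 outputs) are illustrative assumptions, not part of the lecture.

```python
def params_separate(n_steps, n_in, n_out):
    # a separate (n_in x n_out) weight matrix and bias for every time index
    return n_steps * (n_in * n_out + n_out)


def params_shared(n_in, n_out):
    # one weight matrix and bias reused at every time index
    return n_in * n_out + n_out


n_steps, n_in, n_out = 25, 100, 100
separate = params_separate(n_steps, n_in, n_out)
shared = params_shared(n_in, n_out)
print(separate, shared)  # 252500 10100
```

With shared weights, every time index contributes training signal to the same parameters, which is the sharing that recurrent networks build in.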



1.2. Representation Shortcut

  • Input at each time is a vector
  • Each layer has many neurons
  • The output layer may also have many neurons
  • But we will represent everything as simple boxes
  • Each box actually represents an entire layer with many units



1.3. An Alternate Model for Infinite Response Systems

  • The state-space model
$$ \begin{align*} h_t &= f(x_t, h_{t-1})\\ y_t &= g(h_t) \end{align*} $$
  • This is a recurrent neural network
  • State summarizes information about the entire past
  • Single Hidden Layer RNN (Simplest State-Space Model)



  • Multiple Recurrent Layer RNN



  • Recurrent Neural Network
    • Simplified models often drawn
    • The loops imply recurrence
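
The state-space recurrence $h_t = f(x_t, h_{t-1})$, $y_t = g(h_t)$ can be sketched in NumPy as follows. This is a minimal illustration that assumes $f$ is an affine map followed by tanh and $g$ is an affine map. The weight names (`W_xh`, `W_hh`, `W_hy`) and the sizes are hypothetical choices for this sketch.

```python
import numpy as np


def rnn_forward(x_seq, W_xh, W_hh, b_h, W_hy, b_y):
    """Single hidden layer RNN: the same weights are reused at every time step."""
    h = np.zeros(W_hh.shape[0])          # initial state h_0
    outputs = []
    for x_t in x_seq:
        # h_t = f(x_t, h_{t-1}), here tanh of an affine map (assumed form)
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        # y_t = g(h_t), here an affine read-out (assumed form)
        outputs.append(W_hy @ h + b_y)
    return np.array(outputs), h


rng = np.random.default_rng(0)
n_in, n_hidden, n_out, T = 3, 4, 2, 5
W_xh = rng.normal(scale=0.1, size=(n_hidden, n_in))
W_hh = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
b_h = np.zeros(n_hidden)
W_hy = rng.normal(scale=0.1, size=(n_out, n_hidden))
b_y = np.zeros(n_out)

x_seq = rng.normal(size=(T, n_in))
y_seq, h_last = rnn_forward(x_seq, W_xh, W_hh, b_h, W_hy, b_y)
print(y_seq.shape, h_last.shape)  # (5, 2) (4,)
```

The final state `h_last` depends on every input in `x_seq`, which is the sense in which the state summarizes the entire past.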



2. LSTM Networks

2.1. Long-Term Dependencies

  • Gradients propagated over many stages tend to either vanish or explode
  • Difficulty with long-term dependencies arises from the exponentially smaller weights given to long-term interactions
  • Introduce a memory state that runs through only linear operators
  • Use gating units to control the updates of the state
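
The vanishing and exploding behavior can be seen in the simplest possible case. The sketch below assumes a scalar linear recurrence $h_t = w\,h_{t-1}$, for which the gradient of $h_t$ with respect to $h_0$ is $w^t$. The function name and the values of `w` are illustrative assumptions.

```python
def gradient_through_time(w, n_steps):
    # d h_t / d h_0 for h_t = w * h_{t-1} is a product of n_steps copies of w
    grad = 1.0
    for _ in range(n_steps):
        grad *= w
    return grad


for w in (0.9, 1.0, 1.1):
    # |w| < 1 shrinks the gradient, |w| > 1 blows it up, w = 1 preserves it
    print(w, gradient_through_time(w, 100))
```

Only the case `w = 1.0` keeps the gradient at a constant scale over 100 steps, which is one way to read the motivation for a memory state that passes through linear operators only.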



Example: "I grew up in France… I speak fluent French."



2.2. Long Short-Term Memory (LSTM)

  • Consists of a memory cell and a set of gating units
    • Memory cell is the context that carries over
    • Forget gate controls the erase operation
    • Input gate controls the write operation
    • Output gate controls the read operation




  • Connect LSTM cells in a recurrent manner
  • Train parameters in LSTM cells
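
One step of an LSTM cell, with the erase, write, and read operations spelled out, might look like the NumPy sketch below. It assumes the commonly used formulation (sigmoid gates, tanh candidate, no peephole connections). The stacked weight layout and the gate ordering inside `W` are choices made for this sketch and do not necessarily match how any particular library stores its weights.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, b):
    """W: (4*n_hidden, n_in + n_hidden); rows ordered forget, input, output, candidate."""
    n_hidden = h_prev.shape[0]
    z = W @ np.concatenate([x_t, h_prev]) + b
    f = sigmoid(z[:n_hidden])                  # forget gate: erase
    i = sigmoid(z[n_hidden:2 * n_hidden])      # input gate: write
    o = sigmoid(z[2 * n_hidden:3 * n_hidden])  # output gate: read
    g = np.tanh(z[3 * n_hidden:])              # candidate content to write
    c = f * c_prev + i * g                     # memory cell update (elementwise)
    h = o * np.tanh(c)                         # exposed hidden state
    return h, c


rng = np.random.default_rng(0)
n_in, n_hidden = 3, 4
W = rng.normal(scale=0.1, size=(4 * n_hidden, n_in + n_hidden))
b = np.zeros(4 * n_hidden)

# connect the cell in a recurrent manner over a short random sequence
h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for x_t in rng.normal(size=(5, n_in)):
    h, c = lstm_step(x_t, h, c, W, b)
print(h.shape, c.shape)  # (4,) (4,)
```

The cell update `c = f * c_prev + i * g` involves only elementwise scaling and addition, so the memory cell is carried forward without passing through a squashing nonlinearity.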

2.2.1. LSTM for Classification



2.2.2. LSTM for Prediction



3. LSTM with TensorFlow

  • An example of predicting the next segment of a time signal
  • Regression problem
  • Acceleration signal from rotating machinery
  • Time series data and RNN



3.1. Import Library

In [1]:
# to use GPU

import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
In [2]:
import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np
from six.moves import cPickle

3.2. Load Time Signal Data

  • Import acceleration signal of rotating machinery

  • Download files

In [3]:
# load the pickled signal; the context manager closes the file afterwards
with open('./data_files/rnn_time_signal.pkl', 'rb') as f:
    data = cPickle.load(f)

plt.figure(figsize = (10,6))
plt.title('Time signal for RNN', fontsize=15)
plt.plot(data[0:2000])
plt.xlim(0,2000)
plt.show()

3.3. LSTM Model Training



In [4]:
n_step = 25
n_input = 100

# LSTM shape
n_lstm1 = 100
n_lstm2 = 100

# fully connected
n_hidden = 100
n_output = 100
In [5]:
lstm_network = tf.keras.models.Sequential([
    tf.keras.layers.Input(shape = (n_step, n_input)),
    tf.keras.layers.LSTM(n_lstm1, return_sequences = True),
    tf.keras.layers.LSTM(n_lstm2),
    tf.keras.layers.Dense(n_hidden),
    tf.keras.layers.Dense(n_output),
])

lstm_network.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
lstm (LSTM)                  (None, 25, 100)           80400     
_________________________________________________________________
lstm_1 (LSTM)                (None, 100)               80400     
_________________________________________________________________
dense (Dense)                (None, 100)               10100     
_________________________________________________________________
dense_1 (Dense)              (None, 100)               10100     
=================================================================
Total params: 181,000
Trainable params: 181,000
Non-trainable params: 0
_________________________________________________________________
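
The parameter counts in the summary can be checked by hand. The sketch below assumes a standard LSTM layer with a bias term and no peephole connections, which has four weight blocks (three gates plus the candidate), each with input weights, recurrent weights, and a bias. The helper names are hypothetical.

```python
def lstm_param_count(n_in, n_units):
    # 4 blocks x (input weights + recurrent weights + bias)
    return 4 * (n_in * n_units + n_units * n_units + n_units)


def dense_param_count(n_in, n_out):
    # weight matrix + bias
    return n_in * n_out + n_out


counts = [
    lstm_param_count(100, 100),   # lstm:    n_input = 100 -> n_lstm1 = 100
    lstm_param_count(100, 100),   # lstm_1:  n_lstm1 = 100 -> n_lstm2 = 100
    dense_param_count(100, 100),  # dense:   n_lstm2 = 100 -> n_hidden = 100
    dense_param_count(100, 100),  # dense_1: n_hidden = 100 -> n_output = 100
]
print(counts, sum(counts))  # [80400, 80400, 10100, 10100] 181000
```

Note that the counts do not depend on `n_step`: the same weights are reused at each of the 25 time steps.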
In [6]:
lstm_network.compile(optimizer = 'adam', 
                     loss = 'mean_squared_error', 
                     metrics = ['mse'])
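In [7]:
def dataset(data, n_samples, n_step = n_step, dim_input = n_input, dim_output = n_output, stride = 5):
    # Build sliding-window training pairs from the 1-D signal `data`:
    #   input : n_step*dim_input consecutive points, reshaped to (n_step, dim_input)
    #   label : the next dim_output points right after the input window
    # Consecutive windows start `stride` points apart.
    
    train_x_list = []
    train_y_list = []
    for i in range(n_samples):
        start = i*stride
        end = start + n_step*dim_input

        train_x = data[start:end].reshape(n_step, dim_input)
        train_x_list.append(train_x)

        train_y = data[end:end + dim_output]
        train_y_list.append(train_y)

    train_data = np.array(train_x_list)
    train_label = np.array(train_y_list)

    # Test input: one fixed window starting at index 10000, with a leading batch axis.
    # With n_samples = 5000 and stride = 5 this window coincides with training sample i = 2000.
    test_data = data[10000:10000 + n_step*dim_input]
    test_data = test_data.reshape(1, n_step, dim_input)
    
    return train_data, train_label, test_data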
In [8]:
train_data, train_label, test_data = dataset(data, 5000)
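
To see what the windowing in `dataset` produces, the same slicing logic can be run on a tiny toy signal. The sketch below is a stand-alone illustration. `make_windows` and the small sizes are hypothetical and chosen only so that the result can be checked by eye.

```python
import numpy as np


def make_windows(signal, n_samples, n_step, dim, stride):
    # same slicing pattern as the training loop: input window, then the next `dim` points
    xs, ys = [], []
    for i in range(n_samples):
        start = i * stride
        end = start + n_step * dim
        xs.append(signal[start:end].reshape(n_step, dim))
        ys.append(signal[end:end + dim])
    return np.array(xs), np.array(ys)


toy = np.arange(20)  # 0, 1, ..., 19 so each value equals its index
x, y = make_windows(toy, n_samples=3, n_step=2, dim=3, stride=2)

print(x.shape, y.shape)  # (3, 2, 3) (3, 3)
print(x[1])              # rows [2 3 4] and [5 6 7]
print(y[1])              # [ 8  9 10]
```

Because the stride (2) is smaller than the window length (6), neighboring samples overlap heavily, just as they do with `stride = 5` and a 2500-point window in the notebook.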
In [9]:
lstm_network.fit(train_data, train_label, epochs = 3)
Epoch 1/3
157/157 [==============================] - 10s 61ms/step - mse: 0.0311 - loss: 0.0311
Epoch 2/3
157/157 [==============================] - 9s 59ms/step - mse: 0.0071 - loss: 0.0071
Epoch 3/3
157/157 [==============================] - 9s 58ms/step - mse: 0.0053 - loss: 0.0053
Out[9]:
<tensorflow.python.keras.callbacks.History at 0x7f280050aa20>

3.4. Testing or Evaluating

  • Predict future time signal
In [10]:
test_pred = lstm_network.predict(test_data).ravel()
test_label = data[10000:10000 + n_step*n_input + n_input]

plt.figure(figsize=(10,6))
plt.plot(np.arange(0, n_step*n_input + n_input), test_label, 'b', label = 'Ground truth')
plt.plot(np.arange(n_step*n_input, n_step*n_input + n_input), test_pred, 'r', label = 'Prediction')
plt.vlines(n_step*n_input, -1, 1, colors = 'r', linestyles = 'dashed')
plt.legend(fontsize = 15, loc = 'upper left')
plt.xlim(0, len(test_label))
plt.show()
In [11]:
gen_signal = []
for i in range(n_step):
    test_pred = lstm_network.predict(test_data)
    gen_signal.append(test_pred.ravel())
    test_pred = test_pred[:, np.newaxis, :]
    
    test_data = test_data[:, 1:, :]
    test_data = np.concatenate([test_data, test_pred], axis = 1)
    
gen_signal = np.concatenate(gen_signal)

test_label = data[10000:10000 + n_step*n_input + n_step*n_input]

plt.figure(figsize=(10,6))
plt.plot(np.arange(0, n_step*n_input + n_step*n_input), test_label, 'b', label = 'Ground truth')
plt.plot(np.arange(n_step*n_input,  n_step*n_input + n_step*n_input), gen_signal, 'r', label = 'Prediction')
plt.vlines(n_step*n_input, -1, 1, colors = 'r', linestyles = 'dashed')
plt.legend(fontsize=15, loc = 'upper left')
plt.xlim(0, len(test_label))
plt.show()
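
The generation loop above feeds each prediction back in as the newest time step of the input window. The same rolling pattern is sketched below without TensorFlow, using a toy stand-in for the model so that the window bookkeeping can be followed by hand. `rolling_forecast` and `toy_predict` are hypothetical names introduced for this illustration.

```python
import numpy as np


def rolling_forecast(predict_fn, window, n_iters):
    """window: (1, n_step, dim); predict_fn maps a window to the next block, shape (1, dim)."""
    generated = []
    for _ in range(n_iters):
        next_block = predict_fn(window)
        generated.append(next_block.ravel())
        # drop the oldest time step and append the prediction as the newest one
        window = np.concatenate([window[:, 1:, :], next_block[:, np.newaxis, :]], axis=1)
    return np.concatenate(generated), window


def toy_predict(window):
    # stand-in for the trained model: the last block plus one
    return window[:, -1, :] + 1


window0 = np.arange(6, dtype=float).reshape(1, 3, 2)  # blocks [0 1], [2 3], [4 5]
signal, final_window = rolling_forecast(toy_predict, window0, n_iters=4)

print(signal)        # [5. 6. 6. 7. 7. 8. 8. 9.]
print(final_window)  # blocks [6 7], [7 8], [8 9]
```

After `n_step` iterations the window contains only predicted values, so any prediction error is fed back into later inputs and can accumulate over the generated signal.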