Recurrent Neural Network


By Prof. Seungchul Lee
http://iai.postech.ac.kr/
Industrial AI Lab at POSTECH

1. Recurrent Neural Network (RNN)

  • RNNs are a family of neural networks for processing sequential data

1.1. Feedforward Network and Sequential Data



  • Separate parameters for each value of the time index
  • Cannot share statistical strength across different time indices
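As a rough illustration of the first point, the sketch below counts parameters for a feedforward layer that keeps a separate weight matrix per time index and for a recurrent layer that reuses one set of weights at every step. The sizes are hypothetical and chosen only for the arithmetic.

```python
# Hypothetical sizes, chosen only for illustration
n_steps, n_in, n_hidden = 25, 100, 100

# Feedforward: a separate weight matrix and bias for every time index
ff_params = n_steps * (n_in * n_hidden + n_hidden)

# Recurrent: one input matrix, one state matrix, and one bias, reused at every step
rnn_params = n_in * n_hidden + n_hidden * n_hidden + n_hidden

print(ff_params, rnn_params)
```

The feedforward count grows linearly with the sequence length, while the recurrent count does not depend on it.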



1.2. Representation Shortcut

  • Input at each time is a vector
  • Each layer has many neurons
  • The output layer may also have many neurons
  • But we will represent everything as simple boxes
  • Each box actually represents an entire layer with many units



1.3. An Alternate Model for Infinite Response Systems

  • The state-space model
$$ \begin{align*} h_t &= f(x_t, h_{t-1})\\ y_t &= g(h_t) \end{align*} $$
  • This is a recurrent neural network
  • State summarizes information about the entire past
  • Single Hidden Layer RNN (Simplest State-Space Model)
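A minimal NumPy sketch of the state-space model above, assuming f is a tanh of an affine map and g is a linear readout. The weight names (`W_x`, `W_h`, `W_y`) and sizes are illustrative assumptions and do not come from the lecture.

```python
import numpy as np

def rnn_forward(x_seq, W_x, W_h, b_h, W_y, b_y):
    """Run h_t = tanh(W_x x_t + W_h h_{t-1} + b_h) and y_t = W_y h_t + b_y."""
    h = np.zeros(W_h.shape[0])                     # initial state h_0
    outputs = []
    for x_t in x_seq:                              # same weights at every time step
        h = np.tanh(W_x @ x_t + W_h @ h + b_h)     # f(x_t, h_{t-1})
        outputs.append(W_y @ h + b_y)              # g(h_t)
    return np.stack(outputs), h

rng = np.random.default_rng(0)
n_in, n_hidden, n_out, T = 3, 4, 2, 5
W_x = rng.normal(size=(n_hidden, n_in))
W_h = rng.normal(size=(n_hidden, n_hidden))
W_y = rng.normal(size=(n_out, n_hidden))
x_seq = rng.normal(size=(T, n_in))

y_seq, h_T = rnn_forward(x_seq, W_x, W_h, np.zeros(n_hidden), W_y, np.zeros(n_out))
print(y_seq.shape, h_T.shape)
```

The final state `h_T` depends on every input in the sequence, which is the sense in which the state summarizes the past.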



  • Multiple Recurrent Layer RNN
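One way to picture stacked recurrent layers is to feed the full hidden-state sequence of the first layer into the second layer as its input sequence. The sketch below assumes tanh layers without biases; `rnn_layer` and all sizes are hypothetical.

```python
import numpy as np

def rnn_layer(x_seq, W_x, W_h):
    """Return the full hidden-state sequence of one tanh recurrent layer."""
    h = np.zeros(W_h.shape[0])
    states = []
    for x_t in x_seq:
        h = np.tanh(W_x @ x_t + W_h @ h)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
T, n_in, n_h1, n_h2 = 6, 3, 5, 4
x_seq = rng.normal(size=(T, n_in))

# Layer 1 reads the inputs; layer 2 reads the states produced by layer 1
h1_seq = rnn_layer(x_seq, rng.normal(size=(n_h1, n_in)), rng.normal(size=(n_h1, n_h1)))
h2_seq = rnn_layer(h1_seq, rng.normal(size=(n_h2, n_h1)), rng.normal(size=(n_h2, n_h2)))
print(h1_seq.shape, h2_seq.shape)
```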



  • Recurrent Neural Network
    • Simplified models are often drawn
    • The loops imply recurrence



2. LSTM Networks

2.1. Long-Term Dependencies

  • Gradients propagated over many stages tend to either vanish or explode
  • Difficulty with long-term dependencies arises from the exponentially smaller weights given to long-term interactions
  • Introduce a memory state that runs through only linear operators
  • Use gating units to control the updates of the state
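The vanishing and exploding behavior can be seen in a simplified setting. The sketch below assumes a purely linear recurrence h_t = W h_{t-1}, so that backpropagating through one step multiplies the gradient by W^T. W is built as a scaled orthogonal matrix (an assumption made so the result is exact): the gradient norm after n steps is then scale^n.

```python
import numpy as np

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(8, 8)))   # orthogonal: all singular values are 1

def backprop_norm(scale, n_steps):
    """Norm of a unit gradient after n_steps multiplications by (scale * Q)^T."""
    W = scale * Q
    g = np.ones(8) / np.sqrt(8)                # unit-norm gradient at the last step
    for _ in range(n_steps):
        g = W.T @ g                            # one step of backpropagation through time
    return np.linalg.norm(g)

for scale in (0.9, 1.0, 1.1):
    print(scale, backprop_norm(scale, 50))
```

With a scale slightly below 1 the gradient shrinks exponentially with the number of steps, and with a scale slightly above 1 it grows exponentially. A real RNN also multiplies by the activation derivative at each step, which this sketch leaves out.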



Example: "I grew up in France… I speak fluent French."



2.2. Long Short-Term Memory (LSTM)

  • Consists of a memory cell and a set of gating units
    • Memory cell is the context that carries over
    • Forget gate controls the erase operation
    • Input gate controls the write operation
    • Output gate controls the read operation
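The gate roles listed above can be sketched as a single LSTM step in NumPy. This follows the commonly used LSTM formulation. Stacking all four blocks in one matrix `W`, and the block order (forget, input, output, candidate), are assumptions made for this sketch and do not reflect the layout of any particular library.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step. W has shape (4 * n_hidden, n_input + n_hidden)."""
    n = h_prev.shape[0]
    z = W @ np.concatenate([x_t, h_prev]) + b
    f = sigmoid(z[0 * n:1 * n])     # forget gate: how much of c_prev to keep
    i = sigmoid(z[1 * n:2 * n])     # input gate: how much new content to write
    o = sigmoid(z[2 * n:3 * n])     # output gate: how much of the cell to read
    g = np.tanh(z[3 * n:4 * n])     # candidate content
    c = f * c_prev + i * g          # memory cell: gated additive update
    h = o * np.tanh(c)              # hidden state passed to the next step
    return h, c

rng = np.random.default_rng(0)
n_input, n_hidden = 3, 4
W = rng.normal(size=(4 * n_hidden, n_input + n_hidden))
b = np.zeros(4 * n_hidden)

h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for x_t in rng.normal(size=(5, n_input)):   # apply the same cell over 5 time steps
    h, c = lstm_step(x_t, h, c, W, b)
print(h.shape, c.shape)
```

When the forget gate is close to 1 and the input gate is close to 0, the cell state is copied forward almost unchanged, which is how the memory cell carries context over many steps.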




  • Connect LSTM cells in a recurrent manner
  • Train parameters in LSTM cells

2.2.1. LSTM for Classification



2.2.2. LSTM for Prediction



3. LSTM with TensorFlow

  • An example of predicting the next segment of a time signal
  • Regression problem
  • Uses an acceleration signal instead of the MNIST dataset
  • Time series data and RNN



3.1. Import Library

In [1]:
!nvidia-smi
In [2]:
import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np
from six.moves import cPickle

3.2. Load Time Signal Data

  • Import an acceleration signal from rotating machinery

  • Download files

In [3]:
with open('./data_files/rnn_time_signal.pkl', 'rb') as f:
    data = cPickle.load(f)

plt.figure(figsize = (10,6))
plt.title('Time signal for RNN', fontsize=15)
plt.plot(data[0:2000])
plt.xlim(0,2000)
plt.show()

3.3. LSTM Model Training



In [4]:
n_step = 25
n_input = 100

# LSTM shape
n_lstm1 = 100
n_lstm2 = 100

# fully connected
n_hidden = 100
n_output = 100
In [5]:
lstm_network = tf.keras.models.Sequential([
    tf.keras.layers.Input(shape = (n_step, n_input)),
    tf.keras.layers.LSTM(n_lstm1, return_sequences = True),
    tf.keras.layers.LSTM(n_lstm2),
    tf.keras.layers.Dense(n_hidden),
    tf.keras.layers.Dense(n_output),
])

lstm_network.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
lstm (LSTM)                  (None, 25, 100)           80400     
_________________________________________________________________
lstm_1 (LSTM)                (None, 100)               80400     
_________________________________________________________________
dense (Dense)                (None, 100)               10100     
_________________________________________________________________
dense_1 (Dense)              (None, 100)               10100     
=================================================================
Total params: 181,000
Trainable params: 181,000
Non-trainable params: 0
_________________________________________________________________
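The parameter counts in the summary can be checked by hand. The sketch below assumes the standard LSTM layout of four weight blocks, each with an input kernel, a recurrent kernel, and a bias. The helper names `lstm_params` and `dense_params` are hypothetical.

```python
def lstm_params(n_in, n_units):
    # 4 blocks (input, forget, candidate, output), each with an input kernel,
    # a recurrent kernel, and a bias vector
    return 4 * (n_in * n_units + n_units * n_units + n_units)

def dense_params(n_in, n_units):
    # weight matrix plus bias vector
    return n_in * n_units + n_units

counts = [lstm_params(100, 100), lstm_params(100, 100),
          dense_params(100, 100), dense_params(100, 100)]
print(counts, sum(counts))
```

Each LSTM layer gives 4 × (100·100 + 100·100 + 100) = 80,400 and each dense layer gives 100·100 + 100 = 10,100, which adds up to the 181,000 reported by `summary()`.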
In [6]:
lstm_network.compile(optimizer = 'adam', 
                     loss = 'mean_squared_error', 
                     metrics = ['mse'])
In [7]:
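def dataset(data, n_samples, n_step = n_step, dim_input = n_input, dim_output = n_output, stride = 5):
    # Slice the 1-D signal into overlapping windows, each shifted by `stride` samples
    train_x_list = []
    train_y_list = []
    for i in range(n_samples):
        # Input: n_step*dim_input consecutive samples, viewed as n_step rows of dim_input
        train_x = data[i*stride:i*stride + n_step*dim_input]
        train_x = train_x.reshape(n_step, dim_input)
        train_x_list.append(train_x)

        # Label: the dim_output samples that immediately follow the input window
        train_y = data[i*stride + n_step*dim_input:i*stride + n_step*dim_input + dim_output]
        train_y_list.append(train_y)

    train_data = np.array(train_x_list)
    train_label = np.array(train_y_list)

    # Single test window starting at sample 10000, with a leading batch axis
    # (with n_samples = 5000 and stride = 5, this range overlaps the training windows)
    test_data = data[10000:10000 + n_step*dim_input]
    test_data = test_data.reshape(1, n_step, dim_input)

    return train_data, train_label, test_data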
In [8]:
train_data, train_label, test_data = dataset(data, 5000)
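To make the windowing concrete, the sketch below applies the same slicing to a synthetic sine wave and checks the resulting shapes. The helper `make_windows`, the synthetic signal, and the small sample count are assumptions made to keep the example light; they are not part of the notebook.

```python
import numpy as np

n_step, n_input, n_output, stride = 25, 100, 100, 5
n_samples = 8   # small on purpose; the notebook uses 5000

# Minimum signal length needed for the last window and its label
needed = (n_samples - 1) * stride + n_step * n_input + n_output
signal = np.sin(0.01 * np.arange(needed))   # synthetic stand-in for the real signal

def make_windows(signal, n_samples):
    xs, ys = [], []
    for i in range(n_samples):
        start = i * stride
        split = start + n_step * n_input
        xs.append(signal[start:split].reshape(n_step, n_input))   # input window
        ys.append(signal[split:split + n_output])                 # samples right after it
    return np.array(xs), np.array(ys)

x, y = make_windows(signal, n_samples)
print(needed, x.shape, y.shape)
```

With the notebook's settings (5000 windows, stride 5) the same formula gives 4999·5 + 2500 + 100 = 27,595 samples, and the training arrays have shapes (5000, 25, 100) and (5000, 100).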
In [9]:
lstm_network.fit(train_data, train_label, epochs = 3)
Epoch 1/3
157/157 [==============================] - 25s 75ms/step - loss: 0.0349 - mse: 0.0349
Epoch 2/3
157/157 [==============================] - 12s 79ms/step - loss: 0.0085 - mse: 0.0085
Epoch 3/3
157/157 [==============================] - 13s 84ms/step - loss: 0.0056 - mse: 0.0056
Out[9]:
<tensorflow.python.keras.callbacks.History at 0x23ac4bf9390>

3.4. Testing or Evaluating

  • Predict future time signal
In [10]:
test_pred = lstm_network.predict(test_data).reshape(-1)
test_label = data[10000:10000 + n_step*n_input + n_input]

plt.figure(figsize=(10,6))
plt.plot(np.arange(0, n_step*n_input + n_input), test_label, 'b', label = 'Ground truth')
plt.plot(np.arange(n_step*n_input, n_step*n_input + n_input), test_pred, 'r', label = 'Prediction')
plt.vlines(n_step*n_input, -1, 1, colors = 'r', linestyles = 'dashed')
plt.legend(fontsize = 15, loc = 'upper left')
plt.xlim(0, len(test_label))
plt.show()
In [11]:
gen_signal = []

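for i in range(n_step):
    # Predict the next n_input samples from the current window
    test_pred = lstm_network.predict(test_data)
    if i == 0:
        print(test_pred.shape)
    gen_signal.append(test_pred.reshape(-1))
    # Add a time axis so the prediction can be appended as a new step
    test_pred = test_pred[:, np.newaxis, :]
    if i == 0:
        print(test_pred.shape)
    # Slide the window: drop the oldest step, append the prediction
    test_data = test_data[:, 1:, :]
    test_data = np.concatenate([test_data, test_pred], axis = 1)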
    
gen_signal = np.concatenate(gen_signal)

test_label = data[10000:10000 + n_step*n_input + n_step*n_input]

plt.figure(figsize=(10,6))
plt.plot(np.arange(0, n_step*n_input + n_step*n_input), test_label, 'b', label = 'Ground truth')
plt.plot(np.arange(n_step*n_input,  n_step*n_input + n_step*n_input), gen_signal, 'r', label = 'Prediction')
plt.vlines(n_step*n_input, -1, 1, colors = 'r', linestyles = 'dashed')
plt.legend(fontsize=15, loc = 'upper left')
plt.xlim(0, len(test_label))
plt.show()
(1, 100)
(1, 1, 100)
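The loop above generates the signal recursively: each prediction is fed back as the newest step of the input window. The sketch below reproduces only the window bookkeeping, with a hypothetical stand-in `fake_predict` in place of the trained network, so it runs without TensorFlow.

```python
import numpy as np

n_step, n_input = 25, 100

def fake_predict(window):
    """Hypothetical stand-in for a trained model: repeats the last step of the window."""
    return window[:, -1, :]                       # shape (1, n_input)

window = np.arange(n_step * n_input, dtype=float).reshape(1, n_step, n_input)
generated = []
for _ in range(n_step):
    pred = fake_predict(window)                   # (1, n_input)
    generated.append(pred.reshape(-1))
    # Drop the oldest step and append the prediction as the newest step
    window = np.concatenate([window[:, 1:, :], pred[:, np.newaxis, :]], axis=1)

generated = np.concatenate(generated)
print(window.shape, generated.shape)
```

After `n_step` iterations the window holds only model outputs, so prediction errors can accumulate the further the generated signal extends past the last measured sample.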

4. Video Lectures

In [12]:
%%html
<center><iframe src="https://www.youtube.com/embed/pxPu2IdnEiE?rel=0" 
width="420" height="315" frameborder="0" allowfullscreen></iframe></center>
In [13]:
%%html
<center><iframe src="https://www.youtube.com/embed/SwkS73r4J5c?rel=0" 
width="420" height="315" frameborder="0" allowfullscreen></iframe></center>
In [14]:
%%html
<center><iframe src="https://www.youtube.com/embed/DkWE9FonbO0?rel=0" 
width="420" height="315" frameborder="0" allowfullscreen></iframe></center>