Recurrent Neural Network


By Prof. Seungchul Lee
http://iai.postech.ac.kr/
Industrial AI Lab at POSTECH

Table of Contents

1. Recurrent Neural Network (RNN)

  • RNNs are a family of neural networks for processing sequential data

1.1. Feedforward Network and Sequential Data



  • A feedforward network needs separate parameters for each value of the time index
  • It cannot share statistical strength across different time indices



1.2. Representation Shortcut

  • The input at each time step is a vector
  • Each layer has many neurons
  • The output layer, too, may have many neurons
  • But we will represent everything with simple boxes
  • Each box actually represents an entire layer with many units



1.3. An Alternate Model for Infinite Response Systems

  • The state-space model
$$ \begin{align*} h_t &= f(x_t, h_{t-1})\\ y_t &= g(h_t) \end{align*} $$
  • This is a recurrent neural network (RNN)
  • The state $h_t$ summarizes information about the entire past (a minimal NumPy sketch of this recurrence follows at the end of this section)
  • Single Hidden Layer RNN (the simplest state-space model)



  • Multiple Recurrent Layer RNN



  • Recurrent Neural Network
    • Simplified models often drawn
    • The loops imply recurrence
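
As a concrete illustration of the state-space equations above, here is a minimal NumPy sketch of a single-hidden-layer RNN unrolled over a short sequence. The weight names (W_x, W_h, W_y), the tanh nonlinearity, and all dimensions are illustrative assumptions, not taken from the lecture.

import numpy as np

# h_t = f(x_t, h_{t-1}),  y_t = g(h_t)
# Weight names, dimensions, and the tanh nonlinearity are illustrative.
dim_x, dim_h, dim_y = 3, 4, 2
rng = np.random.default_rng(0)
W_x = rng.normal(size = (dim_h, dim_x)) * 0.1   # input-to-hidden weights
W_h = rng.normal(size = (dim_h, dim_h)) * 0.1   # hidden-to-hidden (recurrent) weights
W_y = rng.normal(size = (dim_y, dim_h)) * 0.1   # hidden-to-output weights

def rnn_step(x_t, h_prev):
    h_t = np.tanh(W_x @ x_t + W_h @ h_prev)     # f(x_t, h_{t-1})
    y_t = W_y @ h_t                             # g(h_t)
    return h_t, y_t

h = np.zeros(dim_h)                             # initial state
for x_t in rng.normal(size = (5, dim_x)):       # a length-5 input sequence
    h, y = rnn_step(x_t, h)                     # same parameters at every step

Note that the same three weight matrices are reused at every time step; this is exactly the parameter sharing across time indices that the feedforward formulation in Section 1.1 lacks.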



2. LSTM Networks

2.1. Long-Term Dependencies

  • Gradients propagated over many stages tend to either vanish or explode (see the numeric sketch after the example below)
  • The difficulty with long-term dependencies arises from the exponentially smaller weights given to long-term interactions
  • Introduce a memory state that runs through only linear operators
  • Use gating units to control the updates of that state



Example: "I grew up in France… I speak fluent French."
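
To make the vanishing/exploding behavior concrete, the following toy computation (added here for illustration, not from the lecture) backpropagates a gradient through a purely linear recurrence h_t = W h_{t-1}. The gradient after T steps contains the factor W^T, so its norm scales like (largest singular value)^T.

import numpy as np

# Backpropagate a gradient through 100 steps of h_t = W h_{t-1}.
# With a contractive W the gradient vanishes; with an expansive W it explodes.
for scale in (0.9, 1.1):
    W = scale * np.eye(4)                # toy recurrent weight matrix
    g = np.ones(4)                       # gradient arriving at the last step
    for _ in range(100):
        g = W.T @ g                      # one step of backpropagation through time
    print(scale, np.linalg.norm(g))      # ~5e-5 for 0.9, ~3e+4 for 1.1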



2.2. Long Short-Term Memory (LSTM)

  • Consists of a memory cell and a set of gating units (a step-by-step NumPy sketch follows below)
    • The memory cell is the context that carries over
    • The forget gate controls the erase operation
    • The input gate controls the write operation
    • The output gate controls the read operation




  • Connect LSTM cells in a recurrent manner
  • Train parameters in LSTM cells
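
A minimal NumPy sketch of a single LSTM step, assuming the standard sigmoid/tanh gate formulation; the weight and variable names are illustrative, not from the lecture.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One LSTM step; z = [x_t; h_{t-1}] feeds all four transforms.
def lstm_step(x_t, h_prev, c_prev, W, b):
    z = np.concatenate([x_t, h_prev])
    f = sigmoid(W['f'] @ z + b['f'])          # forget gate: controls erase
    i = sigmoid(W['i'] @ z + b['i'])          # input gate: controls write
    o = sigmoid(W['o'] @ z + b['o'])          # output gate: controls read
    c_tilde = np.tanh(W['c'] @ z + b['c'])    # candidate memory content
    c_t = f * c_prev + i * c_tilde            # memory cell: linear update path
    h_t = o * np.tanh(c_t)                    # exposed hidden state
    return h_t, c_t

# Tiny smoke test with random parameters
rng = np.random.default_rng(0)
dim_x, dim_h = 3, 4
W = {k: rng.normal(size = (dim_h, dim_x + dim_h)) * 0.1 for k in 'fioc'}
b = {k: np.zeros(dim_h) for k in 'fioc'}
h, c = lstm_step(rng.normal(size = dim_x), np.zeros(dim_h), np.zeros(dim_h), W, b)

Notice that c_t is updated only by element-wise multiplication and addition: this is the memory state that runs through only linear operators, which is what lets gradients flow over long horizons.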

2.2.1. LSTM for Classification
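
A minimal Keras sketch of an LSTM classifier, with illustrative shapes (25 time steps, 100 features, 10 classes); only the final hidden state feeds the softmax head.

import tensorflow as tf

# LSTM for classification (shapes are illustrative): the final hidden
# state summarizes the whole sequence and feeds a softmax over classes.
clf = tf.keras.models.Sequential([
    tf.keras.layers.Input(shape = (25, 100)),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(10, activation = 'softmax'),
])
clf.compile(optimizer = 'adam', loss = 'sparse_categorical_crossentropy')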



2.2.2. LSTM for Prediction
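
For prediction, the softmax head is replaced by a linear output trained with a squared-error loss; Section 3 builds this kind of model in full. A minimal sketch with illustrative shapes:

import tensorflow as tf

# LSTM for prediction (shapes are illustrative): a linear head regresses
# the next segment of the signal from the final hidden state.
reg = tf.keras.models.Sequential([
    tf.keras.layers.Input(shape = (25, 100)),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(100),
])
reg.compile(optimizer = 'adam', loss = 'mean_squared_error')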



3. LSTM with TensorFlow

  • An example of predicting the next piece of a time signal
  • A regression problem
  • Acceleration signal from rotating machinery
  • Time series data and an RNN



3.1. Import Library

In [1]:
import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np
from six.moves import cPickle

3.2. Load Time Signal Data

  • Load the acceleration signal of rotating machinery

  • Download files

In [2]:
with open('/content/rnn_time_signal.pkl', 'rb') as f:
    data = cPickle.load(f)
print("Shape of data is {}".format(data.shape))
plt.figure(figsize = (10,6))
plt.title('Time signal for RNN', fontsize=15)
plt.plot(data[0:2000])
plt.xlim(0,2000)
plt.show()
Shape of data is (82000,)
In [3]:
def dataset(data, n_samples, n_step = 25, dim_input = 100, dim_output = 100, stride = 5):
    
    train_x_list = []
    train_y_list = []
    for i in range(n_samples):
        # Input window: n_step*dim_input = 2500 samples, reshaped to (25, 100)
        train_x = data[i*stride:i*stride + n_step*dim_input]
        train_x = train_x.reshape(n_step, dim_input)
        train_x_list.append(train_x)
        # Label: the dim_output = 100 samples that follow the input window
        train_y = data[i*stride + n_step*dim_input:i*stride + n_step*dim_input + dim_output]
        train_y_list.append(train_y)
    # The task: predict the next 100 samples given the previous 2500 samples
    train_data = np.array(train_x_list)
    train_label = np.array(train_y_list)

    test_data = data[10000:10000 + n_step*dim_input]
    test_data = test_data.reshape(1, n_step, dim_input)
    
    return train_data, train_label, test_data
train_data, train_label, test_data = dataset(data, 5000)
In [ ]:
train_data.shape
Out[ ]:
(5000, 25, 100)
In [ ]:
train_label.shape
Out[ ]:
(5000, 100)

3.3. LSTM Model Training



In [4]:
n_step = 25
n_input = 100

# LSTM shape
n_lstm1 = 100
n_lstm2 = 100

# Fully connected
n_hidden = 100
n_output = 100
In [5]:
lstm_network = tf.keras.models.Sequential([
    tf.keras.layers.Input(shape = (n_step, n_input)),
    tf.keras.layers.LSTM(n_lstm1, return_sequences = True), # pass the full hidden sequence to the next LSTM
    tf.keras.layers.LSTM(n_lstm2),                          # keep only the final hidden state
    tf.keras.layers.Dense(n_hidden),
    tf.keras.layers.Dense(n_output),
])

lstm_network.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 lstm (LSTM)                 (None, 25, 100)           80400     
                                                                 
 lstm_1 (LSTM)               (None, 100)               80400     
                                                                 
 dense (Dense)               (None, 100)               10100     
                                                                 
 dense_1 (Dense)             (None, 100)               10100     
                                                                 
=================================================================
Total params: 181,000
Trainable params: 181,000
Non-trainable params: 0
_________________________________________________________________
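
The parameter counts above can be verified by hand: an LSTM layer has four gate transforms, each with input weights, recurrent weights, and a bias (a quick check, added here for illustration):

n_in, n_units = 100, 100
# 4 gates x (input weights + recurrent weights + biases)
lstm_params = 4 * (n_in*n_units + n_units*n_units + n_units)
print(lstm_params)    # 80400, matching each LSTM layer in the summary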
In [6]:
lstm_network.compile(optimizer = 'adam', 
                     loss = 'mean_squared_error', 
                     metrics = ['mse'])
In [7]:
lstm_network.fit(train_data, train_label, epochs = 10)
Epoch 1/10
157/157 [==============================] - 10s 40ms/step - loss: 0.0352 - mse: 0.0352
Epoch 2/10
157/157 [==============================] - 6s 40ms/step - loss: 0.0064 - mse: 0.0064
Epoch 3/10
157/157 [==============================] - 7s 45ms/step - loss: 0.0050 - mse: 0.0050
Epoch 4/10
157/157 [==============================] - 6s 40ms/step - loss: 0.0047 - mse: 0.0047
Epoch 5/10
157/157 [==============================] - 6s 39ms/step - loss: 0.0045 - mse: 0.0045
Epoch 6/10
157/157 [==============================] - 6s 40ms/step - loss: 0.0043 - mse: 0.0043
Epoch 7/10
157/157 [==============================] - 6s 39ms/step - loss: 0.0042 - mse: 0.0042
Epoch 8/10
157/157 [==============================] - 6s 41ms/step - loss: 0.0042 - mse: 0.0042
Epoch 9/10
157/157 [==============================] - 6s 40ms/step - loss: 0.0041 - mse: 0.0041
Epoch 10/10
157/157 [==============================] - 6s 41ms/step - loss: 0.0040 - mse: 0.0040
Out[7]:
<keras.callbacks.History at 0x7f2214ec8fd0>
In [8]:
rnn_network = tf.keras.models.Sequential([
    tf.keras.layers.Input(shape = (n_step, n_input)),
    tf.keras.layers.SimpleRNN(n_lstm1, return_sequences = True), # pass the full hidden sequence to the next layer
    tf.keras.layers.SimpleRNN(n_lstm2),                          # keep only the final hidden state
    tf.keras.layers.Dense(n_hidden),
    tf.keras.layers.Dense(n_output),
])

rnn_network.summary()
Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 simple_rnn (SimpleRNN)      (None, 25, 100)           20100     
                                                                 
 simple_rnn_1 (SimpleRNN)    (None, 100)               20100     
                                                                 
 dense_2 (Dense)             (None, 100)               10100     
                                                                 
 dense_3 (Dense)             (None, 100)               10100     
                                                                 
=================================================================
Total params: 60,400
Trainable params: 60,400
Non-trainable params: 0
_________________________________________________________________
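
A SimpleRNN layer has the same input, recurrent, and bias weights but a single transform instead of four gates, hence a quarter of the LSTM's parameters:

n_in, n_units = 100, 100
# 1 transform x (input weights + recurrent weights + biases)
rnn_params = n_in*n_units + n_units*n_units + n_units
print(rnn_params)     # 20100 = 80400 / 4, matching the summary above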
In [9]:
rnn_network.compile(optimizer = 'adam', 
                     loss = 'mean_squared_error', 
                     metrics = ['mse'])
In [10]:
rnn_network.fit(train_data, train_label, epochs = 10)
Epoch 1/10
157/157 [==============================] - 6s 17ms/step - loss: 0.0226 - mse: 0.0226
Epoch 2/10
157/157 [==============================] - 3s 17ms/step - loss: 0.0065 - mse: 0.0065
Epoch 3/10
157/157 [==============================] - 3s 22ms/step - loss: 0.0056 - mse: 0.0056
Epoch 4/10
157/157 [==============================] - 3s 17ms/step - loss: 0.0054 - mse: 0.0054
Epoch 5/10
157/157 [==============================] - 3s 17ms/step - loss: 0.0051 - mse: 0.0051
Epoch 6/10
157/157 [==============================] - 3s 16ms/step - loss: 0.0050 - mse: 0.0050
Epoch 7/10
157/157 [==============================] - 3s 16ms/step - loss: 0.0047 - mse: 0.0047
Epoch 8/10
157/157 [==============================] - 3s 16ms/step - loss: 0.0047 - mse: 0.0047
Epoch 9/10
157/157 [==============================] - 3s 18ms/step - loss: 0.0046 - mse: 0.0046
Epoch 10/10
157/157 [==============================] - 4s 22ms/step - loss: 0.0045 - mse: 0.0045
Out[10]:
<keras.callbacks.History at 0x7f2214e55d10>

3.4. Testing and Evaluating

  • Predict the future time signal
In [11]:
test_pred = lstm_network.predict(test_data).ravel()
test_label = data[10000:10000 + n_step*n_input + n_input]

plt.figure(figsize=(10,6))
plt.plot(np.arange(0, n_step*n_input + n_input), test_label, 'b', label = 'Ground truth')
plt.plot(np.arange(n_step*n_input, n_step*n_input + n_input), test_pred, 'r', label = 'Prediction')
plt.vlines(n_step*n_input, -1, 1, colors = 'r', linestyles = 'dashed')
plt.legend(fontsize = 15, loc = 'upper left')
plt.xlim(0, len(test_label))
plt.show()
In [12]:
window = test_data.copy()   # work on a copy so test_data stays intact for the RNN cells below
gen_signal = []
for i in range(n_step):
    # Predict the next 100 samples from the current 2500-sample window
    test_pred = lstm_network.predict(window)
    gen_signal.append(test_pred.ravel())
    test_pred = test_pred[:, np.newaxis, :]
    
    # Slide the window: drop the oldest step, append the prediction
    window = np.concatenate([window[:, 1:, :], test_pred], axis = 1)
    
gen_signal = np.concatenate(gen_signal)

test_label = data[10000:10000 + n_step*n_input + n_step*n_input]

plt.figure(figsize=(10,6))
plt.plot(np.arange(0, n_step*n_input + n_step*n_input), test_label, 'b', label = 'Ground truth')
plt.plot(np.arange(n_step*n_input,  n_step*n_input + n_step*n_input), gen_signal, 'r', label = 'Prediction')
plt.vlines(n_step*n_input, -1, 1, colors = 'r', linestyles = 'dashed')
plt.legend(fontsize=15, loc = 'upper left')
plt.xlim(0, len(test_label))
plt.show()
In [13]:
test_pred = rnn_network.predict(test_data).ravel()
test_label = data[10000:10000 + n_step*n_input + n_input]

plt.figure(figsize=(10,6))
plt.plot(np.arange(0, n_step*n_input + n_input), test_label, 'b', label = 'Ground truth')
plt.plot(np.arange(n_step*n_input, n_step*n_input + n_input), test_pred, 'r', label = 'Prediction')
plt.vlines(n_step*n_input, -1, 1, colors = 'r', linestyles = 'dashed')
plt.legend(fontsize = 15, loc = 'upper left')
plt.xlim(0, len(test_label))
plt.show()
In [14]:
window = test_data.copy()   # again work on a copy of the original test window
gen_signal = []
for i in range(n_step):
    # Predict the next 100 samples from the current 2500-sample window
    test_pred = rnn_network.predict(window)
    gen_signal.append(test_pred.ravel())
    test_pred = test_pred[:, np.newaxis, :]
    
    # Slide the window: drop the oldest step, append the prediction
    window = np.concatenate([window[:, 1:, :], test_pred], axis = 1)
    
gen_signal = np.concatenate(gen_signal)

test_label = data[10000:10000 + n_step*n_input + n_step*n_input]

plt.figure(figsize=(10,6))
plt.plot(np.arange(0, n_step*n_input + n_step*n_input), test_label, 'b', label = 'Ground truth')
plt.plot(np.arange(n_step*n_input,  n_step*n_input + n_step*n_input), gen_signal, 'r', label = 'Prediction')
plt.vlines(n_step*n_input, -1, 1, colors = 'r', linestyles = 'dashed')
plt.legend(fontsize=15, loc = 'upper left')
plt.xlim(0, len(test_label))
plt.show()