RNN and LSTM
So far we have covered regression, classification, and dimension reduction, all based on snapshot-type data
What is a sequence?
Sequence Modeling
Learning the structure and hierarchy of sequential data
Use the past and present observations to predict the future
We will focus on linear difference equations (LDE), a surprisingly rich topic both theoretically and practically.
For example, a sequence can be specified by listing its terms, by a closed-form expression, or with a difference equation and an initial condition.
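To make this concrete, take a geometric sequence (an illustrative choice). The same sequence can be written all three ways:

$$\{1,\ \tfrac{1}{2},\ \tfrac{1}{4},\ \tfrac{1}{8},\ \cdots\} \qquad y[n] = \left(\tfrac{1}{2}\right)^n,\ n \ge 0 \qquad y[n] = \tfrac{1}{2}\,y[n-1],\ y[0] = 1$$

A few lines of Python confirm that iterating the difference equation reproduces the closed-form expression:

import numpy as np

# iterate the difference equation y[n] = 1/2 * y[n-1] with y[0] = 1
y = [1.0]
for n in range(1, 8):
    y.append(0.5 * y[-1])

# closed-form expression y[n] = (1/2)**n
y_closed = [0.5**n for n in range(8)]

print(np.allclose(y, y_closed))   # True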
High-order homogeneous LDE
Non-linear trends
Some series may exhibit seasonal trends
For example, weather patterns, employment, inflation, etc.
Combining Linear, Quadratic, and Seasonal Trends
One solution is to apply repeated differencing to the series
For example, first remove the seasonal trend, then remove the linear trend (see the sketch below)
Inspect model fit by examining a Q-Q plot of the residuals
Alternatively, include both linear and cyclical trend terms in the model
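A minimal sketch of the differencing approach on a synthetic series (the linear-plus-seasonal data and the period of 12 are assumptions for illustration), followed by a Q-Q plot of the residuals:

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# synthetic series: linear trend + seasonal cycle (period = 12) + noise
t = np.arange(240)
x = 0.05*t + 2*np.sin(2*np.pi*t/12) + 0.3*np.random.randn(240)

# seasonal differencing removes the cycle; differencing again removes the remaining trend offset
x_seasonal = x[12:] - x[:-12]
residual = np.diff(x_seasonal)

# Q-Q plot of the residuals against a normal distribution
stats.probplot(residual, dist = 'norm', plot = plt)
plt.show()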
(Almost) all data coming from a manufacturing environment are time-series data
A manufacturing application typically involves one of the following:
Definition of a time series
Example: material measurements when $n=3$
Supervised and Unsupervised Learning for Time-series
For supervised learning, we define two time series
Supervised time-series learning:
Unsupervised time-series anomaly detection
Most classifiers ignore the sequential aspect of the data
Consider a system which can occupy one of $N$ discrete states or categories
$q_t \in \{S_1,S_2,\cdots,S_N\}$
We are interested in stochastic systems, in which state evolution is random
Any joint distribution can be factored into a series of conditional distributions
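In symbols, the chain rule gives

$$p(q_1, q_2, \cdots, q_T) = p(q_1) \prod_{t=2}^{T} p(q_t \mid q_1, \cdots, q_{t-1})$$

and the Markov assumption keeps only the most recent state: $p(q_t \mid q_1, \cdots, q_{t-1}) = p(q_t \mid q_{t-1})$.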
For a Markov state $s$ and successor state $s'$, the state transition probability is defined by

$$P_{ss'} = \Pr[q_{t+1} = s' \mid q_t = s]$$
State transition matrix $P$ defines transition probabilities from all states $s$ to all successor states $s'$.
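Written out, with each row summing to one:

$$P = \begin{bmatrix} P_{11} & \cdots & P_{1N} \\ \vdots & \ddots & \vdots \\ P_{N1} & \cdots & P_{NN} \end{bmatrix}$$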
Example: Markov chain episodes
import numpy as np

# transition matrix: row i holds the transition probabilities out of state i
P = [[0,   0,   1],
     [1/2, 1/2, 0],
     [1/3, 2/3, 0]]

print(P[1][:])

# sample the next state given that the current state is S2 (index 1)
a = np.random.choice(3, 1, p = P[1][:])
print(a)
# sequential processes
# sequence generated by the Markov chain
# S1 = 0, S2 = 1, S3 = 2
# starting from 0

x = 0
S = []
S.append(x)

for i in range(50):
    x = np.random.choice(3, 1, p = P[x][:])[0]
    S.append(x)

print(S)
Discrete state-space model
Assumptions
Limited sensors (incomplete state information)
Noisy sensors
The true state (a hidden variable) follows a Markov chain
Observations are emitted from the hidden state
Forward: a sequence of observations can be generated from the model (see the sketch below)
Inverse question: state estimation, i.e., inferring the hidden states from the observed sequence
How can the model parameters ($A$, $B$, $C$) be identified?
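A minimal sketch of the forward (generation) direction, reusing the transition matrix P from the Markov chain example above; the emission matrix B here is a hypothetical two-observation example introduced purely for illustration:

# hypothetical emission matrix: B[i][k] = probability of emitting observation k in state i
B = [[0.9, 0.1],
     [0.5, 0.5],
     [0.1, 0.9]]

x = 0            # hidden state, starting from S1 = 0
obs = []

for i in range(20):
    obs.append(np.random.choice(2, p = B[x]))   # emit an observation from the current state
    x = np.random.choice(3, p = P[x])           # hidden state transitions by the Markov chain

print(obs)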
Continuous state-space model
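A standard linear-Gaussian form makes the setup explicit (the usual textbook notation, with input $u_t$ and Gaussian noise terms $w_t$, $v_t$, introduced here for illustration):

$$x_{t+1} = A\,x_t + B\,u_t + w_t, \qquad y_t = C\,x_t + v_t$$

Here $x_t$ is the hidden continuous state and $y_t$ is the noisy observation, which connects the question of identifying $A$, $B$, $C$ above to this model.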
Weakness
Separate parameters for each value of the time index
Cannot share statistical strength across different time indices
This is a recurrent neural network
State summarizes information about the entire past
Single Hidden Layer RNN (Simplest State-Space Model)
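One common parameterization of this model uses shared weights at every time step (tanh is a typical choice of nonlinearity; the symbols below are the standard ones):

$$h_t = \tanh\left(W_{xh}\,x_t + W_{hh}\,h_{t-1}\right), \qquad y_t = W_{hy}\,h_t$$

Because the same weights $W_{xh}$, $W_{hh}$, $W_{hy}$ are reused across all time indices, the model shares statistical strength across time, fixing the weakness noted above.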
Example: "I grew up in France… I speak fluent French."
LSTM for Classification
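A minimal sketch of an LSTM classifier in Keras, treating each 28×28 MNIST image as a sequence of 28 row vectors (the layer sizes and epoch count below are illustrative assumptions):

import tensorflow as tf

# treat each 28x28 image as a 28-step sequence with 28 features per step
(train_x, train_y), (test_x, test_y) = tf.keras.datasets.mnist.load_data()
train_x, test_x = train_x / 255.0, test_x / 255.0

model = tf.keras.models.Sequential([
    tf.keras.layers.Input(shape = (28, 28)),
    tf.keras.layers.LSTM(64),                            # final hidden state summarizes the sequence
    tf.keras.layers.Dense(10, activation = 'softmax')    # class probabilities for digits 0-9
])

model.compile(optimizer = 'adam',
              loss = 'sparse_categorical_crossentropy',
              metrics = ['accuracy'])

model.fit(train_x, train_y, epochs = 3)
model.evaluate(test_x, test_y)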
LSTM for Prediction
An example of predicting the next piece of an image
This is a regression problem
Again, the MNIST dataset is used (a short sketch follows)
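A minimal sketch of this idea (the 14-row split, layer sizes, and epoch count are illustrative assumptions): the first 14 rows of each image form the input sequence, and the model regresses the next row.

import tensorflow as tf

(train_x, _), _ = tf.keras.datasets.mnist.load_data()
train_x = train_x / 255.0

# input: the first 14 rows as a sequence; target: the 15th row (a regression target)
seq_in = train_x[:, :14, :]    # (N, 14, 28)
seq_out = train_x[:, 14, :]    # (N, 28)

model = tf.keras.models.Sequential([
    tf.keras.layers.Input(shape = (14, 28)),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(28)     # linear output for regression
])

model.compile(optimizer = 'adam', loss = 'mean_squared_error')
model.fit(seq_in, seq_out, epochs = 3)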
Time series data and RNN
Import Library
import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np
from six.moves import cPickle
Load Time-Series Data
Import an acceleration signal from rotating machinery
from google.colab import drive
drive.mount('/content/drive')
data = cPickle.load(open('/content/drive/MyDrive/DL/DL_data/rnn_time_signal.pkl', 'rb'))
plt.figure(figsize = (8, 4))
plt.title('Time signal for RNN')
plt.plot(data[0:2000])
plt.xlim(0,2000)
plt.show()
LSTM Model Training
n_step = 25
n_input = 100
# LSTM shape
n_lstm1 = 100
n_lstm2 = 100
# fully connected
n_hidden = 100
n_output = 100
lstm_network = tf.keras.models.Sequential([
    tf.keras.layers.Input(shape = (n_step, n_input)),
    tf.keras.layers.LSTM(n_lstm1, return_sequences = True),
    tf.keras.layers.LSTM(n_lstm2),
    tf.keras.layers.Dense(n_hidden),
    tf.keras.layers.Dense(n_output),
])

lstm_network.summary()

lstm_network.compile(optimizer = 'adam',
                     loss = 'mean_squared_error',
                     metrics = ['mse'])
def dataset(data, n_samples, n_step = n_step, dim_input = n_input, dim_output = n_output, stride = 5):
    train_x_list = []
    train_y_list = []
    for i in range(n_samples):
        # input: n_step consecutive windows of dim_input samples each
        train_x = data[i*stride:i*stride + n_step*dim_input]
        train_x = train_x.reshape(n_step, dim_input)
        train_x_list.append(train_x)

        # target: the dim_output samples immediately following the input window
        train_y = data[i*stride + n_step*dim_input:i*stride + n_step*dim_input + dim_output]
        train_y_list.append(train_y)

    train_data = np.array(train_x_list)
    train_label = np.array(train_y_list)

    test_data = data[10000:10000 + n_step*dim_input]
    test_data = test_data.reshape(1, n_step, dim_input)

    return train_data, train_label, test_data
train_data, train_label, test_data = dataset(data, 5000)
lstm_network.fit(train_data, train_label, epochs = 3)
Testing or Evaluating
test_pred = lstm_network.predict(test_data).ravel()
test_label = data[10000:10000 + n_step*n_input + n_input]
plt.figure(figsize = (8, 4))
plt.plot(np.arange(0, n_step*n_input + n_input), test_label, 'b', label = 'Ground truth')
plt.plot(np.arange(n_step*n_input, n_step*n_input + n_input), test_pred, 'r', label = 'Prediction')
plt.vlines(n_step*n_input, -1, 1, colors = 'r', linestyles = 'dashed')
plt.legend(fontsize = 15, loc = 'upper left')
plt.xlim(0, len(test_label))
plt.show()
gen_signal = []

for i in range(n_step):
    test_pred = lstm_network.predict(test_data, verbose = 0)
    gen_signal.append(test_pred.ravel())

    # slide the window: drop the oldest step and append the prediction as the newest step
    test_pred = test_pred[:, np.newaxis, :]
    test_data = test_data[:, 1:, :]
    test_data = np.concatenate([test_data, test_pred], axis = 1)

gen_signal = np.concatenate(gen_signal)
test_label = data[10000:10000 + n_step*n_input + n_step*n_input]
plt.figure(figsize = (8, 4))
plt.plot(np.arange(0, n_step*n_input + n_step*n_input), test_label, 'b', label = 'Ground truth')
plt.plot(np.arange(n_step*n_input, n_step*n_input + n_step*n_input), gen_signal, 'r', label = 'Prediction')
plt.vlines(n_step*n_input, -1, 1, colors = 'r', linestyles = 'dashed')
plt.legend(fontsize=15, loc = 'upper left')
plt.xlim(0, len(test_label))
plt.show()