**Recurrent Neural Networks (RNN)**

Table of Contents

- Sequence matters

What is a sequence?

- sentence
- medical signals
- speech waveform
- vibration measurement

Sequence Modeling

- Most real-world data is time-series data
- Several aspects are important to consider:
  - Past events
  - Relationship between events
  - Causality
  - Credit assignment

Learning the structure and hierarchy

Use the past and present observations to predict the future

## 1.2. (Deterministic) Sequences and Difference Equations

We will focus on linear difference equations (LDE), a surprisingly rich topic both theoretically and practically.

For example,

$$ y[0]=1,\quad y[1]=\frac{1}{2},\quad y[2]=\frac{1}{4},\quad \cdots $$

or by closed-form expression,

$$y[n]=\left(\frac{1}{2}\right)^n,\quad n \geq 0 $$

or with a difference equation and an initial condition,

$$y[n]=\frac{1}{2}y[n-1],\quad y[0]=1$$

High order homogeneous LDE

$$y[n]=\alpha_1 y[n-1] + \alpha_2 y[n-2] + \cdots + \alpha_k y[n-k]$$
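As a minimal sketch, the recursion above can be iterated directly in code. The function name `simulate_lde` and the second-order coefficients are assumptions chosen for illustration; only the first-order example comes from the text.

```python
def simulate_lde(alphas, initial, n_steps):
    """Iterate y[n] = alphas[0]*y[n-1] + ... + alphas[k-1]*y[n-k].

    `initial` holds y[0], ..., y[k-1]; the result has n_steps values in total.
    """
    k = len(alphas)
    if len(initial) != k:
        raise ValueError("a k-th order equation needs exactly k initial values")
    y = list(initial)
    for n in range(k, n_steps):
        # alphas[i] multiplies y[n - 1 - i], i.e. alpha_{i+1} * y[n - (i+1)]
        y.append(sum(a * y[n - 1 - i] for i, a in enumerate(alphas)))
    return y[:n_steps]


# First-order example from the text: y[n] = 0.5 * y[n-1], y[0] = 1
first_order = simulate_lde([0.5], [1.0], 6)
closed_form = [0.5 ** n for n in range(6)]
print(first_order == closed_form)

# Second-order example (assumed coefficients): y[n] = y[n-1] + y[n-2]
fibonacci = simulate_lde([1, 1], [0, 1], 10)
print(fibonacci)
```

The first-order run reproduces the closed-form expression $(1/2)^n$ term by term, and a $k$-th order equation needs $k$ initial values before the recursion can start.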

## 1.3. (Stochastic) Time Series Analysis

### 1.3.1. Stationarity and Non-Stationary Series

A series is *stationary* if there is no systematic change in mean and variance over time.

- Example: radio static

A series is *non-stationary* if its mean or variance changes over time.

- Example: GDP, population, weather, etc.
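One informal way to see the difference is to compare the mean and variance over consecutive windows of a series. The sketch below is an illustration, not a formal stationarity test: it uses simulated white noise as a stand-in for radio static and a random walk as a non-stationary counterpart. The helper `window_stats`, the window length, and the seed are all assumptions.

```python
import random
import statistics


def window_stats(series, window):
    """Return (mean, variance) for each consecutive non-overlapping window."""
    stats = []
    for start in range(0, len(series) - window + 1, window):
        chunk = series[start:start + window]
        stats.append((statistics.fmean(chunk), statistics.pvariance(chunk)))
    return stats


random.seed(0)  # fixed seed so the sketch is repeatable

# White noise: independent draws with constant mean and variance
noise = [random.gauss(0.0, 1.0) for _ in range(400)]

# Random walk: cumulative sum of the same noise, so its level drifts over time
walk = []
level = 0.0
for e in noise:
    level += e
    walk.append(level)

for name, series in [("white noise", noise), ("random walk", walk)]:
    print(name)
    for mean, var in window_stats(series, 100):
        print(f"  mean={mean:8.3f}  variance={var:8.3f}")
```

For a stationary series the per-window statistics should stay roughly the same from window to window, whereas for the random walk the window means typically wander.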

**Non-linear trends**

- For example, population may grow exponentially

**Seasonal trends**

Some series may exhibit seasonal trends

For example, weather pattern, employment, inflation, etc.

**Combining Linear, Quadratic, and Seasonal Trends**

- Some data may have a combination of trends

One solution is to apply repeated differencing to the series

For example, first remove the seasonal trend, then remove the linear trend

Inspect the model fit by examining a Q-Q plot of the residuals
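The repeated differencing described above can be sketched on a synthetic series. The series, the seasonal period of 4, and the helper name `difference` are assumptions made for illustration.

```python
def difference(series, lag=1):
    """Return the lag-differenced series: d[t] = series[t] - series[t - lag]."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Synthetic series: linear trend 2t plus a repeating period-4 seasonal pattern
season = [0.0, 3.0, 0.0, -3.0]
series = [2.0 * t + season[t % 4] for t in range(24)]

# Step 1: differencing at the seasonal lag cancels the repeating pattern;
# the linear trend 2t is reduced to the constant 2 * 4 = 8
deseasonalized = difference(series, lag=4)

# Step 2: first differencing removes the remaining constant level
residual = difference(deseasonalized, lag=1)

print(deseasonalized[:5])
print(residual[:5])
```

With real data the residual would not be exactly zero; what remains after differencing is the part that a stationary model is then fitted to.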

Alternatively, include both linear and cyclical trend terms in the model

$$ \begin{align*} Y_t &= \beta_1 + \beta_2 Y_{t-1} \\ &+ \beta_3 t + \beta_4 t^{\beta_5} \\ &+ \beta_6 \sin \frac{2\pi}{s}t + \beta_7 \cos \frac{2\pi}{s}t \\ &+ u_t \end{align*} $$

## 1.4. Time-Series Data

(Almost) all data coming from a manufacturing environment is time-series data:

- sensor data
- process times
- material measurements
- equipment maintenance history
- image data, etc.

A manufacturing application typically involves one of the following:

- prediction of time-series values
- anomaly detection on time-series data
- classification of time-series values
- metrology and inspection

### 1.4.1. Definition of time-series

$$x: T \rightarrow \mathbb{R}^n \;\; \text{where}\;\; T=\{\cdots, t_{-2},t_{-1},t_0,t_1,t_2, \cdots \}$$

Example: material measurements, with $n=3$

$$x(t) = \begin{bmatrix} \text{average thickness}(t)\\ \text{thickness variance}(t)\\ \text{resistivity}(t) \end{bmatrix} $$

### 1.4.2. Supervised and Unsupervised Learning for Time-series

For supervised learning, we define two time series

$$x: T \rightarrow \mathbb{R}^n \;\; \text{and} \;\; y: T \rightarrow \mathbb{R}^m$$

Supervised time-series learning:

$$ \begin{align*} \text{predict} \quad &y(t_k) \\ \text{given} \quad & x(t_k), x(t_{k-1}), \cdots \;\, \text{and} \;\, y(t_{k-1}), y(t_{k-2}), \cdots \end{align*} $$
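In practice the history is usually truncated to a finite number of lags before a model is trained. The sketch below builds such (features, target) pairs for scalar series. The truncation to `n_lags`, the helper name `make_supervised`, and the toy values are assumptions, not part of the definition above.

```python
def make_supervised(x, y, n_lags):
    """Build (features, target) pairs for predicting y[k].

    features = x[k], x[k-1], ..., x[k-n_lags], then y[k-1], ..., y[k-n_lags]
    target   = y[k]
    """
    samples = []
    for k in range(n_lags, len(y)):
        x_part = [x[k - i] for i in range(n_lags + 1)]     # includes the current x
        y_part = [y[k - i] for i in range(1, n_lags + 1)]  # strictly past y values
        samples.append((x_part + y_part, y[k]))
    return samples


# Toy scalar series (n = m = 1)
x = [10, 11, 12, 13, 14]
y = [0, 1, 2, 3, 4]
pairs = make_supervised(x, y, n_lags=2)

for features, target in pairs:
    print(features, "->", target)
```

Note that the current input $x(t_k)$ is part of the features, while only past values of $y$ are used, matching the formulation above.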

Unsupervised time-series anomaly detection

- Find the time segment that is considerably different from the rest

$$ \begin{align*} \text{find} \quad & k^* \\ \text{such that} \quad & x(t_k) |_{k=k^*}^{k^*+s} \;\; \text{is significantly different from} \;\, x(t_k) |_{k=-\infty}^{\infty} \end{align*} $$
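The formulation above leaves "significantly different" unspecified. One simple choice, assumed here purely for illustration, is to score each window by how far its mean lies from the overall mean, measured in overall standard deviations, and to take the highest-scoring window as $k^*$. The helper name and the toy series are hypothetical.

```python
import statistics


def most_anomalous_segment(x, s):
    """Return (k_star, score) for the window x[k], ..., x[k+s] whose mean
    deviates most from the overall mean, in overall standard deviations."""
    mu = statistics.fmean(x)
    sigma = statistics.pstdev(x)
    if sigma == 0:
        raise ValueError("series is constant; no segment stands out")
    best_k, best_score = 0, -1.0
    for k in range(len(x) - s):
        window = x[k:k + s + 1]  # s + 1 consecutive samples
        score = abs(statistics.fmean(window) - mu) / sigma
        if score > best_score:
            best_k, best_score = k, score
    return best_k, best_score


# Toy series: flat signal with a short burst at indices 12, 13, 14
x = [0.0] * 12 + [5.0, 5.0, 5.0] + [0.0] * 12
k_star, score = most_anomalous_segment(x, s=2)
print(k_star, round(score, 2))
```

A mean-shift score like this only catches level changes; detecting changes in variance or shape would need a different score.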

## 1.5. Markov Process

### 1.5.1. Sequential Processes

Most classifiers ignore the sequential aspects of data

Consider a system which can occupy one of $N$ discrete states or categories $q_t \in \{S_1,S_2,\cdots,S_N\}$

We are interested in stochastic systems, in which state evolution is random

Any joint distribution can be factored into a series of conditional distributions

$$p(q_0,q_1,\cdots,q_T ) = p(q_0) \; p(q_1 \mid q_0) \; p( q_2 \mid q_1 q_0 ) \; p( q_3 \mid q_2 q_1 q_0 ) \cdots$$

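If each state depends only on its immediate predecessor (the Markov assumption), this simplifies to

$$p(q_0,q_1,\cdots,q_T ) = p(q_0) \; p(q_1 \mid q_0) \; p( q_2 \mid q_1 ) \; p( q_3 \mid q_2 ) \cdots$$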

### 1.5.2. Markov Process

- (Assumption) for a Markov process, the next state depends only on the current state:

$$ p(q_{t+1} \mid q_t,\cdots,q_0) = p(q_{t+1} \mid q_t)$$

- More clearly

$$ p(q_{t+1} = s_j \mid q_t = s_i) = p(q_{t+1} = s_j \mid q_t = s_i,\; \text{any earlier history})$$

- Given current state, the past does not matter
- The state captures all relevant information from the history
- The state is a sufficient statistic of the future
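Under the Markov assumption, the probability of a whole state sequence is the initial-state probability multiplied by one transition probability per step. The following sketch computes it for a hypothetical two-state weather chain; the state names and all probabilities are made up for illustration.

```python
def markov_sequence_probability(states, initial, transition):
    """p(q_0, ..., q_T) = p(q_0) * product over t of p(q_{t+1} | q_t)."""
    prob = initial[states[0]]
    for current, nxt in zip(states, states[1:]):
        prob *= transition[current][nxt]
    return prob


# Hypothetical chain: initial distribution and p(next | current)
initial = {"sunny": 0.5, "rainy": 0.5}
transition = {
    "sunny": {"sunny": 0.75, "rainy": 0.25},
    "rainy": {"sunny": 0.5, "rainy": 0.5},
}

sequence = ["sunny", "sunny", "rainy", "rainy"]
p = markov_sequence_probability(sequence, initial, transition)
print(p)  # 0.5 * 0.75 * 0.25 * 0.5
```

Only consecutive pairs of states enter the product, which is exactly what "given the current state, the past does not matter" means computationally.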

### 1.5.3. State Transition Matrix

For a Markov state $s$ and successor state $s'$, the state transition probability is defined by

$$ P_{ss'} = P \left[S_{t+1} = s' \mid S_t = s \right] $$

State transition matrix $P$ defines transition probabilities from all states $s$ to all successor states $s'$.
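Because row $s$ of $P$ lists the probabilities of all successor states from $s$, every row must sum to 1. A minimal sketch with a hypothetical 2-state matrix (values assumed for illustration) checks this and pushes a state distribution forward one step at a time:

```python
def is_row_stochastic(P, tol=1e-12):
    """Check that each row is a probability distribution over successor states."""
    return all(
        all(p >= 0 for p in row) and abs(sum(row) - 1.0) <= tol
        for row in P
    )


def step(dist, P):
    """Propagate a state distribution one step: new[j] = sum_i dist[i] * P[i][j]."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


# Row s, column s' holds P_{ss'}
P = [
    [0.75, 0.25],
    [0.50, 0.50],
]
print(is_row_stochastic(P))

dist = [1.0, 0.0]               # start in state 0 with certainty
after_one = step(dist, P)       # distribution after one transition
after_two = step(after_one, P)  # distribution after two transitions
print(after_one, after_two)
```

Applying `step` repeatedly corresponds to multiplying the row vector of state probabilities by $P$ once per time step.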