Time-Series Analysis

By Prof. Seungchul Lee
http://iailab.kaist.ac.kr/
Industrial AI Lab at KAIST

1. So Far

Regression, Classification, Dimension Reduction
- All based on snapshot-type data

Sequence matters

What is a sequence?
- sentence
- medical signals
- speech waveform
- vibration measurement

Sequence Modeling
- Most real-world data are time series
- Important aspects to consider
  - Past events
  - Relationship between events
    - Causality
    - Credit assignment

Learning the structure and hierarchy

Use the past and present observations to predict the future

2. (Deterministic) Sequences and Difference Equations

We will focus on linear difference equations (LDE), a surprisingly rich topic both theoretically and practically.

For example,

$$ y[0]=1,\quad y[1]=\frac{1}{2},\quad y[2]=\frac{1}{4},\quad \cdots $$

or by closed-form expression,

$$y[n]=\left(\frac{1}{2}\right)^n,\quad n \geq 0 $$

or with a difference equation and an initial condition,

$$y[n]=\frac{1}{2}y[n-1],\quad y[0]=1$$

High order homogeneous LDE

$$y[n]=\alpha_1 y[n-1] + \alpha_2 y[n-2] + \cdots + \alpha_k y[n-k]$$
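
A minimal sketch of how such a recursion can be iterated numerically. The helper `simulate_lde` is a hypothetical name introduced here for illustration, and the second-order (Fibonacci) coefficients are an added example that does not come from the lecture.

```python
import numpy as np

def simulate_lde(alphas, initial, n_steps):
    """Iterate y[n] = alphas[0]*y[n-1] + ... + alphas[k-1]*y[n-k].

    `initial` holds the k starting values y[0], ..., y[k-1].
    """
    k = len(alphas)
    assert len(initial) == k, "need one initial value per coefficient"
    y = list(initial)
    for n in range(k, n_steps):
        # y[n-1], y[n-2], ..., y[n-k] paired with alpha_1, ..., alpha_k
        past = y[n - k:n][::-1]
        y.append(float(np.dot(alphas, past)))
    return np.array(y[:n_steps])

# First-order example from the text: y[n] = (1/2) y[n-1], y[0] = 1
y_first = simulate_lde([0.5], [1.0], 8)
closed_form = 0.5 ** np.arange(8)

# Illustrative second-order case: y[n] = y[n-1] + y[n-2], y[0] = 0, y[1] = 1
y_fib = simulate_lde([1.0, 1.0], [0.0, 1.0], 10)

print(y_first)
print(y_fib)
```

Comparing `y_first` with `closed_form` shows that the difference equation and the closed-form expression describe the same sequence.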

3. (Stochastic) Time Series Analysis

3.1. Stationarity and Non-Stationary Series

A series is stationary if there is no systematic change in mean and variance over time
- Example: radio static

A series is non-stationary if its mean or variance changes over time
- Example: GDP, population, weather, etc.
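
The distinction can be checked roughly by comparing summary statistics over different parts of a series. The sketch below uses synthetic data (Gaussian noise as a stand-in for radio static, and the same noise plus an assumed linear drift); the helper `half_stats` and all numeric values are illustrative choices, not part of the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
n = 2000
t = np.arange(n)

noise = rng.normal(0.0, 1.0, size=n)   # stand-in for "radio static"
trending = 0.01 * t + noise            # same noise plus a linear drift

def half_stats(x):
    """Return (mean, variance) of the first and second half of x."""
    first, second = x[: len(x) // 2], x[len(x) // 2 :]
    return (first.mean(), first.var()), (second.mean(), second.var())

(noise_m1, noise_v1), (noise_m2, noise_v2) = half_stats(noise)
(trend_m1, trend_v1), (trend_m2, trend_v2) = half_stats(trending)

print(f"white noise: mean {noise_m1:.2f} -> {noise_m2:.2f}")
print(f"with trend : mean {trend_m1:.2f} -> {trend_m2:.2f}")
```

The noise series keeps roughly the same mean in both halves, while the drifting series does not. This is only an informal check, not a formal stationarity test.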

3.2. Dealing with Non-Stationarity

Linear trends

Non-linear trends

For example, population may grow exponentially

Seasonal trends

Some series may exhibit seasonal trends
For example, weather patterns, employment, inflation, etc.

Combining Linear, Quadratic, and Seasonal Trends

Some data may have a combination of trends

One solution is to apply repeated differencing to the series
For example, first remove the seasonal trend, then remove the linear trend
Inspect the model fit by examining a Q-Q plot of the residuals
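
The differencing steps above can be sketched on synthetic data as follows. The series, its period `s = 12`, the slope, and the noise level are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, s = 240, 12                      # hypothetical: 20 "years" of monthly data
t = np.arange(n)
slope = 0.5
y = 10.0 + slope * t + 3.0 * np.sin(2 * np.pi * t / s) + rng.normal(0, 0.1, n)

# Seasonal differencing at lag s cancels the period-s component.
# The linear trend survives as the constant slope * s.
seasonal_diff = y[s:] - y[:-s]

# First differencing removes what is left of the linear trend.
detrended = np.diff(seasonal_diff)

print(seasonal_diff.mean(), detrended.mean(), detrended.std())
```

After both steps the remaining series fluctuates around zero. Its values could then be passed to a Q-Q plot routine such as `scipy.stats.probplot` to inspect the residuals.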

Alternatively, include both linear and cyclical trend terms in the model

$$ \begin{align*} Y_t &= \beta_1 + \beta_2 Y_{t-1} \\ &+ \beta_3 t + \beta_4 t^{\beta_5} \\ &+ \beta_6 \sin \left( \frac{2\pi}{s}t \right) + \beta_7 \cos \left( \frac{2\pi}{s}t \right) \\ &+ u_t \end{align*} $$
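
A rough sketch of fitting this model by ordinary least squares. To keep the problem linear in the coefficients, the exponent $\beta_5$ is fixed to 2 here, which is an assumption of this sketch rather than something the lecture specifies. The data are simulated, and the "true" coefficients, period, and noise level are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n, s = 120, 12
beta_true = np.array([1.0, 0.5, 0.05, 0.001, 2.0, -1.0])  # b1, b2, b3, b4, b6, b7
sigma = 0.5                                               # std of u_t (assumed)

def regressors(t, y_prev):
    """One row of the design matrix, with beta_5 fixed to 2 (assumption)."""
    return np.array([
        1.0,
        y_prev,
        t,
        t ** 2,
        np.sin(2 * np.pi * t / s),
        np.cos(2 * np.pi * t / s),
    ])

# Simulate the model forward in time.
Y = np.zeros(n)
for t in range(1, n):
    Y[t] = regressors(t, Y[t - 1]) @ beta_true + rng.normal(0, sigma)

# Stack rows for t = 1, ..., n-1 and solve the least-squares problem.
X = np.vstack([regressors(t, Y[t - 1]) for t in range(1, n)])
beta_hat, *_ = np.linalg.lstsq(X, Y[1:], rcond=None)

residuals = Y[1:] - X @ beta_hat
rms_residual = np.sqrt(np.mean(residuals ** 2))
print(beta_hat)
print(rms_residual)
```

The trend regressors are strongly correlated with each other, so individual coefficient estimates can be imprecise even when the overall fit is good. Checking the residuals is therefore more informative than reading off single coefficients.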

4. Time-Series Data

(Almost) all data coming from manufacturing environments are time-series data

sensor data,
process times,
material measurement,
equipment maintenance history,
image data, etc.

Manufacturing applications are about one of the following:

prediction of time-series values
anomaly detection on time-series data
classification of time-series values
metrology and inspection

4.1. Definition of time-series

$$x: T \rightarrow \mathbb{R}^n \;\; \text{where}\;\; T=\{\cdots, t_{-2},t_{-1},t_0,t_1,t_2, \cdots \}$$

Example: material measurements with $n=3$

$$x(t) = \begin{bmatrix} \text{average thickness}(t)\\ \text{thickness variance}(t)\\ \text{resistivity}(t) \end{bmatrix} $$

4.2. Supervised and Unsupervised Learning for Time-series

For supervised learning, we define two time series

$$x: T \rightarrow \mathbb{R}^n \;\; \text{and} \;\; y: T \rightarrow \mathbb{R}^m$$

Supervised time-series learning:

$$ \begin{align*} \text{predict} \quad &y(t_k) \\ \text{given} \quad & x(t_k), x(t_{k-1}), \cdots \;\, \text{and} \;\, y(t_{k-1}), y(t_{k-2}), \cdots \end{align*} $$
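
In practice the infinite history is usually truncated to a finite window. The following sketch builds a feature matrix and targets for that setting; the function `make_supervised` and the window length `p` are hypothetical names introduced for illustration.

```python
import numpy as np

def make_supervised(x, y, p):
    """Build (features, targets) from series x of shape (T, n) and y of shape (T, m).

    Row for time k holds x(t_k), ..., x(t_{k-p+1}) and y(t_{k-1}), ..., y(t_{k-p});
    the target is y(t_k). The window length p is a hypothetical choice.
    """
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    features, targets = [], []
    for k in range(p, len(x)):
        x_window = x[k - p + 1:k + 1][::-1]   # x(t_k) first, oldest last
        y_window = y[k - p:k][::-1]           # y(t_{k-1}) first, oldest last
        features.append(np.concatenate([x_window.ravel(), y_window.ravel()]))
        targets.append(y[k])
    return np.array(features), np.array(targets)

# Toy data: n = 2 input channels, m = 1 output channel, T = 6 time steps.
x = np.arange(12).reshape(6, 2)
y = 100 + np.arange(6).reshape(6, 1)
features, targets = make_supervised(x, y, p=2)
print(features.shape, targets.shape)
```

Any standard regressor or classifier can then be trained on `features` and `targets`, which reduces the time-series problem to the snapshot-type setting.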

Unsupervised time-series anomaly detection

Find a time segment that is considerably different from the rest

$$ \begin{align*} \text{find} \quad & k^* \\ \text{such that} \quad & x(t_k) |_{k=k^*}^{k^*+s} \;\; \text{is significantly different from} \;\, x(t_k) |_{k=-\infty}^{\infty} \end{align*} $$
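
One simple way to make "significantly different" concrete is to score each window by how far its mean deviates from the global mean. This is only one possible criterion, chosen here as an assumption. The data, the injected anomaly at `true_start`, and the window length are synthetic.

```python
import numpy as np

rng = np.random.default_rng(3)
n, s = 500, 20
x = rng.normal(0.0, 1.0, size=n)
true_start = 300                       # hypothetical injected anomaly
x[true_start:true_start + s + 1] += 5.0

# Mean of every window x(t_k), ..., x(t_{k+s}); entry k covers x[k : k+s+1].
w = s + 1
window_means = np.convolve(x, np.ones(w) / w, mode="valid")

# Score each window by how far its mean sits from the global mean,
# in units of the standard error expected for a window of w samples.
scores = np.abs(window_means - x.mean()) / (x.std() / np.sqrt(w))
k_star = int(np.argmax(scores))
print(k_star, scores[k_star])
```

A mean-shift score like this misses anomalies that change the variance or shape of the signal but not its level, so other statistics may be needed depending on the application.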

5. Markov Process

5.1. Sequential Processes

Most classifiers ignore the sequential aspects of data

Consider a system which can occupy one of $N$ discrete states or categories $q_t \in \{S_1,S_2,\cdots,S_N\}$

We are interested in stochastic systems, in which state evolution is random

Any joint distribution can be factored into a series of conditional distributions

$$p(q_0,q_1,\cdots,q_T ) = p(q_0) \; p(q_1 \mid q_0) \; p( q_2 \mid q_1, q_0 ) \; p( q_3 \mid q_2, q_1, q_0 ) \cdots$$

Almost impossible to compute !!

$$p(q_0,q_1,\cdots,q_T ) = p(q_0) \; p(q_1 \mid q_0) \; p( q_2 \mid q_1 ) \; p( q_3 \mid q_2 ) \cdots$$

Possible and tractable !!
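
To see why the second factorization is tractable, the sketch below evaluates the probability of one sequence and counts free parameters for both cases. The three-state chain, the initial distribution `p0`, and both helper functions are hypothetical. The parameter count assumes the same transition matrix is shared across all time steps.

```python
import numpy as np

# Hypothetical 3-state chain: initial distribution and transition matrix.
p0 = np.array([1.0, 0.0, 0.0])
P = np.array([[0.0, 0.0, 1.0],
              [1/2, 1/2, 0.0],
              [1/3, 2/3, 0.0]])

def sequence_probability(seq, p0, P):
    """p(q_0, ..., q_T) = p(q_0) * prod_t p(q_{t+1} | q_t)."""
    prob = p0[seq[0]]
    for prev, nxt in zip(seq[:-1], seq[1:]):
        prob *= P[prev, nxt]
    return float(prob)

def n_free_parameters(N, T):
    """Free parameters of a full joint table vs. the Markov factorization."""
    full_joint = N ** (T + 1) - 1
    markov = (N - 1) + N * (N - 1)      # p(q_0) plus one shared transition matrix
    return full_joint, markov

prob = sequence_probability([0, 2, 1, 1], p0, P)
full_joint, markov = n_free_parameters(N=3, T=10)
print(prob, full_joint, markov)
```

The full joint table grows exponentially with the sequence length, while the Markov factorization needs a number of parameters that does not depend on the length at all.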

5.2. Markov Process

(Assumption) for a Markov process, the next state depends only on the current state:

$$ p(q_{t+1} \mid q_t,\cdots,q_0) = p(q_{t+1} \mid q_t)$$

More clearly

$$ p(q_{t+1} = s_j \mid q_t = s_i) = p(q_{t+1} = s_j \mid q_t = s_i,\; \text{any earlier history})$$

Given current state, the past does not matter
The state captures all relevant information from the history
The state is a sufficient statistic of the future

5.3. State Transition Matrix

For a Markov state $s$ and successor state $s'$, the state transition probability is defined by

$$ P_{ss'} = P\left[S_{t+1} = s' \mid S_t = s \right] $$

State transition matrix $P$ defines transition probabilities from all states $s$ to all successor states $s'$.
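
Two basic properties of a transition matrix can be checked numerically: each row sums to one, and multi-step transition probabilities are given by matrix powers. The sketch uses the same three-state matrix as the example in this section; the variable names are illustrative.

```python
import numpy as np

P = np.array([[0.0, 0.0, 1.0],
              [1/2, 1/2, 0.0],
              [1/3, 2/3, 0.0]])

# Every row is a probability distribution over successor states.
rows_ok = bool(np.all(P >= 0) and np.allclose(P.sum(axis=1), 1.0))

# n-step transition probabilities are the entries of P^n.
P2 = np.linalg.matrix_power(P, 2)

# Distribution after two steps when starting in S1 with certainty.
start = np.array([1.0, 0.0, 0.0])
after_two = start @ P2

print(rows_ok)
print(after_two)
```

Because $S_1$ always moves to $S_3$ in one step, the two-step distribution from $S_1$ equals the row of $P$ that belongs to $S_3$.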

Example: Markov chain (MC) episodes

Sample episodes starting from $S_1$

import numpy as np

P = [[0, 0, 1],
     [1/2, 1/2, 0],
     [1/3, 2/3, 0]]

print(P[1][:])

[0.5, 0.5, 0]

# draw one successor state from row 1, i.e. starting in S2
a = np.random.choice(3,1,p = P[1][:])
print(a)

[1]

# sequential processes
# sequence generated by Markov chain
# S1 = 0, S2 = 1, S3 = 2

# starting from 0
x = 0
S = []
S.append(x)

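for i in range(50):
    # sample the next state from the row of the current state;
    # int() keeps plain Python integers in the list
    x = int(np.random.choice(3,1,p = P[x][:])[0])
    S.append(x)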

print(S)

[0, 2, 1, 1, 1, 0, 2, 0, 2, 0, 2, 1, 0, 2, 0, 2, 1, 1, 0, 2, 0, 2, 0, 2, 1, 0, 2, 1, 0, 2, 0, 2, 0, 2, 0, 2, 1, 1, 0, 2, 1, 1, 0, 2, 1, 0, 2, 1, 0, 2, 1]
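
Going in the opposite direction, a transition matrix can be estimated from an observed state sequence by counting transitions. This is a sketch under the assumption that the sequence is long enough for every state to be visited; `estimate_transition_matrix` is a hypothetical helper, and a longer chain than the 50 steps above is sampled so that the estimate is stable.

```python
import numpy as np

def estimate_transition_matrix(seq, n_states):
    """Row-normalized transition counts; rows never visited are left at zero."""
    counts = np.zeros((n_states, n_states))
    for prev, nxt in zip(seq[:-1], seq[1:]):
        counts[prev, nxt] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

P_true = np.array([[0.0, 0.0, 1.0],
                   [1/2, 1/2, 0.0],
                   [1/3, 2/3, 0.0]])

rng = np.random.default_rng(4)      # fixed seed so the sketch is repeatable
state, seq = 0, [0]
for _ in range(20000):
    state = int(rng.choice(3, p=P_true[state]))
    seq.append(state)

P_hat = estimate_transition_matrix(seq, 3)
print(np.round(P_hat, 3))
```

Transitions that have probability zero in `P_true` never occur in the sample, so the corresponding entries of `P_hat` are exactly zero.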

6. Hidden Markov Models

Discrete state-space model
- Used in speech recognition
- State representation is simple
- Hard to scale up the training

Assumption
- We can observe something that is affected by the true state
- Natural way of thinking

Limited sensors (incomplete state information)
- But still partially related

Noisy sensors
- Unreliable

True state (or hidden variable) follows Markov chain

Observation emitted from state
- $Y_t$ is noisily determined depending on the current state $X_t$

Forward: sequence of observations can be generated

Question: state estimation

$$P(X_T = s_i \mid Y_1 Y_2 \cdots Y_T)$$

HMM can do this, but with many difficulties
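
For a small discrete HMM, this filtering distribution can be computed with the normalized forward recursion. The sketch below assumes the transition matrix `A`, emission matrix `B`, and initial distribution `pi` are known. Their values, and the function name `forward_filter`, are hypothetical.

```python
import numpy as np

def forward_filter(pi, A, B, observations):
    """Return P(X_T = s_i | Y_1, ..., Y_T) for each state i.

    pi[i]   : P(X_1 = s_i)
    A[i, j] : P(X_{t+1} = s_j | X_t = s_i)
    B[i, k] : P(Y_t = k | X_t = s_i)
    """
    belief = pi * B[:, observations[0]]
    belief = belief / belief.sum()
    for y in observations[1:]:
        predicted = belief @ A            # propagate through the Markov chain
        belief = predicted * B[:, y]      # weight by the emission likelihood
        belief = belief / belief.sum()    # normalize to a distribution
    return belief

# Hypothetical two-state example with a noisy binary sensor.
pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1],
              [0.2, 0.8]])
B = np.array([[0.8, 0.2],
              [0.3, 0.7]])

posterior = forward_filter(pi, A, B, [0, 0, 1, 1, 1])
print(posterior)
```

The recursion itself is cheap. Much of the practical difficulty lies in learning `A`, `B`, and `pi` from data, which this sketch does not address.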

7. Kalman Filter

Linear dynamical system of motion

$$ \begin{align*} x_{t+1} &= A x_t + B u_t \\ z_t &= Cx_t \end{align*} $$

A, B, C ?

Continuous State space model
- For filtering and control applications
- Linear-Gaussian state space model
- Widely used in many applications:
  - GPS, weather systems, etc.

Weakness
- Linear state space model assumed
- Difficult to apply to highly non-linear domains
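
A minimal Kalman filter sketch for a linear-Gaussian model of this form. It assumes a constant-velocity system with no control input ($u_t = 0$), adds process noise with covariance `Q` and measurement noise with covariance `R`, and uses an initial covariance `Sigma`. None of these values come from the lecture, and the matrices `A` and `C` are one hypothetical choice.

```python
import numpy as np

rng = np.random.default_rng(5)
dt = 1.0
A = np.array([[1.0, dt],
              [0.0, 1.0]])          # state = [position, velocity]
C = np.array([[1.0, 0.0]])          # only position is measured
Q = 1e-4 * np.eye(2)                # process-noise covariance (assumed)
R = np.array([[1.0]])               # measurement-noise covariance (assumed)

# Simulate the true trajectory and noisy measurements (no control input).
n_steps = 100
x_true = np.zeros((n_steps, 2))
x_true[0] = [0.0, 1.0]
for t in range(1, n_steps):
    x_true[t] = A @ x_true[t - 1] + rng.multivariate_normal(np.zeros(2), Q)
z = x_true[:, 0] + rng.normal(0.0, np.sqrt(R[0, 0]), size=n_steps)

# Kalman filter: alternate predict and update steps.
x_hat = np.zeros(2)                 # initial state estimate
Sigma = 10.0 * np.eye(2)            # initial estimate covariance (assumed)
estimates = np.zeros((n_steps, 2))
for t in range(n_steps):
    if t > 0:                                            # predict
        x_hat = A @ x_hat
        Sigma = A @ Sigma @ A.T + Q
    K = Sigma @ C.T @ np.linalg.inv(C @ Sigma @ C.T + R)  # Kalman gain
    x_hat = x_hat + K @ (z[t:t + 1] - C @ x_hat)          # update with measurement
    Sigma = (np.eye(2) - K @ C) @ Sigma
    estimates[t] = x_hat

rmse_meas = np.sqrt(np.mean((z[50:] - x_true[50:, 0]) ** 2))
rmse_filter = np.sqrt(np.mean((estimates[50:, 0] - x_true[50:, 0]) ** 2))
print(rmse_meas, rmse_filter)
```

Once the filter has settled, the position estimate should track the true trajectory more closely than the raw measurements do. The velocity is never measured directly and is inferred through the model.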
