Anomaly Detection
Table of Contents
Causes of Anomalies
Data from a different class of object or underlying mechanism
Data measurement and collection errors
Natural variation
Anomaly Detection
Applications of Anomaly Detection
Difficulties with Anomaly Detection
Scarcity of Anomalies
Diverse Types of Anomalies
Use of Data Labels in Anomaly Detection
Supervised Anomaly Detection
Semi-supervised Anomaly Detection
Unsupervised Anomaly Detection
Output of Anomaly Detection
Label
Score
Variants of Anomaly Detection Problem
Given a dataset $D$, find all the data points $x \in D$ with anomaly scores greater than some threshold $t$
Given a dataset $D$, find all the data points $x \in D$ having the top-n largest anomaly scores
Given a dataset $D$, containing mostly normal data points, and a test point $x$, compute the anomaly score of $x$ with respect to $D$
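A minimal sketch of these three variants, assuming scores have already been assigned by some anomaly scoring function; the nearest-neighbor distance used here is a hypothetical choice for illustration only.

import numpy as np

D = np.random.randn(100, 2)                                      # dataset of mostly normal points
score = lambda x: np.sort(np.linalg.norm(D - x, axis = 1))[1]    # e.g. distance to the nearest other point
scores = np.array([score(x) for x in D])

# Variant 1: all points with anomaly score greater than a threshold t
t = 1.0
flagged = np.where(scores > t)[0]

# Variant 2: the points with the top-n largest anomaly scores
n = 5
top_n = np.argsort(scores)[-n:]

# Variant 3: the anomaly score of a new test point x with respect to D
x_new = np.array([5.0, 5.0])
score_x_new = np.min(np.linalg.norm(D - x_new, axis = 1))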
Anomalies (outliers) are objects that are fit poorly by a statistical model
Estimate a parametric model describing the distribution of the data
Apply a statistical test that depends on the distribution of the data, its estimated parameters (e.g., mean and variance), and the number of expected outliers (confidence limit)
Univariate Gaussian Distribution
Compute the z-score $z = \frac{x - \bar{x}}{s}$, where $\bar{x}$ is the sample mean and $s$ is the sample standard deviation; flag $x$ as an anomaly when $|z|$ exceeds a chosen threshold.
Multivariate Gaussian Distribution
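A natural multivariate analogue of the univariate z-score (stated here as the standard definition) is the Mahalanobis distance of a point $x$ from the sample mean $\bar{x}$, with $S$ the sample covariance matrix:

$D_M(x) = \sqrt{(x - \bar{x})^T S^{-1} (x - \bar{x})}$

Points with large $D_M(x)$ are flagged as anomalies.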
Pros and Cons
Pros
Cons
Train autoencoders only with normal data
Test with (normal + anomaly) data
Convolutional Autoencoder (CAE)
Training using normal data
Anomaly detection with test data
Anomaly Score
Reconstruction Error
Root mean squared error (RMSE)
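For an image $x$ with $n$ pixels and reconstruction $\hat{x}$,

$\text{RMSE}(x, \hat{x}) = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{x}_i)^2}$

and a test point is declared anomalous when this error exceeds a threshold.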
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import random
Load MNIST Data
(train_imgs, train_labels), (test_imgs, test_labels) = tf.keras.datasets.mnist.load_data()
train_imgs, test_imgs = train_imgs/255.0, test_imgs/255.0
print('shape of train_imgs:', train_imgs.shape)
print('shape of train_labels:', train_labels.shape)
print('shape of test_imgs:', test_imgs.shape)
print('shape of test_labels:', test_labels.shape)
Separate Normal and Abnormal Data
normal_train_index = np.where(train_labels == 7)[0]
normal_test_index = np.where(test_labels == 7)[0]
abnormal_test_index = np.where(test_labels == 5)[0]
normal_train_x = train_imgs[normal_train_index].reshape(-1,28,28,1)
normal_train_y = train_labels[normal_train_index]
normal_test_x = test_imgs[normal_test_index].reshape(-1,28,28,1)
normal_test_y = test_labels[normal_test_index]
abnormal_test_x = test_imgs[abnormal_test_index].reshape(-1,28,28,1)
abnormal_test_y = test_labels[abnormal_test_index]
print('shape of normal_train_x:', normal_train_x.shape)
print('shape of normal_test_x:', normal_test_x.shape)
print('shape of abnormal_test_x:', abnormal_test_x.shape)
Plot Normal and Abnormal Data
random.seed(6)
idx = random.sample(range(normal_train_x.shape[0]), 4)
plt.figure(figsize = (8, 3))
for i in range(4):
    plt.subplot(1, 4, i+1)
    plt.imshow(normal_train_x[idx[i]].squeeze(), 'gray')  # drop the channel axis for imshow
    plt.title('Normal')
    plt.axis('off')
plt.tight_layout()
plt.show()
random.seed(11)
idx = random.sample(range(abnormal_test_x.shape[0]), 4)
plt.figure(figsize = (8, 3))
for i in range(4):
    plt.subplot(1, 4, i+1)
    plt.imshow(abnormal_test_x[idx[i]].squeeze(), 'gray')
    plt.title('Abnormal')
    plt.axis('off')
plt.tight_layout()
plt.show()
Build a Model
# Encoder
encoder = tf.keras.models.Sequential([
tf.keras.layers.Conv2D(filters = 32,
kernel_size = (3, 3),
strides = (2, 2),
activation = 'relu',
padding = 'SAME',
input_shape = (28, 28, 1)),
tf.keras.layers.Conv2D(filters = 64,
kernel_size = (3, 3),
strides = (2, 2),
activation = 'relu',
padding = 'SAME',
input_shape = (14, 14, 32)),
tf.keras.layers.Conv2D(filters = 2,
kernel_size = (7, 7),
padding = 'VALID',
input_shape = (7, 7, 64))   # 1x1 spatial output with 2 channels: a 2-D latent vector
])
encoder.summary()
# Decoder
decoder = tf.keras.models.Sequential([
tf.keras.layers.Conv2DTranspose(filters = 64,
kernel_size = (7, 7),
strides = (1, 1),
activation = 'relu',
padding = 'VALID',
input_shape = (1, 1, 2)),
tf.keras.layers.Conv2DTranspose(filters = 32,
kernel_size = (3, 3),
strides = (2, 2),
activation = 'relu',
padding = 'SAME',
input_shape = (7, 7, 64)),
tf.keras.layers.Conv2DTranspose(filters = 1,
kernel_size = (7, 7),
strides = (2, 2),
padding = 'SAME',
input_shape = (14,14,32))
])
decoder.summary()
latent = encoder.output
result = decoder(latent)
cae_model = tf.keras.Model(inputs = encoder.input, outputs = result)
cae_model.compile(optimizer = 'adam',
loss = 'mean_squared_error')
cae_model.fit(normal_train_x, normal_train_x, epochs = 10)
Look at Latent Space
np.random.seed(2)   # seed numpy's RNG (np.random.choice is used below)
idx_n = np.random.choice(normal_test_x.shape[0], 1000)
idx_a = np.random.choice(abnormal_test_x.shape[0], 50)
test_x_n, test_y_n = normal_test_x[idx_n], normal_test_y[idx_n]
test_x_a, test_y_a = abnormal_test_x[idx_a], abnormal_test_y[idx_a]
normal_latent = encoder.predict(test_x_n)
normal_latent = normal_latent.reshape(-1,2)
abnormal_latent = encoder.predict(test_x_a)
abnormal_latent = abnormal_latent.reshape(-1,2)
plt.figure(figsize = (6, 6))
plt.scatter(normal_latent[test_y_n == 7, 0], normal_latent[test_y_n == 7, 1], label = 'Normal 7')
plt.scatter(abnormal_latent[:, 0], abnormal_latent[:, 1], label = 'Abnormal')
plt.title('Latent Space', fontsize = 15)
plt.xlabel('Z1', fontsize = 15)
plt.ylabel('Z2', fontsize = 15)
plt.legend(fontsize = 15)
plt.show()
Test
# Normal
normal_input = normal_test_x[0].reshape(-1,28,28,1)
normal_recon = cae_model.predict(normal_input)
n_recon_err = cae_model.evaluate(normal_input, normal_input)
plt.figure(figsize = (8, 4))
plt.subplot(1,2,1)
plt.imshow(normal_input[0].squeeze(), 'gray')
plt.title('Input image')
plt.axis('off')
plt.subplot(1,2,2)
plt.imshow(normal_recon[0].squeeze(), 'gray')
plt.title('Reconstructed image')
plt.axis('off')
plt.show()
print('Reconstruction error: ', n_recon_err)
# Abnormal
abnormal_input = abnormal_test_x[0].reshape(-1,28,28,1)
abnormal_recon = cae_model.predict(abnormal_input)
ab_recon_err = cae_model.evaluate(abnormal_input, abnormal_input)
plt.figure(figsize = (8, 4))
plt.subplot(1,2,1)
plt.imshow(abnormal_input[0].squeeze(), 'gray')
plt.title('Input image')
plt.axis('off')
plt.subplot(1,2,2)
plt.imshow(abnormal_recon[0].squeeze(), 'gray')
plt.title('Reconstructed image')
plt.axis('off')
plt.show()
print('Reconstruction error: ', ab_recon_err)
Anomaly Detection
normal_err = []
abnormal_err = []
for i in range(200):
    img = normal_test_x[i].reshape(-1, 28, 28, 1)
    normal_err.append(cae_model.evaluate(img, img, verbose = 0))

for j in range(200):
    img = abnormal_test_x[j].reshape(-1, 28, 28, 1)
    abnormal_err.append(cae_model.evaluate(img, img, verbose = 0))
threshold = 0.05
plt.figure(figsize = (6, 4))
plt.plot(normal_err, '.', label = 'Normal')
plt.plot(abnormal_err, '.', label = 'Abnormal')
plt.xlabel('Data point index')
plt.ylabel('Reconstruction error')
plt.axhline(y = threshold, color = 'r', linestyle = '-')
plt.legend()
plt.show()
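The threshold of 0.05 above is hand-picked from the plot. A common alternative, sketched here as a heuristic rather than the source's method, is to derive the threshold from the reconstruction errors of the normal data alone.

errs = np.array(normal_err)
threshold_percentile = np.percentile(errs, 99)        # e.g. the 99th percentile of normal errors
threshold_gaussian = errs.mean() + 3 * errs.std()     # or mean + 3 standard deviations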
Anomaly Detection with GAN (AnoGAN)
Train on normal (healthy) data only; no abnormal data is used.
How can an anomaly be found with a generative model? Starting from a random latent vector, iteratively optimize it until the generated image is as similar as possible to the target image.
Unseen data is fed into the well-trained GAN model, and its anomaly score decides the category.
AnoGAN therefore requires an iterative procedure for every query to find the latent $z$ that generates the target data, as sketched below.
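A minimal sketch of this latent search, assuming the generator trained later in this section and a query image x of shape (1, 28, 28, 1); the step count and learning rate are illustrative choices, not from the source.

import tensorflow as tf

def anogan_latent_search(generator, x, n_steps = 500, lr = 0.01):
    z = tf.Variable(tf.random.normal([1, 100]))             # random starting latent vector
    opt = tf.keras.optimizers.Adam(learning_rate = lr)
    for _ in range(n_steps):
        with tf.GradientTape() as tape:
            # Residual between the generated image and the query image
            loss = tf.reduce_mean(tf.square(generator(z) - x))
        opt.apply_gradients([(tape.gradient(loss, z), z)])  # update z only; the generator stays fixed
    return z, loss                                          # final latent and its residual (anomaly score)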
fast-AnoGAN (f-AnoGAN)
Again, train on normal (healthy) data only; no abnormal data is used.
Train an additional encoder model to predict the latent $z$ directly from images, keeping the generator fixed.
Query data is then regenerated in a single pass through the encoder and generator: if the query is normal (similar to the training data), it is regenerated well; otherwise its anomaly score will be high.
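In the implementation below, the anomaly score of a query image $x$ is the mean squared residual between $x$ and its regeneration $G(E(x))$ through the encoder $E$ and the fixed generator $G$:

$A(x) = \frac{1}{28^2} \lVert x - G(E(x)) \rVert_2^2$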
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import random
Load Data
Train Dataset: digit 2 only (normal images)
Test Dataset: digit 2 and digit 6 (normal + anomaly images)
(train_imgs, train_labels), (test_imgs, test_labels) = tf.keras.datasets.mnist.load_data()
train_imgs, test_imgs = train_imgs/127.5 - 1.0, test_imgs/127.5 - 1.0
normal_train_index = np.where(train_labels == 2)[0]
normal_test_index = np.where(test_labels == 2)[0]
abnormal_test_index = np.where(test_labels == 6)[0]
normal_train_x = train_imgs[normal_train_index].reshape(-1,28,28,1)
normal_train_y = train_labels[normal_train_index]
normal_test_x = test_imgs[normal_test_index].reshape(-1,28,28,1)
normal_test_y = test_labels[normal_test_index]
abnormal_test_x = test_imgs[abnormal_test_index].reshape(-1,28,28,1)
abnormal_test_y = test_labels[abnormal_test_index]
print('shape of normal_train_x:', normal_train_x.shape)
print('shape of normal_test_x:', normal_test_x.shape)
print('shape of abnormal_test_x:', abnormal_test_x.shape)
idx = random.sample(range(normal_train_x.shape[0]), 4)
plt.figure(figsize = (8, 3))
for i in range(4):
    plt.subplot(1, 4, i+1)
    plt.imshow(normal_train_x[idx[i]].squeeze(), 'gray')
    plt.title('Normal')
    plt.axis('off')
plt.tight_layout()
plt.show()
idx = random.sample(range(abnormal_test_x.shape[0]), 4)
plt.figure(figsize = (8, 3))
for i in range(4):
    plt.subplot(1, 4, i+1)
    plt.imshow(abnormal_test_x[idx[i]].squeeze(), 'gray')
    plt.title('Abnormal')
    plt.axis('off')
plt.tight_layout()
plt.show()
Build GAN Model
Generator
generator = tf.keras.models.Sequential([
tf.keras.layers.Dense(7*7*256,
use_bias = False,
input_shape = (100,)),
tf.keras.layers.BatchNormalization(),
tf.keras.layers.LeakyReLU(),
tf.keras.layers.Reshape((7, 7, 256)),
tf.keras.layers.Conv2DTranspose(128,
kernel_size = 5,
strides = 1,
padding = 'same',
use_bias = False),
tf.keras.layers.BatchNormalization(),
tf.keras.layers.LeakyReLU(),
tf.keras.layers.Conv2DTranspose(64,
kernel_size = 5,
strides = 2,
padding = 'same',
use_bias = False),
tf.keras.layers.BatchNormalization(),
tf.keras.layers.LeakyReLU(),
tf.keras.layers.Conv2DTranspose(1,
kernel_size = 5,
strides = 2,
padding = 'same',
use_bias = False,
activation = 'tanh')
])
generator.summary()
Discriminator
discriminator = tf.keras.models.Sequential([
tf.keras.layers.Conv2D(64,
kernel_size = 5,
strides = 2,
padding = 'same',
input_shape = (28, 28, 1)),
tf.keras.layers.LeakyReLU(),
tf.keras.layers.Dropout(0.3),
tf.keras.layers.Conv2D(128,
kernel_size = 5,
strides = 2,
padding = 'same'),
tf.keras.layers.LeakyReLU(),
tf.keras.layers.Dropout(0.3),
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(1, activation = 'sigmoid')
])
discriminator.summary()
Model (Generator + Discriminator) Compile
discriminator.compile(optimizer = tf.keras.optimizers.Adam(learning_rate = 0.0001),
loss = 'binary_crossentropy')
combined_input = tf.keras.layers.Input(shape = (100,))
generated = generator(combined_input)
discriminator.trainable = False   # freeze the discriminator when training the generator through the combined model
combined_output = discriminator(generated)
combined = tf.keras.models.Model(inputs = combined_input, outputs = combined_output)
combined.compile(optimizer = tf.keras.optimizers.Adam(learning_rate = 0.0001),
loss = 'binary_crossentropy')
Train GAN
def make_noise(samples):
    return np.random.normal(0, 1, [samples, 100])

def plot_generated_images(generator, samples = 3):
    noise = make_noise(samples)
    generated_images = generator.predict(noise)
    generated_images = generated_images.reshape(samples, 28, 28)
    for i in range(samples):
        plt.subplot(1, samples, i+1)
        plt.imshow(generated_images[i], 'gray', interpolation = 'nearest')
        plt.axis('off')
    plt.tight_layout()
    plt.show()
n_iter = 20000
batch_size = 256
fake = np.zeros(batch_size)
real = np.ones(batch_size)
for i in range(n_iter + 1):
    # Train Discriminator
    noise = make_noise(batch_size)
    generated_images = generator.predict(noise, verbose = 0)
    idx = np.random.randint(0, normal_train_x.shape[0], batch_size)
    real_images = normal_train_x[idx]
    D_loss_real = discriminator.train_on_batch(real_images, real)
    D_loss_fake = discriminator.train_on_batch(generated_images, fake)
    D_loss = D_loss_real + D_loss_fake

    # Train Generator
    noise = make_noise(batch_size)
    G_loss = combined.train_on_batch(noise, real)

    if i % 2000 == 0:
        print('Discriminator Loss: ', D_loss)
        print('Generator Loss: ', G_loss)
        plot_generated_images(generator)
Build Encoder for fast-AnoGAN
Encoder
Encoder = tf.keras.models.Sequential([
tf.keras.layers.Conv2D(32,
kernel_size = 4,
strides = 2,
padding = 'same',
input_shape = (28, 28, 1)),
tf.keras.layers.LeakyReLU(),
tf.keras.layers.Conv2D(64,
kernel_size = 4,
strides = 2,
padding = 'same'),
tf.keras.layers.BatchNormalization(),
tf.keras.layers.LeakyReLU(),
tf.keras.layers.Conv2D(128,
kernel_size = 4,
strides = 2,
padding = 'same'),
tf.keras.layers.BatchNormalization(),
tf.keras.layers.LeakyReLU(),
tf.keras.layers.Conv2D(100,
kernel_size = 4,
strides = 1,
padding = 'valid'),
tf.keras.layers.Flatten()
])
Encoder.summary()
Model (Encoder + Generator) Compile
encoder_combined_input = tf.keras.layers.Input(shape = (28, 28, 1))
latentz = Encoder(encoder_combined_input)
generator.trainable = False   # freeze the generator; only the encoder is trained
regenerated_output = generator(latentz)
e_g_combined = tf.keras.models.Model(inputs = encoder_combined_input, outputs = regenerated_output)
e_g_combined.compile(optimizer = tf.keras.optimizers.Adam(learning_rate = 0.0001),
loss = 'mean_squared_error')
Train Encoder
n_iter = 20000
batch_size = 32
e_losses = []
for i in range(n_iter + 1):
    idx = np.random.randint(0, normal_train_x.shape[0], batch_size)
    real_images = normal_train_x[idx]
    recon_loss = e_g_combined.train_on_batch(real_images, real_images)
    if i % 100 == 0:
        e_losses.append(recon_loss)
    if i % 2000 == 0:
        print('recon_loss: ', recon_loss)
Calculate Anomaly Score
def compare_images(cls, real_img, generated_img, score, threshold = 50):
    # Map images from [-1, 1] back to [0, 255]
    real_img = ((real_img + 1.0) * 127.5).squeeze()
    generated_img = ((generated_img + 1.0) * 127.5).squeeze()
    # Keep only large pixel-wise differences
    diff_img = real_img - generated_img
    diff_img[diff_img <= threshold] = 0
    # Overlay the difference on the real image in red
    anomaly_img = np.zeros(shape = (28, 28, 3))
    anomaly_img[:, :, 0] = real_img - diff_img
    anomaly_img[:, :, 1] = real_img - diff_img
    anomaly_img[:, :, 2] = real_img - diff_img
    anomaly_img[:, :, 0] = anomaly_img[:, :, 0] + diff_img
    anomaly_img = anomaly_img.astype(np.uint8)

    fig, plots = plt.subplots(1, 4)
    fig.suptitle(f'Class: {cls} - (anomaly score: {score:.4})')
    fig.set_figwidth(9)
    fig.set_tight_layout(True)
    plots = plots.reshape(-1)
    plots[0].imshow(real_img, cmap = 'gray')
    plots[1].imshow(generated_img, cmap = 'gray')
    plots[2].imshow(diff_img, cmap = 'gray')
    plots[3].imshow(anomaly_img)
    plots[0].set_title('real')
    plots[1].set_title('generated')
    plots[2].set_title('difference')
    plots[3].set_title('Anomaly Detection')
    plt.show()
def calculate_anomaly_score(test_image, sample_num, plot_options = True):
    generator.trainable = False
    Encoder.trainable = False
    anomaly_score_list = []
    for i in range(sample_num):
        idx = np.random.randint(0, test_image.shape[0], 1)
        real_img = test_image[idx]
        real_z = Encoder(real_img)       # encode the query image
        fake_img = generator(real_z)     # regenerate it through the fixed generator
        # Anomaly score: mean squared residual between query and regeneration
        anomaly_score = np.sum((real_img - fake_img)**2) / (28**2)
        anomaly_score_list.append(anomaly_score)
        if plot_options:
            cls = 'Abnormal' if anomaly_score >= 0.05 else 'Normal'
            compare_images(cls, real_img, fake_img.numpy(), anomaly_score, threshold = 50)
    return np.array(anomaly_score_list)
Anomaly Score of Normal Data
calculate_anomaly_score(normal_test_x, 2)
Anomaly Score of Abnormal Data
calculate_anomaly_score(abnormal_test_x, 2)
Plot Anomaly Scores
normal_scores = calculate_anomaly_score(normal_test_x, 100, plot_options = False)
abnormal_scores = calculate_anomaly_score(abnormal_test_x, 100, plot_options = False)
plt.figure(figsize = (6, 4))
plt.plot(normal_scores, '.', label = 'Normal')
plt.plot(abnormal_scores, '.', label = 'Abnormal')
plt.xlabel('Data point index')
plt.ylabel('Anomaly score')
plt.axhline(y = 0.05, color = 'r', linestyle = '-')
plt.legend()
plt.show()
Examples
NASA Bearing Dataset
Prognostic Dataset for Predictive/Preventive Maintenance
https://www.kaggle.com/datasets/vinayak123tyagi/bearing-dataset (3rd_test)
Download AD_bearing.npy
import tensorflow as tf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import random
from google.colab import drive
drive.mount('/content/drive')
AD_bearing = np.load('/content/drive/MyDrive/DL_Colab/DL_data/AD_bearing.npy')
print("Shape of total data: ", AD_bearing.shape)
plt.figure(figsize = (8, 6))
plt.plot(AD_bearing[:,0], label = 'Bearing 1', color = 'b', linewidth = 2)
plt.plot(AD_bearing[:,1], label = 'Bearing 2', color = 'r', linewidth = 2)
plt.plot(AD_bearing[:,2], label = 'Bearing 3', color = 'g', linewidth = 2)
plt.plot(AD_bearing[:,3], label = 'Bearing 4', color = 'k', linewidth = 2)
plt.legend(loc = 'upper left')
plt.title('Bearing Sensor Training Data')
plt.show()
bearing_3 = AD_bearing[:,2]
train = bearing_3[0:4000].reshape(-1, 1)
test = bearing_3[4000:].reshape(-1, 1)
print("Training dataset shape:", train.shape)
print("Test dataset shape:", test.shape)
plt.figure(figsize = (8, 6))
plt.plot(np.arange(0, train.shape[0]), train, label = 'Bearing 3_train', linewidth = 2)
plt.plot(np.arange(4000, 4000 + test.shape[0]), test, label = 'Bearing 3_test', linewidth = 2)
plt.legend(loc = 'upper left', fontsize = 16)
plt.title('Bearing Sensor Train and Test Data', fontsize = 16)
plt.xlabel('Data points')
plt.show()
LSTM Model
n_step = 20    # time steps per input window
n_input = 50   # signal values per step (each window covers 20 x 50 = 1000 points)

# LSTM shape
n_lstm1 = 300
n_lstm2 = 300
n_lstm3 = 300

# fully connected
n_hidden = 300
n_output = 50  # predict the next 50 signal values
lstm_network = tf.keras.models.Sequential([
tf.keras.layers.Input(shape = (n_step, n_input)),
tf.keras.layers.LSTM(n_lstm1, return_sequences = True),
tf.keras.layers.LSTM(n_lstm2, return_sequences = True),
tf.keras.layers.LSTM(n_lstm3),
tf.keras.layers.Dense(n_hidden, activation = 'relu'),
tf.keras.layers.Dense(n_output),
])
lstm_network.summary()
lstm_network.compile(optimizer = 'adam',
loss = 'mean_squared_error',
metrics = ['mse'])
Train/Test Data Split
def dataset(train, test, n_samples, n_step = n_step, n_input = n_input, n_output = n_output):
    train_x_list = []
    train_y_list = []
    n_data = train.shape[0]
    random.seed(0)
    # Sample random window start points that leave room for input + target
    start_point = random.sample(list(np.arange(0, n_data - (n_step+1)*n_input)), n_samples)
    for i in start_point:
        train_x_list.append(train[i:i + n_step*n_input].reshape(n_step, n_input))
        train_y_list.append(train[i + n_step*n_input:i + n_step*n_input + n_output])
    train_data = np.array(train_x_list)
    train_label = np.array(train_y_list)
    test_data = test[0:n_step*n_input]
    test_data = test_data.reshape(1, n_step, n_input)
    test_label = test[n_step*n_input:n_step*n_input + n_output].ravel()
    return train_data, train_label, test_data, test_label
train_data, train_label, test_data, test_label = dataset(train, test, 2000)
print('Train data shape:', train_data.shape)
Model Training
lstm_network.fit(train_data, train_label, epochs = 15)
Results
test_pred = lstm_network.predict(train_data[0:1]).ravel()
plt.figure(figsize = (8, 6))
plt.plot(np.arange(0, n_step*n_input + n_output), np.hstack([train_data[0:1].ravel(), train_label[0:1].ravel()]), 'b', label = 'Ground truth')
plt.plot(np.arange(n_step*n_input, n_step*n_input + n_output), test_pred, 'r', label = 'Prediction')
plt.vlines(n_step*n_input, 0.05, 0.06, colors = 'r', linestyles = 'dashed')
plt.ylim([0.04, 0.07])
plt.legend(fontsize = 13, loc = 'upper left')
plt.xlabel('Data points')
plt.show()
Difference Between Predicted and Measured Signal
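The anomaly score at each point is the absolute difference between the measured signal $x_i$ and the model's prediction $\hat{x}_i$, i.e. $a_i = \lvert x_i - \hat{x}_i \rvert$, compared against a fixed threshold (0.005 below).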
gen_signal = []
for i in range((test.shape[0] - n_step*n_input) // n_output):
    test_pred = lstm_network.predict(test_data, verbose = 0)
    gen_signal.append(test_pred.ravel())
    # Slide the window: drop the oldest step and append the prediction
    test_pred = test_pred[:, np.newaxis, :]
    test_data = test_data[:, 1:, :]
    test_data = np.concatenate([test_data, test_pred], axis = 1)

gen_signal = np.concatenate(gen_signal)
test_label = test[n_step*n_input:n_step*n_input + n_output*(i+1)]
plt.figure(figsize = (8, 6))
plt.plot(test_label, 'b', label = 'Measured signal')
plt.plot(gen_signal, 'r', label = 'Prediction')
plt.legend(fontsize = 15, loc = 'upper left')
plt.xlabel('Data points')
plt.show()
plt.figure(figsize = (8, 6))
plt.plot(np.abs(test_label.reshape(-1) - gen_signal), label = 'Anomaly score')
plt.legend(fontsize = 15, loc = 'upper left')
plt.hlines(0.005, 0, 1300, colors = 'r', linestyles = 'dashed')
plt.xlabel('Data points')
plt.ylabel('Anomaly score (difference)')
plt.show()
%%javascript
$.getScript('https://kmahelona.github.io/ipython_notebook_goodies/ipython_notebook_toc.js')