XAI: eXplainable AI
Table of Contents
1. Black Box AI
2. SHapley Additive exPlanations (SHAP)
3. Class Activation Maps (CAM)
1. Black Box AI
- Black box AI: AI produces insights based on a data set, but the end user does not know how
- Many machine learning and deep learning models share the 'black box' problem
- The AI does not provide reasons behind the decisions or predictions it makes
- The reliability of such AI models may be questioned
- XAI: AI whose decisions or predictions humans can understand
Why XAI?
- XAI can be used to increase the interpretability of AI by enabling description of the expected outcome and potential bias of the model
- Depending on the AI's performance relative to humans, XAI results can be used in different ways:
  - AI performance < human performance: XAI suggests directions for improving the AI model
  - AI performance ≈ human performance: XAI identifies the principles behind the AI model's learning
  - AI performance > human performance: XAI enables acquiring new knowledge from the AI
Model-Specific XAI
- Model-specific XAI: applicable only to a particular class of algorithms; provides explanations by using the intrinsic structure of the model
- Examples: Class Activation Mapping (CAM) & Grad-CAM for Convolutional Neural Network (CNN) models
Model-Agnostic XAI
- Model-agnostic XAI: applicable to any machine learning algorithm; treats the model as a black box
- Obtains explanations by perturbing the input data and measuring how sensitive the model's performance is to these perturbations relative to its performance on the original data (see the sketch after this list)
- Examples:
- SHapley Additive exPlanations (SHAP)
- Local Interpretable Model-agnostic Explanations (LIME)
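As a concrete illustration of the perturbation idea, permutation importance shuffles one feature at a time and measures the resulting drop in performance. A minimal sketch using scikit-learn, assuming a fitted estimator model and held-out data X_test, y_test (hypothetical names):

from sklearn.inspection import permutation_importance

# shuffle each feature in turn and measure the drop in the model's score
result = permutation_importance(model, X_test, y_test,
                                n_repeats = 10, random_state = 42)

# features whose shuffling hurts performance most are the most important
for i in result.importances_mean.argsort()[::-1]:
    print('feature {}: {:.4f} +/- {:.4f}'.format(
        i, result.importances_mean[i], result.importances_std[i]))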
2. SHapley Additive exPlanations (SHAP)
- SHAP: a game-theoretic approach to explain the output of any machine learning model
- Computes feature importance for each predicted value using Shapley values
- A feature's SHAP value expresses its contribution as the change in the model's output when that feature's contribution is excluded
- Unlike a plain permutation method, SHAP calculates each feature's influence while accounting for dependencies between features
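Formally, a feature's SHAP value is its Shapley value from cooperative game theory: the average marginal contribution of feature $i$ over all subsets $S$ of the feature set $F$,
$$\phi_i = \sum_{S \subseteq F \setminus \{i\}} \frac{|S|! \, \left(|F| - |S| - 1\right)!}{|F|!} \left[ f_{S \cup \{i\}}\left(x_{S \cup \{i\}}\right) - f_S\left(x_S\right) \right]$$
where $f_S$ denotes the model evaluated using only the features in subset $S$.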
from google.colab import drive
drive.mount('/content/drive')
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import tensorflow as tf
2.1. Build and Predict Machine Learning and Deep Learning Models
Data Description
- Injection molding dataset consisting of 36 mold shapes
- This dataset consists of 5 process features (control parameters), 32 mold shape features, and the weight of the molded part (ground truth).
# load the injection molding training data from Google Drive
dataset = pd.read_excel('/content/drive/MyDrive/linc/data_files/train_dataset.xlsx',
                        index_col = 0,
                        engine = 'openpyxl')
dataset
# separate the features and the target (part weight)
X = dataset.drop('Weight (g)', axis = 1)
y = dataset.loc[:, 'Weight (g)']

# min-max normalize each feature to the [0, 1] range
X = (X - X.min()) / (X.max() - X.min())
Train/test split
from sklearn.model_selection import train_test_split
train_x, test_x, train_y, test_y = train_test_split(X, y, test_size = 0.30, random_state = 42)
print('train_x: {}, train_y: {}'.format(train_x.shape, train_y.shape))
print('test_x: {}, test_y: {}'.format(test_x.shape, test_y.shape))
from sklearn.ensemble import RandomForestRegressor
Random forest training
rf = RandomForestRegressor(random_state = 1)
rf.fit(train_x, train_y)
Prediction on the test set
pred_y = rf.predict(test_x)
fig, ax = plt.subplots(figsize=(10, 10))
ax.scatter(test_y, pred_y, edgecolors=(0, 0, 0))
ax.plot([y.min(), y.max()], [y.min(), y.max()], "r--", lw=4)
ax.set_xlabel("Measured")
ax.set_ylabel("Predicted")
plt.show()
2.2.2. SHAP
SHAP Implementation
!pip install shap
import shap

# TreeExplainer computes SHAP values efficiently for tree-based models
explainer = shap.TreeExplainer(rf)
shap_values = explainer.shap_values(test_x)
Average of Absolute SHAP Values of Entire Test Data (Feature Importance)
shap.summary_plot(shap_values, test_x, plot_type = "bar")
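SHAP can also explain an individual prediction. A minimal sketch using shap.force_plot for the first test sample (the sample index is chosen arbitrarily):

shap.initjs()

# expected_value is the average model prediction; the force plot shows how each
# feature pushes this sample's prediction above or below that baseline
shap.force_plot(explainer.expected_value, shap_values[0], test_x.iloc[0])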
# fix the random seed for reproducibility
tf.random.set_seed(42)

# fully connected regression network: five hidden layers, one linear output
model = tf.keras.models.Sequential([
    tf.keras.layers.Input(shape = (train_x.shape[1],)),
    tf.keras.layers.Dense(units = 10, activation = 'relu'),
    tf.keras.layers.Dense(units = 10, activation = 'relu'),
    tf.keras.layers.Dense(units = 10, activation = 'relu'),
    tf.keras.layers.Dense(units = 10, activation = 'relu'),
    tf.keras.layers.Dense(units = 10, activation = 'relu'),
    tf.keras.layers.Dense(units = 1, activation = None)
])
Compile and train the model
model.summary()
model.compile(optimizer = tf.keras.optimizers.Adam(learning_rate = 0.001),
              loss = 'mse')
history = model.fit(train_x, train_y, epochs = 50)
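The History object returned by model.fit records the training loss per epoch; plotting it is a quick convergence check (a minimal sketch):

# plot the MSE training loss over epochs
plt.plot(history.history['loss'])
plt.xlabel('epoch')
plt.ylabel('MSE loss')
plt.show()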
Prediction on the test set
pred_y = model.predict(test_x)
fig, ax = plt.subplots(figsize = (6, 6))
ax.scatter(test_y, pred_y, edgecolors = (0, 0, 0))
ax.plot([test_y.min(), test_y.max()], [test_y.min(), test_y.max()], "r--", lw = 4)
ax.set_xlabel("Measured")
ax.set_ylabel("Predicted")
plt.show()
2.3.2. SHAP
SHAP Implementation
import shap
# DeepExplainer approximates SHAP values for deep learning models,
# using the training data as the background distribution
explainer_shap = shap.DeepExplainer(model = model,
                                    data = train_x.to_numpy())
shap_values = explainer_shap.shap_values(test_x.to_numpy())
Average of Absolute SHAP Values of Entire Test Data (Feature Importance)
shap.summary_plot(shap_values, test_x.to_numpy(),
feature_names = X.columns)
3. Class Activation Maps (CAM)
Attention
"Visualizing and Understanding Convolutional Networks" (Zeiler & Fergus, 2014)
Source
- Source paper: http://cnnlocalization.csail.mit.edu/
- Source code: https://github.com/metalbubble/CAM
3.1. CNN with a Fully Connected Layer
The conventional CNN can be conceptually divided into two parts: feature extraction and classification. In the feature extraction stage, convolution extracts features of the input data so that classification can be performed well. The classification stage then uses the extracted features to determine which group each input belongs to.
When we visually identify images, we do not look at the whole image; instead, we intuitively focus on its most important parts. CNN learning is similar to the way humans focus: when the weights are optimized, the more important parts are given higher weights. Generally, however, we cannot recognize this, because a generic CNN passes the features extracted by the convolution layers through a fully connected layer, which makes them more abstract.
Issues with CNNs (or Deep Learning in General)
Deep learning performs well compared with other existing algorithms
But it works as a black box
- A classification result is simply returned without knowing how the classification results are derived → little interpretability
When we visually identify images, we do not look at the whole image
Instead, we intuitively focus on the most important parts of the image
When CNN weights are optimized, the more important parts are given higher weights
Class activation map (CAM)
- We can determine which parts of the image the model is focusing on, based on the learned weights
- Highlighting the importance of the image region to the prediction
3.2. CAM: CNN with Global Average Pooling
- Sheds light on how global average pooling explicitly enables a convolutional neural network to have remarkable localization ability
- The heatmap is the class activation map, highlighting the importance of each image region to the prediction
The deep learning model is a black box model. When input data is received, a classification result of 1 or 0 is simply returned for a binary classification problem, without any indication of how that result was derived. The class activation map (CAM), by contrast, is capable of interpreting the classification results: by analyzing which parts of the image the model is focusing on, we can interpret which parts of the image it considers important.
The class activation map (CAM) is a modified convolution layer. It directly highlights the important parts of the spatial grid of an image. As a result, we can see the emphasized parts of the model. The below figure describes the procedure for class activation mapping.
The feature maps of the last convolution layer can be interpreted as a collection of visual spatial locations focused on by the model. The CAM can be obtained by taking a linear sum of the features. They all have different weights and thus can obtain spatial locations according to various input images through a linear combination. For a given image, $f_k(x,y)$ represents the feature map of unit $k$ in the last convolution layer at spatial location $(x,y)$. For a given class $c$, the class score, $S_c$, is expressed as the following equation.
$$S_c = \sum_k \omega_k^c \sum_{x,y} f_k(x,y)= \sum_{x,y} \sum_k \omega_k^c \; f_k(x,y)$$
where $\omega_k^c$ is the weight corresponding to class $c$ for unit $k$. The class activation map for class $c$ is denoted as $M_c$.
$$M_c(x,y) = \sum_k \omega_k^c \; f_k(x,y)$$
$M_c$ directly indicates the importance of the feature map at spatial grid $(x,y)$ for class $c$. Finally, the output of the softmax for class $c$ is
$$P_c = \frac{\exp\left(S_c\right)}{\sum_{c'} \exp\left(S_{c'}\right)}$$
In the case of a CNN, the size of the feature map is reduced by the pooling layers. By simple up-sampling, it is possible to identify the attention image regions for each label.
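The mapping from these equations to arrays is direct. A minimal NumPy sketch with assumed shapes and random placeholder data (hypothetical, for illustration only):

import numpy as np

f = np.random.rand(25, 25, 64)   # f_k(x, y): last conv feature maps, (H, W, K)
w = np.random.rand(64, 6)        # w_k^c: dense-layer weights after GAP, (K, classes)

c = 2                            # class of interest
M_c = f @ w[:, c]                # CAM: sum_k w_k^c f_k(x, y), shape (H, W)
S_c = M_c.sum()                  # class score: spatial sum of the CAM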
Limitations of Class Activation Maps (CAM)
Requires a Global Average Pooling layer
Unable to visualize feature maps from different layers (other than the last)
3.3. CAM with NEU
Download NEU steel surface defects images and labels
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import cv2
from google.colab import drive
drive.mount('/content/drive')
# Change file paths if necessary
train_x = np.load('/content/drive/MyDrive/DL_Colab/DL_data/NEU_train_imgs.npy')
train_y = np.load('/content/drive/MyDrive/DL_Colab/DL_data/NEU_train_labels.npy')
test_x = np.load('/content/drive/MyDrive/DL_Colab/DL_data/NEU_test_imgs.npy')
test_y = np.load('/content/drive/MyDrive/DL_Colab/DL_data/NEU_test_labels.npy')
n_train = train_x.shape[0]
n_test = test_x.shape[0]

print("The number of training images : {}, shape : {}".format(n_train, train_x.shape))
print("The number of testing images : {}, shape : {}".format(n_test, test_x.shape))
model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(filters = 32,
                           kernel_size = (3, 3),
                           activation = 'relu',
                           padding = 'SAME',
                           input_shape = (200, 200, 1)),
    tf.keras.layers.MaxPool2D((2, 2)),
    tf.keras.layers.Conv2D(filters = 64,
                           kernel_size = (3, 3),
                           activation = 'relu',
                           padding = 'SAME'),
    tf.keras.layers.MaxPool2D((2, 2)),
    tf.keras.layers.Conv2D(filters = 64,
                           kernel_size = (3, 3),
                           activation = 'relu',
                           padding = 'SAME'),
    tf.keras.layers.MaxPool2D((2, 2)),
    tf.keras.layers.Conv2D(filters = 64,
                           kernel_size = (3, 3),
                           activation = 'relu',
                           padding = 'SAME'),
    # global average pooling in place of a fully connected head enables CAM
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(6, activation = 'softmax')
])
model.summary()
model.compile(optimizer = 'adam',
loss = 'sparse_categorical_crossentropy',
metrics = 'accuracy')
model.fit(train_x, train_y, epochs = 15)
# accuracy test
test_loss, test_acc = model.evaluate(test_x, test_y)
# get the last convolution layer and the weights of the dense (fully connected) layer
conv_layer = model.get_layer(index = 6)
fc_layer = model.layers[8].get_weights()[0]
# Class activation map
my_map = tf.matmul(conv_layer.output, fc_layer)
CAM = tf.keras.Model(inputs = model.inputs, outputs = my_map)
test_idx = [7]
test_image = test_x[test_idx]

# predicted class for the selected test image
pred = np.argmax(model.predict(test_image), axis = 1)

# activation map of the predicted class
predCAM = CAM.predict(test_image)
attention = predCAM[:, :, :, pred]
attention = np.abs(np.reshape(attention, (25, 25)))
resized_attention = cv2.resize(attention,
(200*5, 200*5),
interpolation = cv2.INTER_CUBIC)
resized_test_x = cv2.resize(test_image.reshape(200,200),
(200*5, 200*5),
interpolation = cv2.INTER_CUBIC)
plt.figure(figsize = (6, 9))
plt.subplot(3,2,1)
plt.imshow(test_x[test_idx].reshape(200,200), 'gray')
plt.axis('off')
plt.subplot(3,2,2)
plt.imshow(attention)
plt.axis('off')
plt.subplot(3,2,3)
plt.imshow(resized_test_x, 'gray')
plt.axis('off')
plt.subplot(3,2,4)
plt.imshow(resized_attention, 'jet', alpha = 0.5)
plt.axis('off')
plt.subplot(3,2,6)
plt.imshow(resized_test_x, 'gray')
plt.imshow(resized_attention, 'jet', alpha = 0.5)
plt.axis('off')
plt.show()
3.4. Grad-CAM: Gradient-weighted Class Activation Maps
Does not require a particular architecture (as long as the model is differentiable)
Uses gradients to determine the weighting of each feature map
CAM
$$M_c = \sum_k \omega^c_k f_k$$
Grad-CAM
$$M_c = ReLU \left(\sum_{k} \alpha^c_k f_k \right) \quad \text{where} \quad \alpha^c_k = \frac{1}{Z} \sum_{x,y} \frac{\partial S_c}{\partial f_k(x,y)}$$
where $S_c$ is the class score as before and $Z$ is the number of spatial locations in the feature map.
model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(filters = 32,
                           kernel_size = (3, 3),
                           activation = 'relu',
                           padding = 'SAME',
                           input_shape = (200, 200, 1)),
    tf.keras.layers.MaxPool2D((2, 2)),
    tf.keras.layers.Conv2D(filters = 64,
                           kernel_size = (3, 3),
                           activation = 'relu',
                           padding = 'SAME'),
    tf.keras.layers.MaxPool2D((2, 2)),
    tf.keras.layers.Conv2D(filters = 64,
                           kernel_size = (3, 3),
                           activation = 'relu',
                           padding = 'SAME'),
    tf.keras.layers.MaxPool2D((2, 2)),
    tf.keras.layers.Conv2D(filters = 64,
                           kernel_size = (3, 3),
                           activation = 'relu',
                           padding = 'SAME'),
    tf.keras.layers.MaxPool2D((2, 2)),
    # fully connected head: Grad-CAM does not require global average pooling
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(units = 64, activation = 'relu'),
    tf.keras.layers.Dense(units = 6, activation = 'softmax')
])
model.compile(optimizer = 'adam',
loss = 'sparse_categorical_crossentropy',
metrics = 'accuracy')
model.fit(train_x, train_y, epochs = 15)
model.summary()
test_idx = [7]
test_image = test_x[test_idx]

# model returning both the last conv feature maps and the class predictions
conv_layer = model.get_layer(index = 6)
grad_model = tf.keras.models.Model(inputs = model.inputs, outputs = [conv_layer.output, model.output])
# record the forward pass so the gradient of the class score can be computed
with tf.GradientTape() as tape:
    desired_conv_layer_output, preds = grad_model(test_image)
    pred_index = tf.argmax(preds[0])
    class_channel = preds[:, pred_index]
# compute gradient via tensorflow GradientTape()
grads = tape.gradient(class_channel, desired_conv_layer_output)
# average the gradients over the batch and spatial dimensions: alpha_k^c
pooled_grads = tf.reduce_mean(grads, axis = (0, 1, 2))

# weight each feature map by its pooled gradient and sum over channels
heatmap = tf.matmul(desired_conv_layer_output[0], pooled_grads[..., tf.newaxis])
heatmap = tf.squeeze(heatmap)

# the paper applies ReLU; the absolute value is used here instead
attention_grad = np.abs(np.reshape(heatmap, (25, 25)))
resized_attention_grad = cv2.resize(attention_grad,
(200*5, 200*5),
interpolation = cv2.INTER_CUBIC)
resized_test_x = cv2.resize(test_image.reshape(200,200),
(200*5, 200*5),
interpolation = cv2.INTER_CUBIC)
plt.figure(figsize = (6, 9))
plt.subplot(3,2,1)
plt.imshow(test_x[test_idx].reshape(200,200), 'gray')
plt.axis('off')
plt.subplot(3,2,2)
plt.imshow(attention_grad)
plt.axis('off')
plt.subplot(3,2,3)
plt.imshow(resized_test_x, 'gray')
plt.axis('off')
plt.subplot(3,2,4)
plt.imshow(resized_attention_grad, 'jet', alpha = 0.5)
plt.axis('off')
plt.subplot(3,2,6)
plt.imshow(resized_test_x, 'gray')
plt.imshow(resized_attention_grad, 'jet', alpha = 0.5)
plt.axis('off')
plt.show()