Class Activation Maps (CAM)


By Prof. Seungchul Lee
http://iai.postech.ac.kr/
Industrial AI Lab at POSTECH

Table of Contents

  • Attention

  • Visualizing and Understanding Convolutional Networks

1. CNN with a Fully Connected Layer

A conventional CNN can be conceptually divided into two parts: feature extraction and classification. In the feature extraction part, convolution layers extract features from the input data so that classification can be performed well. The classification part then uses these extracted features to decide which class each input belongs to.
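The two-part structure can be sketched in plain NumPy. This is only a minimal illustration under stated assumptions, not the implementation used in this lecture: the 8×8 input, the four 3×3 filters, the three classes, and the random (untrained) weights are all hypothetical, and `conv2d_valid` and `softmax` are helper functions defined here for the example.

```python
import numpy as np

def conv2d_valid(image, kernels):
    """Naive 'valid' cross-correlation.

    image:   (H, W) single-channel input
    kernels: (K, kh, kw) bank of K filters
    returns: (K, H - kh + 1, W - kw + 1) feature maps
    """
    K, kh, kw = kernels.shape
    H, W = image.shape
    out = np.zeros((K, H - kh + 1, W - kw + 1))
    for k in range(K):
        for i in range(H - kh + 1):
            for j in range(W - kw + 1):
                out[k, i, j] = np.sum(image[i:i + kh, j:j + kw] * kernels[k])
    return out

def softmax(z):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

rng = np.random.default_rng(0)
image = rng.random((8, 8))                 # hypothetical 8x8 grayscale input
kernels = rng.standard_normal((4, 3, 3))   # 4 hypothetical, untrained filters
n_classes = 3                              # hypothetical number of classes

# Part 1: feature extraction (convolution followed by ReLU)
feature_maps = np.maximum(conv2d_valid(image, kernels), 0.0)   # shape (4, 6, 6)

# Part 2: classification (flatten, fully connected layer, softmax)
flat = feature_maps.reshape(-1)            # spatial layout is flattened away here
W_fc = rng.standard_normal((n_classes, flat.size)) * 0.1
b_fc = np.zeros(n_classes)
probs = softmax(W_fc @ flat + b_fc)        # one probability per class
```

Note that the `reshape(-1)` step mixes all spatial positions into a single vector before the fully connected layer sees them.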

When we visually identify images, we do not look at the whole image; instead, we intuitively focus on its most important parts. CNN learning is similar to the way humans focus: as the weights are optimized, the more important parts of the input are given higher weights. In general, however, we cannot observe this directly, because a conventional CNN passes the features extracted by the convolution layers through a fully connected layer, which makes them more abstract.



1.1. Issues with CNN (or Deep Learning)

  • Deep learning performs well compared with many existing algorithms
  • But works as a black box

    • A classification result is simply returned, with no indication of how it was derived → little interpretability
  • When we visually identify images, we do not look at the whole image

  • Instead, we intuitively focus on the most important parts of the image
  • When CNN weights are optimized, the more important parts are given higher weights

  • Class activation map (CAM)

    • We can determine which parts of the image the model is focusing on, based on the learned weights
    • CAM highlights how important each image region is to the prediction
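
The idea in the bullets above can be sketched as follows. This is a rough illustration that assumes the commonly used CAM setup, in which the last convolutional feature maps go through global average pooling and then a single dense layer without a bias term; this section does not spell out the architecture, so treat that as an assumption. The feature maps and dense weights below are random placeholders standing in for a trained network, and `class_activation_map` and `normalize_01` are hypothetical helper names.

```python
import numpy as np

def class_activation_map(feature_maps, class_weights):
    """Weighted sum of feature maps for one class.

    feature_maps:  (K, H, W) output of the last convolution layer
    class_weights: (K,) dense-layer weights connecting the K pooled
                   features to the chosen class
    returns:       (H, W) map, where entry (x, y) is sum_k w_k * f_k(x, y)
    """
    return np.tensordot(class_weights, feature_maps, axes=([0], [0]))

def normalize_01(cam):
    """Rescale a map to [0, 1] so it can be displayed as a heatmap."""
    shifted = cam - cam.min()
    span = shifted.max()
    return shifted / span if span > 0 else np.zeros_like(shifted)

rng = np.random.default_rng(0)
K, H, W, n_classes = 4, 6, 6, 3               # hypothetical sizes
feature_maps = rng.random((K, H, W))          # placeholder for learned features
W_dense = rng.standard_normal((n_classes, K)) # placeholder for learned weights

# Forward pass of the assumed head: global average pooling, then dense layer
pooled = feature_maps.mean(axis=(1, 2))       # shape (K,)
scores = W_dense @ pooled                     # shape (n_classes,)
predicted = int(np.argmax(scores))

# CAM for the predicted class, using that class's learned weights
cam = class_activation_map(feature_maps, W_dense[predicted])
heatmap = normalize_01(cam)                   # shape (H, W), values in [0, 1]
```

Under these assumptions, the spatial average of `cam` equals the class score `scores[predicted]`, which is why each location of the map can be read as that region's contribution to the prediction. In practice the map is usually upsampled to the input image size and overlaid on the image.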