Interpretability

Visualizing and Understanding Convolutional Networks

https://cs.nyu.edu/~fergus/papers/zeilerECCV2014.pdf

Learning Deep Features for Discriminative Localization (Class activation mapping, CAM)

http://cnnlocalization.csail.mit.edu/Zhou_Learning_Deep_Features_CVPR_2016_paper.pdf

Class activation mapping (CAM)

  • A weakly supervised localization method.

  • If the last three layers are "convolution + global average pooling + fully connected", then you can apply this method to visualize the object location.

  • Just take the FC layer's weights for the class of interest and compute a weighted sum of the convolutional feature maps (skipping the global average pooling).
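The weighted sum in the last bullet can be sketched in a few lines. This is only a minimal illustration, assuming the feature maps are given as plain nested lists; the names `class_activation_map` and `peak_location` and the toy numbers are made up for this example and do not come from the paper.

```python
def class_activation_map(feature_maps, class_weights):
    """Weighted sum of feature maps, with no global average pooling.

    feature_maps:  list of k maps, each an H x W nested list (a_1..a_k).
    class_weights: list of k FC weights for one class (w_1..w_k).
    """
    if len(feature_maps) != len(class_weights):
        raise ValueError("need exactly one weight per feature map")
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for w_n, a_n in zip(class_weights, feature_maps):
        for i in range(height):
            for j in range(width):
                cam[i][j] += w_n * a_n[i][j]
    return cam


def peak_location(cam):
    """(row, col) of the largest CAM value: a crude guess at the object position."""
    return max(
        ((i, j) for i, row in enumerate(cam) for j in range(len(row))),
        key=lambda ij: cam[ij[0]][ij[1]],
    )


# Toy input (made up): k = 2 feature maps of size 2 x 2.
feature_maps = [
    [[1.0, 0.0], [0.0, 2.0]],
    [[0.0, 3.0], [1.0, 0.0]],
]
class_weights = [2.0, -1.0]

cam = class_activation_map(feature_maps, class_weights)
print(cam)                 # [[2.0, -3.0], [-1.0, 4.0]]
print(peak_location(cam))  # (1, 1)
```

The result has the same spatial size as one feature map, so each entry says how strongly that position supports the chosen class.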

Class activation mapping

Let $a_1, \dots, a_k$ be the $k$ feature maps of the last convolution layer, which is followed by GAP and FC layers. Let $M$ be the size of a feature map (its number of spatial positions), and let $w_n$ be the FC weight connecting the $n$-th feature map to one of the class scores, denoted $s$.

$$s = \sum_{n=1}^k \left( w_n \frac{\sum_{ij} (a_n)_{ij}}{M} \right) = \frac{1}{M} \sum_{ij} \left( \sum_{n=1}^k w_n a_n \right)_{ij}$$
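The two sides of this identity can be checked numerically. The sketch below is only an illustration under simple assumptions (nested lists as feature maps, no bias term, made-up toy values); the function names are hypothetical.

```python
import math


def score_gap_then_fc(feature_maps, weights):
    """conv -> GAP -> FC: average each map, then take the weighted sum."""
    m = len(feature_maps[0]) * len(feature_maps[0][0])  # M, positions per map
    pooled = [sum(sum(row) for row in a_n) / m for a_n in feature_maps]
    return sum(w_n * p_n for w_n, p_n in zip(weights, pooled))


def score_fc_then_gap(feature_maps, weights):
    """conv -> FC -> GAP: weighted sum at every position, then average."""
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    weighted = [
        [sum(w_n * a_n[i][j] for w_n, a_n in zip(weights, feature_maps))
         for j in range(width)]
        for i in range(height)
    ]
    return sum(sum(row) for row in weighted) / (height * width)


# Toy input (made up): k = 2 feature maps of size 2 x 2.
feature_maps = [
    [[1.0, 0.0], [0.0, 2.0]],
    [[0.0, 3.0], [1.0, 0.0]],
]
weights = [2.0, -1.0]

s_left = score_gap_then_fc(feature_maps, weights)
s_right = score_fc_then_gap(feature_maps, weights)
print(s_left, s_right)  # 0.5 0.5
```

Both orderings give the same score because averaging and the weighted sum are linear, so they can be swapped.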

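The right-hand side changes the layer order from "conv->GAP->FC" to "conv->FC->GAP", which is valid because both GAP and the FC layer are linear. The weighted sum inside the parentheses, taken before averaging, is called the "class activation mapping" of the class whose score is $s$: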

$$\text{CAM} = \sum_{n=1}^k w_n a_n$$
