Deep Learning with Pytorch in a Nutshell
  • Content
  • Image recognition
  • Object detection
  • Semantic segmentation
  • GAN
  • Image style transfer
  • Face recognition
  • Interpretability
  • Word embedding
  • Pytorch
  • Optimization
  • Special layers
    • Transposed convolution
  • Neural architecture search
  • Reinforcement learning
    • Proof of Bellman equation
    • Tabular solution method
Powered by GitBook
On this page
  • Adaptive Deconvolutional Networks for Mid and High Level Feature Learning
  • Explain what is deconvolution (transposed convolution, a learnable upsampling)
  • A guide to convolution arithmetic for deep learning
  1. Special layers

Transposed convolution

PreviousSpecial layersNextNeural architecture search

Last updated 6 years ago

Adaptive Deconvolutional Networks for Mid and High Level Feature Learning

Transposed convolution = Fractional strided convolution

Explain what is deconvolution (transposed convolution, a learnable upsampling)

Transposed convolution just recovers the shape of the origin image, but don't value.

Consider the first layer of model, the c-th channel of input (y1cy_1^cy1c​), it could be reconstructed by convolving the first layer output which containsK1K_1K1​channels (zk,1,k=1,...,K1z_{k,1}, k=1,...,K_1zk,1​,k=1,...,K1​) with the filter (fk,1cf^c_{k,1}fk,1c​).

y^1c=∑k=1K1zk,1∗fk,1c\hat{y}_1^c=\sum_{k=1}^{K_1} z_{k,1}*f_{k,1}^cy^​1c​=k=1∑K1​​zk,1​∗fk,1c​

And any convolution can be represented as matrix multiplication

y^1=F1z1\hat{y}_1 = F_1 z_1y^​1​=F1​z1​

A reconstruct operator RlR_lRl​ compose a sequence convolutional matrix and upsampling matrix, which y^l\hat{y}_ly^​l​ is the reconstructing image from the l layer feature map.

y^l=F1Us1F2Us2...Flzl=Rlzl\hat{y}_l=F_1U_{s1}F_2U_{s2}...F_lz_l=R_lz_ly^​l​=F1​Us1​F2​Us2​...Fl​zl​=Rl​zl​

Hence, the projection operator RlTR^T_lRlT​ maps the input image to zlz_lzl​

RlT=FlT...Ps2F2TPs1F1TR^T_l=F_l^T...P_{s2}F_2^TP_{s1}F_1^TRlT​=FlT​...Ps2​F2T​Ps1​F1T​

The projection operator is not in the sense of vector project. It more likes recover the shape to input space.

FTF^TFT and FFF are not the transposed relationship in the matrix meaning. The weights of these two operators are trained separately.

A guide to convolution arithmetic for deep learning

http://www.matthewzeiler.com/wp-content/uploads/2017/07/iccv2011.pdf
https://arxiv.org/pdf/1603.07285.pdf