Explain what deconvolution is (transposed convolution, a learnable upsampling)
Transposed convolution only recovers the shape (spatial size) of the original image, not its values.
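The shape claim can be checked with the usual size arithmetic. The sketch below is a minimal illustration, assuming square inputs, no dilation and no extra output padding; the function names `conv_out_size` and `transposed_conv_out_size` are made up for this note.

```python
def conv_out_size(i, k, s=1, p=0):
    """Output size of a convolution: input i, kernel k, stride s, padding p."""
    return (i + 2 * p - k) // s + 1

def transposed_conv_out_size(i, k, s=1, p=0):
    """Output size of the matching transposed convolution (no output padding)."""
    return (i - 1) * s + k - 2 * p

small = conv_out_size(7, k=3, s=2, p=1)                     # 7 -> 4
restored = transposed_conv_out_size(small, k=3, s=2, p=1)   # 4 -> 7
print(small, restored)  # 4 7
```

Only the size comes back. With stride larger than 1, several input sizes can map to the same output size (here 7 and 8 both give 4), so the restored size may differ from the original unless an extra padding term is added.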
Consider the first layer of the model. The $c$-th channel of the input, $y_1^c$, can be reconstructed by convolving the first-layer output, which contains $K_1$ channels ($z_{k,1}$, $k = 1, \dots, K_1$), with the filters $f_{k,1}^c$ and summing over $k$:
$$\hat{y}_1^c = \sum_{k=1}^{K_1} z_{k,1} * f_{k,1}^c$$
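As a rough sketch of this sum, the NumPy code below reconstructs one input channel from $K_1$ feature maps. It assumes "full" convolution with zero padding, and the helper names `conv2d_full` and `reconstruct_channel`, the sizes and the random values are all illustrative choices, not taken from the source.

```python
import numpy as np

def conv2d_full(z, f):
    """Full 2D convolution of feature map z with filter f (zero padding)."""
    zh, zw = z.shape
    fh, fw = f.shape
    out = np.zeros((zh + fh - 1, zw + fw - 1))
    # Scatter every feature-map entry, weighted by the filter, into the output.
    for i in range(zh):
        for j in range(zw):
            out[i:i + fh, j:j + fw] += z[i, j] * f
    return out

def reconstruct_channel(z, f_c):
    """y_hat_1^c = sum over k of z[k] * f_c[k] for one input channel c."""
    return sum(conv2d_full(z[k], f_c[k]) for k in range(z.shape[0]))

rng = np.random.default_rng(0)
K1 = 4
z = rng.standard_normal((K1, 6, 6))    # K1 feature maps, 6x6 each
f_c = rng.standard_normal((K1, 3, 3))  # one 3x3 filter per feature map
y_hat_c = reconstruct_channel(z, f_c)
print(y_hat_c.shape)  # (8, 8)
```

The reconstructed channel is larger than each feature map (6 + 3 - 1 = 8), which is the upsampling behaviour of this kind of convolution.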
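Any convolution can be written as a matrix multiplication, so stacking the feature maps into a vector $z_1$ and the filters into a convolution matrix $F_1$ gives
$$\hat{y}_1 = F_1 z_1$$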
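A 1D toy case makes the matrix form concrete. The sketch below is an illustration under simplifying assumptions (one channel, "full" convolution, a hypothetical helper `conv_matrix`): it builds the matrix $F$ for a small filter, checks it against `np.convolve`, and then applies $F^T$ to show that the transpose gives back the shape of $z$ but not its values.

```python
import numpy as np

def conv_matrix(f, n):
    """Matrix F with F @ z == np.convolve(z, f, mode='full') for len(z) == n."""
    k = len(f)
    F = np.zeros((n + k - 1, n))
    for j in range(n):
        F[j:j + k, j] = f  # column j holds the filter shifted down by j
    return F

f = np.array([1.0, 2.0, 3.0])
z = np.array([1.0, 0.0, -1.0, 2.0])

F = conv_matrix(f, len(z))
y_hat = F @ z                      # convolution as a matrix product
same = np.allclose(y_hat, np.convolve(z, f, mode="full"))

z_back = F.T @ y_hat               # transposed operator applied to y_hat
print(same)                        # True
print(y_hat.shape, z_back.shape)   # (6,) (4,)
print(np.allclose(z_back, z))      # False
```

`z_back` has the same length as `z`, yet its entries differ, because $F^T F$ is not the identity.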
A reconstruction operator $R_l$ is composed of a sequence of convolution matrices $F$ and upsampling matrices $U_s$; $\hat{y}_l$ is the image reconstructed from the layer-$l$ feature maps $z_l$:
$$\hat{y}_l = F_1 U_{s_1} F_2 U_{s_2} \cdots F_l z_l = R_l z_l$$
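Hence the projection operator $R_l^T$ maps the input image to the feature space of $z_l$, where $P_s$ is the pooling matrix paired with $U_s$:
$$R_l^T = F_l^T \cdots P_{s_2} F_2^T P_{s_1} F_1^T$$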
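The two operators can be assembled explicitly for a tiny two-layer 1D case. This is only a sketch under strong assumptions: the upsampling matrix inserts zeros at fixed positions, the pooling matrix is taken to be its literal transpose, and the filters, sizes and helper names (`conv_matrix`, `upsample_matrix`) are hypothetical. The transposes here are tied by construction purely to check the algebra.

```python
import numpy as np

def conv_matrix(f, n):
    """Matrix F with F @ z == np.convolve(z, f, mode='full') for len(z) == n."""
    F = np.zeros((n + len(f) - 1, n))
    for j in range(n):
        F[j:j + len(f), j] = f
    return F

def upsample_matrix(n, s):
    """U_s of shape (n*s, n): entry i is copied to position i*s, rest is zero."""
    U = np.zeros((n * s, n))
    U[np.arange(n) * s, np.arange(n)] = 1.0
    return U

f1 = np.array([1.0, -1.0, 0.5])   # hypothetical layer-1 filter
f2 = np.array([0.5, 2.0])         # hypothetical layer-2 filter

F2 = conv_matrix(f2, 3)           # layer-2 features (length 3) -> length 4
U1 = upsample_matrix(4, 2)        # length 4 -> length 8
F1 = conv_matrix(f1, 8)           # length 8 -> length 10 (input space)

R2 = F1 @ U1 @ F2                 # reconstruction operator, shape (10, 3)
P1 = U1.T                         # pooling paired with U1 (assumption)
R2_T = F2.T @ P1 @ F1.T           # projection operator, shape (3, 10)

z2 = np.array([1.0, 0.0, -2.0])
y_hat = R2 @ z2                   # feature space -> input space
z_proj = R2_T @ y_hat             # input space -> feature space
print(y_hat.shape, z_proj.shape)  # (10,) (3,)
print(np.allclose(R2_T, R2.T))    # True
```

The product of transposes in reverse order equals the transpose of the product, which is why the factors of $R_l^T$ appear in the opposite order to those of $R_l$.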
The projection operator is not a projection in the vector sense. It is closer to a mapping that recovers the shape of the target space.
In a learnable transposed-convolution layer, $F^T$ and $F$ are not transposes of each other in the matrix sense: the weights of the two operators are trained separately.
A guide to convolution arithmetic for deep learning