Overview
This series of posts is a set of study notes recording what I learn from the book “Understanding Deep Learning”. This post covers Chapter 11, Residual networks.
1. Difference between ResNet and RNN
I used to think that a residual network (ResNet) was a kind of RNN, but after studying it I found that, despite some similarities, they are different things. They were created for different reasons: ResNet was created to train deep networks more effectively, while the RNN was created to handle sequential data such as time series. Because a residual block is just a general building block, it can also be applied to CNNs. The principle of ResNet is simple: each block adds its input, unchanged, to its output. The similarity lies on the RNN side: the cell state of an LSTM is also updated by adding new information to the previous state rather than only multiplying it.
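To make the "add the input back" idea concrete, here is a minimal NumPy sketch of one residual block. The two-layer form of the residual branch, the function name `residual_block`, and all parameter names are my own assumptions for illustration, not the book's exact formulation.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, W1, b1, W2, b2):
    """Return x + f(x), where f is a small two-layer network (assumed form)."""
    h = relu(W1 @ x + b1)   # residual branch, first layer
    fx = W2 @ h + b2        # residual branch, second layer
    return x + fx           # skip connection: the input is added back unchanged

rng = np.random.default_rng(0)
d = 4
x = rng.normal(size=d)
W1, b1 = rng.normal(size=(d, d)), np.zeros(d)
W2, b2 = np.zeros((d, d)), np.zeros(d)   # zero last layer, so f(x) = 0

y = residual_block(x, W1, b1, W2, b2)
print(np.allclose(y, x))
```

With the last layer set to zero the block reduces to the identity mapping, which is one way to see why stacking such blocks does not have to make the network harder to train than a shallower one.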
2. Why the values of the two branches are uncorrelated

Problem 11.4 asks why the two branch values in the residual block of the figure above are uncorrelated with each other. When I first saw the problem I was confused, because I did not know what a branch was. It turned out that the path passing through the layers of the block is one branch, and the skip connection whose value is added to that path's output is the other. The two values are uncorrelated simply because one of them goes through several stages, such as multiplication by weights and an activation function, before the addition. More precisely, if the weights are initialized with zero mean and independently of the input, the expected product of the skip value and the residual value is zero.
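The claim can be checked numerically. The sketch below is my own experiment, not the book's solution: it assumes standard normal inputs, a residual branch of the form "ReLU, then a linear layer", and zero-mean weights with variance 2/d (He-style initialization) drawn fresh for every sample. It measures the correlation between the skip value and the residual value for a single output unit.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 50, 20000          # layer width and number of samples (arbitrary choices)

# Inputs to the block: one row per sample.
X = rng.normal(size=(n, d))

# For each sample, draw the weight row feeding output unit 0.
# Zero mean, variance 2/d (assumed He-style initialization).
W0 = rng.normal(scale=np.sqrt(2.0 / d), size=(n, d))

skip = X[:, 0]                                   # value on the skip branch
res = np.sum(W0 * np.maximum(X, 0.0), axis=1)    # value on the residual branch

corr = np.corrcoef(skip, res)[0, 1]
var_sum = np.var(skip + res)

print("correlation:", corr)
print("variances:", np.var(skip), np.var(res), var_sum)
```

Under these assumptions the correlation should come out close to zero, and the variance of the sum should be close to the sum of the two variances. That is also why adding the two branches increases the variance of the activations from block to block.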
Conclusion
I had only heard the names ResNet and DenseNet, but after learning the principles I found them simpler than I expected. It is surprising that the loss surface becomes smoother just by adding values. Since neurons in our brain are also influenced by many other neurons, making a network residual rather than purely sequential seems reasonable. Residual connections are said to be becoming standard in deep learning pipelines, so I should definitely try them later.
Reference
[1] Prince, S. J. D. (2023). Understanding Deep Learning. The MIT Press. Retrieved from http://udlbook.com