Overview
This series of posts consists of study notes documenting my progress through the book "Understanding Deep Learning" [1].
This post covers Chapter 20, "Why does deep learning work?"
Shouldn't deep learning work in theory?
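In theory, even shallow neural networks are flexible enough to approximate essentially any function on a given input space: by the universal approximation theorem, a network with a single hidden layer and enough hidden units can approximate any continuous function on a bounded input region to arbitrary accuracy. In principle, they can also perform well with far fewer parameters than there are training examples.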
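To make the first claim a little more concrete, here is a minimal sketch in plain Python. It assumes a 1D input and ReLU activations, and it builds the network by hand rather than training it (my own illustration, not code from the book). Hidden unit i computes relu(x - t_i) for an evenly spaced knot t_i, and the output weights are the changes in slope between knots, so the network linearly interpolates the target function. The names `build_shallow_relu` and `max_error` are hypothetical helpers. The printed error on sin(x) over [0, π] should shrink steadily as hidden units are added.

```python
import math


def relu(z):
    return max(0.0, z)


def build_shallow_relu(f, a, b, num_hidden):
    """Hand-build a one-hidden-layer ReLU network (1D in, 1D out) that linearly
    interpolates f at num_hidden + 1 evenly spaced knots on [a, b]."""
    knots = [a + (b - a) * i / num_hidden for i in range(num_hidden + 1)]
    values = [f(t) for t in knots]
    slopes = [(values[i + 1] - values[i]) / (knots[i + 1] - knots[i])
              for i in range(num_hidden)]
    # Hidden unit i: input weight 1 and bias -t_i, so it switches on at knot t_i.
    hidden_biases = [-t for t in knots[:-1]]
    # Output weights: the first slope, then the change in slope at each later knot.
    out_weights = [slopes[0]] + [slopes[i] - slopes[i - 1] for i in range(1, num_hidden)]
    out_bias = values[0]

    def net(x):
        return out_bias + sum(w * relu(x + hb) for w, hb in zip(out_weights, hidden_biases))

    return net


def max_error(f, net, a, b, samples=10_000):
    """Largest absolute gap between f and net on an evenly spaced grid over [a, b]."""
    xs = [a + (b - a) * i / samples for i in range(samples + 1)]
    return max(abs(f(x) - net(x)) for x in xs)


for d in (2, 4, 8, 16, 32):
    net = build_shallow_relu(math.sin, 0.0, math.pi, d)
    # A 1-input, 1-output network with d hidden units has 3d + 1 parameters.
    err = max_error(math.sin, net, 0.0, math.pi)
    print(f"hidden={d:2d} params={3 * d + 1:3d} max_error={err:.5f}")
```

This only shows the approximation improving with width for one smooth 1D function; it is an illustration, not a proof of the theorem.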
In practice, however, deeper is usually better. Overparameterization (having more parameters than training examples) is said to make both training and generalization much better. The book still doesn't give a clear explanation of why this is the case.
My best guess is that a deep network can express far more varied functions than a shallow network with the same number of parameters. I wonder whether this is some kind of added dimensionality: if a shallow network draws its function on a 2D sheet, perhaps a deep network draws it in 3D? My other question is whether there is another axis beyond width and depth. If such a property exists, could it increase representational power even further than what we have now?
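One concrete way to probe the guess that depth buys more expressiveness per parameter is to count the linear pieces in a ReLU network's output. The sketch below, in plain Python and assuming a 1D input on [0, 1], uses a standard "folding" construction: each hidden layer of two ReLU units computes the tent map T(u) = 2·relu(u) - 4·relu(u - 0.5), so stacking K layers folds the interval K times and yields 2^K linear pieces using 6K + 1 parameters. For comparison, a shallow network with D hidden units has 3D + 1 parameters and at most D + 1 linear pieces, so the same budget buys at most 2K + 1 pieces. The function names are hypothetical, and the piece count is an empirical estimate from a sampling grid.

```python
def relu(z):
    return max(0.0, z)


def deep_sawtooth(x, depth):
    """Hypothetical deep ReLU net: `depth` hidden layers of 2 units each.
    Every layer applies the tent map T(u) = 2*relu(u) - 4*relu(u - 0.5),
    which folds [0, 1] back onto [0, 1]."""
    u = x
    for _ in range(depth):
        u = 2.0 * relu(u) - 4.0 * relu(u - 0.5)
    return u


def count_linear_pieces(f, n_intervals=2 ** 14):
    """Estimate how many linear pieces f has on [0, 1] by sampling a grid and
    counting where the slope between neighbouring samples changes."""
    xs = [i / n_intervals for i in range(n_intervals + 1)]
    ys = [f(x) for x in xs]
    slopes = [(ys[i + 1] - ys[i]) * n_intervals for i in range(n_intervals)]
    pieces = 1
    for s_prev, s_next in zip(slopes, slopes[1:]):
        if abs(s_next - s_prev) > 1e-6:
            pieces += 1
    return pieces


def deep_param_count(depth):
    # First hidden layer: 2 weights + 2 biases; each later hidden layer:
    # 2x2 weights + 2 biases; output layer: 2 weights + 1 bias.
    return 4 + 6 * (depth - 1) + 3


def shallow_max_pieces(num_params):
    # A 1-input, 1-output shallow net with D hidden units has 3D + 1 parameters
    # and at most D + 1 linear pieces (each ReLU adds at most one breakpoint).
    hidden = (num_params - 1) // 3
    return hidden + 1


for depth in range(1, 9):
    params = deep_param_count(depth)
    deep_pieces = count_linear_pieces(lambda x: deep_sawtooth(x, depth))
    print(f"depth={depth} params={params} deep_pieces={deep_pieces} "
          f"shallow_max={shallow_max_pieces(params)}")
```

In this setup the deep count overtakes the shallow upper bound from three layers onward, and the gap then grows exponentially. It compares one hand-picked deep network against a bound for shallow ones, so it illustrates the gap in expressiveness rather than explaining why trained deep networks generalize well.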
References
[1] Prince, S. J. D. (2023). Understanding Deep Learning. The MIT Press. Retrieved from http://udlbook.com