[UDL Study Notes] Chapter 4 - Deep neural networks

Jzahnny
August 10, 2025

Tags: Deep Learning, UDL, Deep Neural Networks, Hidden Unit, Hidden Layer, Bias, Hyper Parameter, Weight, Region

Overview

This series of posts is a set of study notes recording my progress through the book “Understanding Deep Learning”. This post covers Chapter 4, Deep neural networks.

1. Two layers are related by function composition.

I had only known that the values computed by the hidden units in each layer are simply propagated forward. After going through the formulas in detail, I could see that this is the same function composition I had often met in high school: the output of the first layer becomes the input of the second. When drawing the output as a function of the input, thinking in terms of composite functions made the graph easy to understand.
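
The composition can be sketched in a few lines of NumPy. This is a minimal illustration, not code from the book: the helper names (`shallow`, `composed`, `deep`) and all parameter values are made up for the example. It composes two shallow ReLU networks and checks that the result equals one network with two hidden layers whose second-layer parameters are built from the first network's output weights.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def shallow(x, theta, phi):
    """Shallow network with scalar input and scalar output.
    theta: (D, 2) array, one row [offset, slope] per hidden unit.
    phi:   (D + 1,) array, [output offset, output weights...]."""
    h = relu(theta[:, 0] + theta[:, 1] * x)
    return phi[0] + phi[1:] @ h

# Hypothetical parameters for two shallow networks with 3 hidden units each.
theta1 = np.array([[0.0, 1.0], [-0.5, 2.0], [1.0, -1.0]])
phi1 = np.array([0.1, 1.0, -1.5, 0.5])
theta2 = np.array([[0.2, -1.0], [-0.3, 1.5], [0.0, 0.7]])
phi2 = np.array([-0.2, 0.8, 1.2, -0.4])

def composed(x):
    # y' = f2(f1(x)): the second network takes the first network's output.
    return shallow(shallow(x, theta1, phi1), theta2, phi2)

def deep(x):
    # Same function written as one network with two hidden layers.
    h1 = relu(theta1[:, 0] + theta1[:, 1] * x)
    psi0 = theta2[:, 0] + theta2[:, 1] * phi1[0]   # second-layer offsets
    psi = np.outer(theta2[:, 1], phi1[1:])         # second-layer weights
    h2 = relu(psi0 + psi @ h1)
    return phi2[0] + phi2[1:] @ h2

for x in [-1.0, 0.0, 0.3, 2.0]:
    print(x, composed(x), deep(x))
```

The two columns printed for each `x` should agree, which is the composite-function view written out as code.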
 

2. Getting Familiar with the General Formulation

While looking at this equation, I was confused about whether each of the parameter symbols had its own meaning, but I understood it better when I saw the generalized equation below. In fact, the individual symbols had no special meaning of their own; they all simply denote parameters.
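
For reference, the general form can be written as follows. This is a reconstruction in the book's notation rather than a copy of the original figure, assuming $\mathbf{h}_k$ is the vector of hidden units at layer $k$, $\boldsymbol{\beta}_k$ the bias vector, $\boldsymbol{\Omega}_k$ the weight matrix, and $\mathrm{a}[\cdot]$ the activation function:

$$
\begin{aligned}
\mathbf{h}_1 &= \mathrm{a}[\boldsymbol{\beta}_0 + \boldsymbol{\Omega}_0 \mathbf{x}] \\
\mathbf{h}_{k+1} &= \mathrm{a}[\boldsymbol{\beta}_k + \boldsymbol{\Omega}_k \mathbf{h}_k], \quad k = 1, \dots, K-1 \\
\mathbf{y} &= \boldsymbol{\beta}_K + \boldsymbol{\Omega}_K \mathbf{h}_K
\end{aligned}
$$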

3. Size of the Weight Matrix

The general formulation right above states that if the $k$-th layer has $D_k$ hidden units, then the weight matrix $\boldsymbol{\Omega}_k$ has size $D_{k+1}\times D_k$. I didn't understand this well at first, but on reflection it is natural. $\boldsymbol{\Omega}_k$ is multiplied by $\mathbf{h}_k$, which has $D_k$ entries, so for the matrix product to be defined the matrix must have $D_k$ columns. The product gives the pre-activations of the next layer, which has $D_{k+1}$ hidden units, so the matrix must have $D_{k+1}$ rows.
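
The shape rule is easy to check numerically. The sketch below is only an illustration under my own assumptions: the layer widths, the function names `init_params` and `forward`, and the random initialization are all hypothetical. It builds one weight matrix per layer with shape (next width, current width) and runs a forward pass to confirm that the products are well defined.

```python
import numpy as np

def init_params(widths, rng):
    """widths = [D_i, D_1, ..., D_K, D_o]; returns a list of (beta_k, Omega_k)."""
    params = []
    for d_in, d_out in zip(widths[:-1], widths[1:]):
        omega = rng.standard_normal((d_out, d_in))  # rows: next layer, columns: current layer
        beta = rng.standard_normal(d_out)           # one bias per unit in the next layer
        params.append((beta, omega))
    return params

def forward(x, params):
    h = x
    for beta, omega in params[:-1]:
        h = np.maximum(beta + omega @ h, 0.0)  # ReLU hidden layers
    beta, omega = params[-1]
    return beta + omega @ h                    # linear output layer

widths = [3, 4, 2, 5, 1]  # hypothetical: 3 inputs, hidden layers of 4, 2, 5 units, 1 output
rng = np.random.default_rng(0)
params = init_params(widths, rng)

for k, (beta, omega) in enumerate(params):
    print(f"Omega_{k}: {omega.shape}, beta_{k}: {beta.shape}")

y = forward(rng.standard_normal(widths[0]), params)
print("output shape:", y.shape)
```

Swapping the two dimensions in `init_params` makes `omega @ h` fail with a shape mismatch whenever consecutive widths differ, which is the same argument as above.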
 
 
 

4. Linear regions with 2 hidden units

Problem 4.9 asks whether a single shallow network with two hidden units can produce three linear regions like those in the figure below. Two hidden units do give three linear regions, as follows from the formulas for the hidden units and the output. The real question is whether those regions can oscillate between 0 and 1 as in the figure. The answer sheet says this is impossible, but it does not clearly state the reason.
[Image: the oscillating function from Problem 4.9]
 
By experimenting with the interactive figure provided with the book, I could see why.

As the figure below shows for the first and second hidden units, each unit contributes a nonzero slope only on the side of its joint where it is active. To oscillate between 0 and 1 over three regions, the slopes must alternate in sign (for example up, down, up). Two hidden units provide only two independent slopes, so the slope of the remaining region is not free: it is either 0 or the sum of the other two.
 
[Image: activations of the first and second hidden units]
 
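To be more precise, 2 hidden units give only 2 joints, so the x-axis is divided into 3 regions, such as [0, j1], [j1, j2], and [j2, 1]. Depending on which side of its joint each unit is active, one of two things happens. Either some region has every unit inactive, so its slope is 0 and the function can neither rise nor fall there. Or both units are active in the middle region, so the middle slope is the sum of the two outer slopes, and a sum of two slopes with the same sign cannot have the opposite sign. In both cases the alternating pattern is impossible.

On the other hand, with 3 or more units there are enough independent slopes. For example, if three units all become active to the right of their joints, the slopes after each joint are s1, s1 + s2, and s1 + s2 + s3, and each can be given whatever sign is needed.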
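
The argument can also be checked numerically. The sketch below is my own illustration, not the book's solution: `shallow`, `region_slopes`, and `oscillates` are hypothetical helper names, and the random search is only a sanity check, not a proof. It computes the slope in each linear region on [0, 1], searches random 2-unit networks for an alternating sign pattern, and then builds a 3-unit network that does oscillate.

```python
import numpy as np

def shallow(x, theta, phi):
    """Shallow ReLU network evaluated on a 1-D array of inputs x."""
    h = np.maximum(theta[:, 0][:, None] + theta[:, 1][:, None] * x[None, :], 0.0)
    return phi[0] + phi[1:] @ h

def region_slopes(theta, phi, lo=0.0, hi=1.0):
    """Slope of the network in each linear region of [lo, hi]."""
    joints = np.sort(-theta[:, 0] / theta[:, 1])      # where each unit switches on or off
    joints = joints[(joints > lo) & (joints < hi)]    # keep joints strictly inside the interval
    edges = np.concatenate(([lo], joints, [hi]))
    mids = (edges[:-1] + edges[1:]) / 2               # one test point per region
    active = (theta[:, 0][None, :] + theta[:, 1][None, :] * mids[:, None]) > 0
    return active.astype(float) @ (phi[1:] * theta[:, 1])

def oscillates(slopes):
    """True if there are exactly 3 regions and neighbouring slopes have opposite signs."""
    s = np.sign(slopes)
    return len(s) == 3 and bool(np.all(s[:-1] * s[1:] < 0))

# Random search over 2-unit networks (a sanity check only).
rng = np.random.default_rng(0)
found = False
for _ in range(5000):
    theta = rng.standard_normal((2, 2))
    phi = rng.standard_normal(3)
    if oscillates(region_slopes(theta, phi)):
        found = True
        break
print("2 units, alternating slopes found:", found)

# Hypothetical 3-unit network with joints at 0, 1/3, 2/3 and slopes +3, -3, +3.
theta3 = np.array([[0.0, 1.0], [-1 / 3, 1.0], [-2 / 3, 1.0]])
phi3 = np.array([0.0, 3.0, -6.0, 6.0])
xs = np.array([0.0, 1 / 3, 2 / 3, 1.0])
print("3 units, slopes:", region_slopes(theta3, phi3))
print("3 units, values at the joints:", shallow(xs, theta3, phi3))
```

In the 3-unit example, the first joint sits at the left edge of the interval, so only three regions are visible on [0, 1] and all three slopes are freely chosen.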

Reference

[1] Prince, S. J. D. (2023). Understanding Deep Learning. The MIT Press. Retrieved from http://udlbook.com