[UDL Study Notes] Chapter 14 - Unsupervised learning

Jzahnny
September 14, 2025

Tags: Deep Learning, UDL, Unsupervised Learning, Kullback-Leibler Divergence, Inception Score, Probability Distribution

Overview

This series of posts is a set of study notes recording the process of working through the book “Understanding Deep Learning”.
This time, we will cover Chapter 14, Unsupervised learning.
 
This chapter is very short, as it only gives an overview of unsupervised learning. The upcoming Chapters 15, 16, 17, and 18 cover GANs, normalizing flows, VAEs, and diffusion models, respectively. What all unsupervised learning methods have in common is that they learn from the input data alone, without a ground-truth label y, but the details of how they learn differ.

Inception Score

Meaning of $\Pr(y \mid x_i^*)$ and $\Pr(y)$

The rest was not very difficult, as it was already roughly covered in Chapter 1. However, there was something I didn't understand in the Inception Score part. The Inception Score (IS) is a metric that evaluates whether the images created by an image generative model
1) can each be clearly assigned to a single class, and
2) are generated uniformly across all classes,
with respect to the classes of the ImageNet database.
 
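Specifically, it is defined by the following equation.

$$IS = \exp\left[\frac{1}{I}\sum_{i=1}^{I} D_{KL}\left[\Pr(y \mid x_i^*) \,\|\, \Pr(y)\right]\right]$$

Here, $\Pr(y)$ is as follows, and $I$ is the number of generated examples.

$$\Pr(y) = \frac{1}{I}\sum_{i=1}^{I} \Pr(y \mid x_i^*)$$

Also, $x_i^*$ is each generated image.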
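
As a rough sketch of the definition above, the score can be computed directly from a table of classifier outputs. The function names `kl_divergence` and `inception_score` and the three toy inputs are my own assumptions for illustration (3 classes instead of 1000, made-up probabilities); they are not taken from the book.

```python
import math

def kl_divergence(p, q):
    """D_KL[p || q] for two discrete distributions given as lists."""
    # Terms with p_k == 0 are skipped (0 * log 0 is taken as 0).
    return sum(pk * math.log(pk / qk) for pk, qk in zip(p, q) if pk > 0)

def inception_score(cond_probs):
    """cond_probs[i][k] plays the role of Pr(y = k | x_i*), one row per image."""
    num_images = len(cond_probs)
    num_classes = len(cond_probs[0])
    # Pr(y): class-wise average of the per-image distributions.
    marginal = [sum(row[k] for row in cond_probs) / num_images
                for k in range(num_classes)]
    mean_kl = sum(kl_divergence(row, marginal) for row in cond_probs) / num_images
    return math.exp(mean_kl)

# Hypothetical toy inputs with 3 classes.
confident_and_diverse = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
confident_but_collapsed = [[1.0, 0.0, 0.0]] * 3   # every image is the same class
uncertain = [[1 / 3, 1 / 3, 1 / 3]] * 3           # classifier cannot decide

print(inception_score(confident_and_diverse))    # close to 3, the number of classes
print(inception_score(confident_but_collapsed))  # close to 1
print(inception_score(uncertain))                # close to 1
```

Only the first case, where each image is classified confidently and the classes are covered evenly, gets a high score in this toy setting.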
 
What I didn't understand well at first was what $\Pr(y \mid x_i^*)$ and $\Pr(y)$ each meant.
 
[Image: Figure 14.4 from the book]
The book illustrates this with Figure 14.4, but what confused me was whether the quantities shown there represented a single probability value or a probability distribution. In fact, I had already learned in Chapter 5 that the model output is treated as a probability distribution, but there was still a bit of confusion.
 
But it's very simple once you know it.
 
For example, let's say $x_i^*$ is a plane image like the first picture in Figure 14.4. Then $\Pr(y \mid x_i^*)$ is a probability distribution containing the probability of each class given that plane image. Evaluating it at one particular class gives an actual probability value, such as $\Pr(y = \text{plane} \mid x_i^*) = 0.9$.
 
And since $\Pr(y)$ is the average of $\Pr(y \mid x_i^*)$ over all generated examples, it is a probability distribution that tells us how often each class appears among the generated images relative to the other classes.
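
To make the difference between the two quantities concrete, here is a small sketch with made-up classifier outputs for three generated images and three classes. The class names and all numbers are hypothetical and only serve as an illustration.

```python
# Hypothetical classifier outputs; each row sums to 1.
classes = ["plane", "dog", "cat"]
cond_probs = [
    [0.9, 0.05, 0.05],  # Pr(y | x_1*): image 1 looks like a plane
    [0.1, 0.8, 0.1],    # Pr(y | x_2*): image 2 looks like a dog
    [0.2, 0.1, 0.7],    # Pr(y | x_3*): image 3 looks like a cat
]

# Pr(y | x_1*) is a whole distribution; picking one class gives a single value.
prob_plane_given_x1 = cond_probs[0][classes.index("plane")]

# Pr(y) is the class-wise average of the rows above.
num_images = len(cond_probs)
marginal = [sum(row[k] for row in cond_probs) / num_images
            for k in range(len(classes))]

print(prob_plane_given_x1)
print(dict(zip(classes, marginal)))
```

Each row is a distribution for one image, while `marginal` is a single distribution summarizing all generated images.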
 
 
 

Why use KL-divergence?

Returning to the formula,

$$IS = \exp\left[\frac{1}{I}\sum_{i=1}^{I} D_{KL}\left[\Pr(y \mid x_i^*) \,\|\, \Pr(y)\right]\right]$$

I understood what $\Pr(y \mid x_i^*)$ and $\Pr(y)$ are, but I didn't understand why the Kullback-Leibler divergence (KLD) is used. Conceptually, I only knew that the KLD measures the distance between two probability distributions. From this point of view, however, the two distributions mean quite different things, so I couldn't guess what the distance between them meant. The book says that, according to this formula, the IS is higher the more confidently each generated image is assigned to one of the 1000 classes and the more uniformly the generated images are spread across the classes. In other words, both sample quality and coverage must be high for the score to be high, but I didn't understand why.
 
To understand this, I had to look closely at how the KLD is actually computed, rather than just thinking of it conceptually as a distance between probability distributions.
 
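The KLD between two discrete distributions $p$ and $q$ is defined as follows.

$$D_{KL}\left[p \,\|\, q\right] = \sum_{y} p(y) \log \frac{p(y)}{q(y)}$$

Here, if we substitute $\Pr(y \mid x_i^*)$ and $\Pr(y)$ for $p$ and $q$ respectively, it becomes as follows.

$$D_{KL}\left[\Pr(y \mid x_i^*) \,\|\, \Pr(y)\right] = \sum_{y} \Pr(y \mid x_i^*) \log \frac{\Pr(y \mid x_i^*)}{\Pr(y)}$$

If you look at how to make this larger, you can see that for the classes where $\Pr(y \mid x_i^*)$ is large, the smaller $\Pr(y)$ is, the larger it becomes. Since $\Pr(y)$ must sum to 1 over all classes, it can only be small for every class if it is spread evenly. Therefore, the score rewards a high (confident) class probability for each image together with a uniform distribution over all classes.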
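
The same conclusion can be reached by expanding the average KLD term. The following is a short derivation sketch; the entropy notation $H[\cdot]$ and the symbol $K$ for the number of classes are my own additions and do not come from the text above.

```latex
\begin{aligned}
\frac{1}{I}\sum_{i=1}^{I} D_{KL}\left[\Pr(y \mid x_i^*) \,\|\, \Pr(y)\right]
&= \frac{1}{I}\sum_{i=1}^{I}\sum_{y} \Pr(y \mid x_i^*) \log \Pr(y \mid x_i^*)
 - \sum_{y} \left(\frac{1}{I}\sum_{i=1}^{I} \Pr(y \mid x_i^*)\right) \log \Pr(y) \\
&= -\frac{1}{I}\sum_{i=1}^{I} H\left[\Pr(y \mid x_i^*)\right]
 - \sum_{y} \Pr(y) \log \Pr(y) \\
&= H\left[\Pr(y)\right] - \frac{1}{I}\sum_{i=1}^{I} H\left[\Pr(y \mid x_i^*)\right],
\qquad H[p] = -\sum_{y} p(y) \log p(y).
\end{aligned}
```

Read this way, the exponent is large when the entropy of $\Pr(y)$ is high (the classes are covered uniformly) and the entropy of each $\Pr(y \mid x_i^*)$ is low (each image is classified confidently). Because $H[\Pr(y)] \le \log K$ for $K$ classes, the IS can be at most $K$, which is 1000 for ImageNet.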

Reference

[1] Prince, S. J. D. (2023). Understanding Deep Learning. The MIT Press. Retrieved from http://udlbook.com