[UDL Study Notes] Ch 13 - Graph neural networks

Overview

This series of posts is a set of study notes recording my progress through the book "Understanding Deep Learning".

This time, I will cover Chapter 13, Graph neural networks.

https://udlbook.github.io/udlbook/

1. Graph

Since Chapter 13 covers graph neural networks (GNNs), it begins with a brief introduction to graphs.

I learned that a graph is not just a set of nodes joined by edges, but a general abstraction for anything built from connections: molecular structures, circuits, human relationships, and even 3D models.

Although the chapter mainly deals with undirected graphs, where edges have no direction, thinking about how the same ideas could be applied to other graph structures gave me a glimpse of the potential of deep learning.

2. Adjacency matrix

An adjacency matrix was presented as a way to represent the connections between nodes in a graph.

When I first encountered this, I thought it would be more efficient to store the graph as a list of (m, n) index pairs, one per edge, since the matrix contains only 0s and 1s and most of its entries are usually 0.

In fact, the book also mentions that a graph can be represented as a list of node-index pairs, but my view changed after seeing what the matrix form makes possible.
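To make the two representations concrete, here is a minimal sketch (not from the book) that converts between an edge list and an adjacency matrix with NumPy. The function names and the 4-node example graph are my own hypothetical choices.

```python
import numpy as np

def edges_to_adjacency(edges, num_nodes):
    """Build a symmetric 0/1 adjacency matrix from an undirected edge list."""
    A = np.zeros((num_nodes, num_nodes), dtype=int)
    for m, n in edges:
        A[m, n] = 1
        A[n, m] = 1  # undirected: store the edge in both directions
    return A

def adjacency_to_edges(A):
    """Recover the edge list, listing each undirected edge once (m < n)."""
    rows, cols = np.nonzero(np.triu(A, k=1))  # strict upper triangle only
    return [(int(m), int(n)) for m, n in zip(rows, cols)]

# Hypothetical graph: a path 0-1-2-3 plus the extra edge 1-3.
edges = [(0, 1), (1, 2), (2, 3), (1, 3)]
A = edges_to_adjacency(edges, num_nodes=4)
recovered = adjacency_to_edges(A)
```

The edge list needs one entry per edge, while the matrix always needs N x N entries, which is why the list is the more compact choice for sparse graphs.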

Raising the adjacency matrix to the power L gives, at position (m, n), the number of walks of length L from node n to node m, so its nonzero entries show which nodes can be reached from a given node in exactly L steps. The adjacency matrix is also needed to compute the next layer in a GCN.

So the matrix form has its own conveniences. An edge list is the more compact way to store a graph, but an adjacency matrix is what you want when you need to compute with it.
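A small sketch of the matrix-power property, assuming NumPy and a hypothetical 4-node graph (a path 0-1-2-3 plus the edge 1-3). Entry (m, n) of the L-th power counts the walks of length L between nodes m and n, so a nonzero entry means the node is reachable in exactly L steps.

```python
import numpy as np

# Adjacency matrix of the hypothetical graph with edges 0-1, 1-2, 2-3, 1-3.
A = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 1],
    [0, 1, 1, 0],
])

# A raised to the power 2: entry (m, n) counts walks of length 2.
A2 = np.linalg.matrix_power(A, 2)

# Column 0 describes walks that start at node 0.
walks_from_0 = A2[:, 0]

# Nodes with a nonzero count can be reached from node 0 in exactly 2 steps.
reachable_in_2 = np.nonzero(walks_from_0)[0]
```

Here node 0 can reach nodes 0, 2, and 3 in exactly two steps (0-1-0, 0-1-2, 0-1-3), but not node 1, even though node 1 is its direct neighbor.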

3. Transductive model

So far, we have only dealt with inductive models, which learn from a labeled training set and are then applied to a separate test set.

Here, however, the book introduces the transductive setting, in which a single dataset mixes labeled and unlabeled data and the training and test sets are not cleanly separated.

Where an inductive model learns a general rule that maps data to labels and can then be applied to new examples, a transductive model, which the book notes is sometimes called semi-supervised learning, considers the labeled and unlabeled data together and fills in the missing labels within that one dataset.

Although a direct implementation is not described, I thought the transductive approach introduced here has a lot of potential.
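One common way to set this up, which is my own assumption rather than something taken from the book's text, is to run the model on every node of the graph but compute the loss only over the labeled nodes. The sketch below uses NumPy with hypothetical logits for a binary node classification problem; `masked_binary_cross_entropy` is a name I made up.

```python
import numpy as np

def masked_binary_cross_entropy(logits, labels, labeled_mask):
    """Average binary cross-entropy over the labeled nodes only."""
    probs = 1.0 / (1.0 + np.exp(-logits))  # sigmoid
    eps = 1e-12                            # guards against log(0)
    per_node = -(labels * np.log(probs + eps)
                 + (1 - labels) * np.log(1 - probs + eps))
    return per_node[labeled_mask].mean()

# Hypothetical model outputs, one logit per node of a 5-node graph.
logits = np.array([2.0, -1.0, 0.5, -3.0, 1.5])

# Only nodes 0 and 1 have known labels; the other entries are placeholders.
labels = np.array([1, 0, 0, 0, 0])
labeled_mask = np.array([True, True, False, False, False])

# Training signal comes from the labeled nodes only.
loss = masked_binary_cross_entropy(logits, labels, labeled_mask)

# The "test" step is filling in the unlabeled nodes of the same graph.
predicted_unlabeled = (logits[~labeled_mask] > 0).astype(int)
```

Because the unlabeled nodes still take part in the forward pass through their edges, they influence the prediction even though they never contribute to the loss.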

4. Graph attention network

The attention mechanism introduced earlier with the transformer can also be applied to graph networks.

The dot-product self-attention used in the transformer computes queries, keys, and values from the inputs, takes dot products between queries and keys, and passes them through the Softmax function to obtain the weights used to combine the values.
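As a reminder of what that looks like, here is a minimal NumPy sketch of dot-product self-attention. I am assuming a layout with one input per column, and I have left out the bias terms and the scaling factor for brevity; the parameter names are hypothetical.

```python
import numpy as np

def softmax_columns(Z):
    """Softmax applied independently to each column."""
    Z = Z - Z.max(axis=0, keepdims=True)  # subtract the max for stability
    E = np.exp(Z)
    return E / E.sum(axis=0, keepdims=True)

def self_attention(X, Omega_q, Omega_k, Omega_v):
    """Dot-product self-attention on a D x N input (one column per input)."""
    Q = Omega_q @ X                  # queries
    K = Omega_k @ X                  # keys
    V = Omega_v @ X                  # values
    attn = softmax_columns(K.T @ Q)  # N x N, column n weights output n
    return V @ attn, attn

# Hypothetical random inputs: N = 3 inputs of dimension D = 4.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))
Omega_q, Omega_k, Omega_v = (rng.normal(size=(4, 4)) for _ in range(3))

out, attn = self_attention(X, Omega_q, Omega_k, Omega_v)
```

Every input attends to every other input here; nothing in the computation knows about edges.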

In contrast, a GCN computes each layer by using the adjacency matrix to sum every node's embedding with those of its neighbors, then applying a weight matrix, a bias, and an activation function.
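A minimal sketch of one such layer in NumPy. I am assuming the column-per-node layout (H has shape D x N), adding the identity matrix to A so each node also keeps its own embedding, and using ReLU as the activation; the names `gcn_layer`, `Omega`, and `beta` are my own.

```python
import numpy as np

def gcn_layer(H, A, Omega, beta):
    """One graph convolutional layer: relu(beta + Omega @ H @ (A + I))."""
    N = A.shape[0]
    aggregated = H @ (A + np.eye(N))  # column n: node n plus its neighbors
    pre_activation = beta[:, None] + Omega @ aggregated
    return np.maximum(pre_activation, 0.0)  # ReLU

# Hypothetical 3-node path graph 0-1-2 with 2-dimensional embeddings.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
H = np.array([[1.0, 2.0, 3.0],
              [-1.0, 0.0, 1.0]])

# Identity weights and zero bias make the aggregation easy to read off.
Omega = np.eye(2)
beta = np.zeros(2)

H_next = gcn_layer(H, A, Omega, beta)
```

With these toy parameters, each output column is simply the ReLU of the sum over a node and its neighbors, so the middle node (connected to both ends) collects all three embeddings.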

The graph attention network sits between the two: it applies a weight matrix and bias to the node embeddings as a GCN does, but instead of weighting all neighbors equally it computes a similarity score for each pair of connected nodes and uses those scores as attention weights.

To do this it uses a Softmask function rather than the plain Softmax function. Like Softmax, it normalizes the scores into weights between 0 and 1 that sum to one, but pairs of nodes that are not connected by an edge are masked out first, so they receive zero weight.

Seeing this, I felt that the attention mechanism used in the transformer is a truly versatile structure, since it can be used not only for text and images but also for graphs.
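Here is how I would sketch the Softmask idea and a single-head graph attention layer in NumPy. This is an illustration under my own assumptions: one node per column, the mask A + I so that each node also attends to itself, ReLU as the activation, and a similarity score computed from a learned vector `phi` applied to the concatenated pair of transformed embeddings. The function and parameter names are hypothetical.

```python
import numpy as np

def softmask(S, M):
    """Column-wise softmax of S, restricted to positions where M is nonzero."""
    masked = np.where(M != 0, S, -np.inf)      # excluded pairs get -inf
    masked = masked - masked.max(axis=0, keepdims=True)
    weights = np.exp(masked)                   # exp(-inf) is exactly 0
    return weights / weights.sum(axis=0, keepdims=True)

def graph_attention_layer(H, A, Omega, beta, phi):
    """Single-head graph attention layer on a D x N embedding matrix."""
    N = A.shape[0]
    D_out = Omega.shape[0]
    H_prime = beta[:, None] + Omega @ H        # transformed embeddings
    # Score s[m, n] = relu(phi^T [h'_m ; h'_n]) splits into two terms.
    left = phi[:D_out] @ H_prime               # contribution of node m
    right = phi[D_out:] @ H_prime              # contribution of node n
    S = np.maximum(left[:, None] + right[None, :], 0.0)
    attn = softmask(S, A + np.eye(N))          # only neighbors and self
    return np.maximum(H_prime @ attn, 0.0)

# Hypothetical 3-node path graph 0-1-2.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
H = np.array([[1.0, 2.0, 3.0],
              [-1.0, 0.0, 1.0]])

# Equal scores everywhere: the weights become uniform over each neighborhood.
weights = softmask(np.zeros((3, 3)), A + np.eye(3))

# With phi = 0 all scores are equal, so the layer averages each neighborhood.
out = graph_attention_layer(H, A, np.eye(2), np.zeros(2), np.zeros(4))
```

Nodes 0 and 2 are not connected, so `weights[2, 0]` and `weights[0, 2]` are exactly zero no matter what the scores are. With a nonzero `phi`, the weights inside each neighborhood would depend on the node data instead of being uniform.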

Reference

[1] Prince, S. J. D. (2023). Understanding Deep Learning. The MIT Press. Retrieved from http://udlbook.com