* Book: GRL (Graph Representation Learning) by W.L. Hamilton (2020)
General points for thorough (intensive) reading, as opposed to extensive reading:
Read every word and mark/memorize the important ones.
Try to understand everything the author wants to tell us; if needed, read other references.
However, if you still cannot understand something after trying for more than 30 minutes, discuss it with others.
Try to use low-entropy (simple, unambiguous) descriptions when explaining.
range: from the title page to p8 (end of Chapter 1).
NOTICE: please read both the scanned pictures and this memo.
i: title page
official book
author William L. Hamilton was, at the time, an Assistant Professor at McGill Univ.
McGill Univ. is a famous university in Canada with 12 Nobel laureates; its graduates include Yoshua Bengio (2018 ACM Turing Award, one of the three "godfathers" of deep learning)
ii: Abstract page
no-free-lunch theorem
⇒ inductive bias is important before optimization (and before machine learning). Ex.: consider using linear regression to analyze a dataset. The assumption that a linear model is appropriate is an inductive bias (see the sketch at the end of this page's notes).
What is induction?
Lin & Tegmark '16, "Why does deep and cheap learning work so well?" ⇒ assumptions about the real world: (1) low polynomial order, (2) locality, (3) symmetry.
3 graph-learning topics: (1) node embeddings, (2) generalizing CNNs to graphs, (3) the message-passing approach
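To make the inductive-bias point above concrete, here is a minimal sketch (numpy, with a made-up toy dataset) showing that choosing linear regression already commits us to an assumption before any optimization happens:

```python
import numpy as np

# Hypothetical 1-D dataset, made up purely for illustration.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.1, 0.9, 2.2, 2.8, 4.1])

# Choosing the model class y ~ w*x + b is itself an inductive bias:
# before seeing any data, we restrict the hypothesis space to linear
# functions; that restriction is what makes learning from 5 points work.
X = np.stack([x, np.ones_like(x)], axis=1)   # design matrix [x, 1]
w, b = np.linalg.lstsq(X, y, rcond=None)[0]  # least-squares fit

print(f"fitted line: y = {w:.2f} * x + {b:.2f}")
```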
iii-v: Contents
vi: Preface
“past seven years” ⇒ the current wave of graph learning dates from about 2013 (2020 − 7)
as of 2020, one of the fastest-growing sub-areas of deep learning
audience: should have some background in machine learning and deep learning (e.g., Goodfellow et al., 2016)
vii-viii: Acknowledgements
p1: Chapter 1 Introduction
nodes, edges, relations, graphs ⇒ ask the audience to draw some example graphs
the Zachary Karate Club network (1977) and its importance
p2:
The text mentions “a dramatic increase in the quantity and quality of graph data in the last 25 years.” ⇒ Why 25 years? (hint: it means since 1995, around when the World Wide Web took off)
ML is not the only way to analyze graph data, but it is a promising one.
adjacency matrix vs adjacency list; for a simple graph, the adjacency matrix has entries in {0,1} (see the sketch at the end of this page's notes)
(optional) converting an adjacency list into an adjacency matrix can be read as an example of humans consuming matter and energy to increase order.
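A minimal sketch (plain Python + numpy, toy graph assumed) of going from an adjacency list to the {0,1} adjacency matrix of a simple undirected graph:

```python
import numpy as np

# Toy simple graph (undirected, no self-loops) as an adjacency list.
adj_list = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}

n = len(adj_list)
A = np.zeros((n, n), dtype=int)  # simple graph => entries in {0, 1}
for u, neighbors in adj_list.items():
    for v in neighbors:
        A[u, v] = 1

# For a simple undirected graph, A is symmetric with a zero diagonal.
assert (A == A.T).all() and (np.diag(A) == 0).all()
print(A)
```

Note the list costs O(|V| + |E|) memory while the matrix costs O(|V|²); the matrix buys constant-time edge lookups, which connects to the "consume resources to increase order" remark above.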
p3:
multigraph (multiple parallel edges allowed between the same pair of nodes) vs multi-relational graph (each edge carries a type τ from a fixed set of relations R, giving an adjacency tensor A ∈ ℝ^{|V|×|R|×|V|}; see the tensor sketch below)
heterogeneous graph (nodes have types; both intra-type and inter-type edges may exist), multipartite graph (a special case with only inter-type edges, no intra-type edges), multiplex graph (multiple layers over the same node set, with intra-layer edges plus inter-layer edges linking copies of a node across layers)
attribute or feature
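For the multi-relational case, a minimal sketch (numpy; the entities and relations are made up for illustration) of the adjacency-tensor view, one {0,1} adjacency slice per relation type:

```python
import numpy as np

nodes = ["drug_a", "drug_b", "protein_x"]  # hypothetical entities
relations = ["interacts_with", "targets"]  # fixed relation set R

# Typed edges (u, relation, v), made up for illustration.
edges = [("drug_a", "interacts_with", "drug_b"),
         ("drug_a", "targets", "protein_x")]

n, r = len(nodes), len(relations)
A = np.zeros((n, r, n), dtype=int)  # adjacency tensor A[u, tau, v]
idx = {name: i for i, name in enumerate(nodes)}
rel = {name: i for i, name in enumerate(relations)}
for u, tau, v in edges:
    A[idx[u], rel[tau], idx[v]] = 1

print(A[:, rel["targets"], :])  # the adjacency slice for one relation
```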
p4:
graph (abstract structure) and network (real-world data)
supervised (predict an output) vs unsupervised (infer patterns)
node classification: predict the label of a node given a small number of labelled nodes (|V_train| ≪ |V|)
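A minimal sketch (numpy, random toy labels; the model/loss names in the final comment are placeholders) of the |V_train| ≪ |V| setup: the whole graph is visible to the model, but the training loss only touches a small masked subset of nodes:

```python
import numpy as np

num_nodes = 100
labels = np.random.randint(0, 2, size=num_nodes)  # toy binary labels

# Only a handful of nodes are labelled for training: |V_train| << |V|.
train_mask = np.zeros(num_nodes, dtype=bool)
train_mask[np.random.choice(num_nodes, size=5, replace=False)] = True

# A typical training loss would only use the masked nodes, e.g.:
#   loss = cross_entropy(model(A, X)[train_mask], labels[train_mask])
print(f"{train_mask.sum()} training nodes out of {num_nodes}")
```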
p5:
applications: bot detection in social networks, predicting the function of proteins in the interactome, classifying the topic of documents based on hyperlinks, etc.
difference from standard supervised learning: whether we assume the data points are iid (independent and identically distributed); connected nodes are not independent, so node classification drops this assumption.
popular inductive biases used in graph learning: homophily (nodes tend to share attributes with their neighbors), structural equivalence (nodes with similar local structures tend to get similar labels), heterophily (nodes preferentially connect to nodes with different attributes, e.g., gender in some social networks).
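A minimal sketch (numpy, toy labels) of one simple way to quantify the homophily bias: the fraction of edges whose endpoints share a label; values near 1 suggest homophily, values near 0 suggest heterophily:

```python
import numpy as np

# Toy undirected edges and node labels, made up for illustration.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
labels = np.array([0, 0, 0, 1, 1])

# Edge homophily ratio: fraction of edges whose endpoints share a label.
same = sum(labels[u] == labels[v] for u, v in edges)
print(f"edge homophily ratio: {same / len(edges):.2f}")  # 0.60 here
```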
p6:
supervised learning vs semi-supervised learning; graph learning (GL) fits neither label cleanly because it drops the iid assumption
relation prediction: e.g., recommender systems, predicting drug side-effects. Note that this task also requires inductive biases, and they vary across graph domains.
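A minimal sketch (numpy, random stand-in embeddings; the inner-product scoring recipe is one common choice, not the book's only method) of relation prediction as scoring a candidate edge by the inner product of node embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)
num_nodes, dim = 10, 16
Z = rng.normal(size=(num_nodes, dim))  # stand-in for learned embeddings

def edge_score(u: int, v: int) -> float:
    """Higher score => the edge (u, v) is judged more likely to exist."""
    return float(Z[u] @ Z[v])

print(edge_score(0, 1))  # e.g., rank candidate items for a user
```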
p7:
clustering and community detection
graph classification, regression, and clustering (to the audience: what is the general difference from the node-level tasks? hint: each graph is now an iid data point; see the pooling sketch below)
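A minimal sketch (numpy, toy features; mean pooling is just one common readout choice) of what makes graph-level tasks different: node features are pooled into one fixed-size vector per graph, so a standard classifier or regressor can treat whole graphs as individual samples:

```python
import numpy as np

# Two toy graphs, each given as a (num_nodes, feature_dim) matrix.
graphs = [np.random.rand(4, 8), np.random.rand(7, 8)]

# Mean-pooling readout: one fixed-size vector per graph, regardless of
# how many nodes each graph has.
graph_reprs = np.stack([X.mean(axis=0) for X in graphs])
print(graph_reprs.shape)  # (2, 8)
```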
p8: (end of Chapter 1)