* Book: GRL (Graph Representation Learning) by W.L. Hamilton (2020)
General points for thorough (intensive) reading, as opposed to extensive reading:
Read every word and mark/memorize the important ones.
Try to understand everything the author wants to tell us; if needed, read other references.
However, if you still cannot understand something after trying for more than 30 minutes, discuss it with others.
Try to use low-entropy (simple, unambiguous) descriptions when explaining.
range: from the title page to p8 (end of Chapter 1).
NOTICE: please read both the scanned pictures and this memo.
i: title page
official book
author William L. Hamilton was, at the time, an Assistant Professor at McGill Univ.
McGill Univ. is a famous university in Canada with 12 Nobel laureates; its graduates include Yoshua Bengio (2018 ACM Turing Award, one of the three "godfathers" of deep learning)
ii: Abstract page
no-free-lunch theorem
⇒ inductive bias is important before optimization (and before machine learning). Ex.: consider using linear regression to analyze a dataset. The assumption that a linear model is appropriate is an inductive bias (see the sketch at the end of this page's notes).
What is induction?
Lin & Tegmark '16, "Why does deep and cheap learning work so well?" ⇒ assumptions about the real world: (1) low polynomial order, (2) locality, (3) symmetry.
3 graph-learning topics: (1) node embeddings, (2) generalizing CNNs to graphs, (3) the message-passing approach
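To make the inductive-bias point above concrete, here is a minimal sketch (numpy, with a made-up toy dataset) showing that choosing linear regression already commits us to an assumption before any optimization happens:

```python
import numpy as np

# Hypothetical 1-D dataset, made up purely for illustration.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.1, 0.9, 2.2, 2.8, 4.1])

# Choosing the model class y ~ w*x + b is itself an inductive bias:
# before seeing any data, we restrict the hypothesis space to linear
# functions; that restriction is what makes learning from 5 points work.
X = np.stack([x, np.ones_like(x)], axis=1)   # design matrix [x, 1]
w, b = np.linalg.lstsq(X, y, rcond=None)[0]  # least-squares fit

print(f"fitted line: y = {w:.2f} * x + {b:.2f}")
```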
iii-v: Contents
vi: Preface
“past seven years” ⇒ the current wave of graph learning dates from about 2013 (2020 − 7)
as of 2020, one of the fastest-growing sub-areas of deep learning
audience: should have some background in machine learning and deep learning (e.g., Goodfellow et al., 2016)
vii-viii: Acknowledgements
p1: Chapter 1 Introduction
nodes, edges, relations, graphs ⇒ ask the audience to draw some example graphs
the Zachary Karate Club network (1977) and its importance
p2:
The text mentions “a dramatic increase in the quantity and quality of graph data in the last 25 years.” ⇒ Why 25 years? (hint: it means since 1995, around when the World Wide Web took off)
ML is not the only way to analyze graph data, but it is a promising one.
adjacency matrix vs adjacency list; for a simple graph, the adjacency matrix has entries in {0,1} (see the sketch at the end of this page's notes)
(optional) converting an adjacency list into an adjacency matrix can be read as an example of humans consuming matter and energy to increase order.
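A minimal sketch (plain Python + numpy, toy graph assumed) of going from an adjacency list to the {0,1} adjacency matrix of a simple undirected graph:

```python
import numpy as np

# Toy simple graph (undirected, no self-loops) as an adjacency list.
adj_list = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}

n = len(adj_list)
A = np.zeros((n, n), dtype=int)  # simple graph => entries in {0, 1}
for u, neighbors in adj_list.items():
    for v in neighbors:
        A[u, v] = 1

# For a simple undirected graph, A is symmetric with a zero diagonal.
assert (A == A.T).all() and (np.diag(A) == 0).all()
print(A)
```

Note the list costs O(|V| + |E|) memory while the matrix costs O(|V|²); the matrix buys constant-time edge lookups, which connects to the "consume resources to increase order" remark above.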
p3:
multigraph (multiple parallel edges allowed between the same pair of nodes) vs multi-relational graph (each edge carries a type τ from a fixed set of relations R, giving an adjacency tensor A ∈ ℝ^{|V|×|R|×|V|}; see the tensor sketch below)
heterogeneous graph (nodes have types; both intra-type and inter-type edges may exist), multipartite graph (a special case with only inter-type edges, no intra-type edges), multiplex graph (multiple layers over the same node set, with intra-layer edges plus inter-layer edges linking copies of a node across layers)
attribute or feature
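For the multi-relational case, a minimal sketch (numpy; the entities and relations are made up for illustration) of the adjacency-tensor view, one {0,1} adjacency slice per relation type:

```python
import numpy as np

nodes = ["drug_a", "drug_b", "protein_x"]  # hypothetical entities
relations = ["interacts_with", "targets"]  # fixed relation set R

# Typed edges (u, relation, v), made up for illustration.
edges = [("drug_a", "interacts_with", "drug_b"),
         ("drug_a", "targets", "protein_x")]

n, r = len(nodes), len(relations)
A = np.zeros((n, r, n), dtype=int)  # adjacency tensor A[u, tau, v]
idx = {name: i for i, name in enumerate(nodes)}
rel = {name: i for i, name in enumerate(relations)}
for u, tau, v in edges:
    A[idx[u], rel[tau], idx[v]] = 1

print(A[:, rel["targets"], :])  # the adjacency slice for one relation
```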
p4:
graph (abstract structure) and network (real-world data)
supervised (predict an output) vs unsupervised (infer patterns)
node classification: predict the label of a node given a small number of labelled nodes (|V_train| ≪ |V|)
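A minimal sketch (numpy, random toy labels; the model/loss names in the final comment are placeholders) of the |V_train| ≪ |V| setup: the whole graph is visible to the model, but the training loss only touches a small masked subset of nodes:

```python
import numpy as np

num_nodes = 100
labels = np.random.randint(0, 2, size=num_nodes)  # toy binary labels

# Only a handful of nodes are labelled for training: |V_train| << |V|.
train_mask = np.zeros(num_nodes, dtype=bool)
train_mask[np.random.choice(num_nodes, size=5, replace=False)] = True

# A typical training loss would only use the masked nodes, e.g.:
#   loss = cross_entropy(model(A, X)[train_mask], labels[train_mask])
print(f"{train_mask.sum()} training nodes out of {num_nodes}")
```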
p5:
applications: bot detection in social networks, predicting the function of proteins in the interactome, classifying the topic of documents based on hyperlinks, etc.
difference from standard supervised learning: whether we assume the data points are iid (independent and identically distributed); connected nodes are not independent, so node classification drops this assumption.
popular inductive biases used in graph learning: homophily (nodes tend to share attributes with their neighbors), structural equivalence (nodes with similar local structures tend to get similar labels), heterophily (nodes preferentially connect to nodes with different attributes, e.g., gender in some social networks).
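A minimal sketch (numpy, toy labels) of one simple way to quantify the homophily bias: the fraction of edges whose endpoints share a label; values near 1 suggest homophily, values near 0 suggest heterophily:

```python
import numpy as np

# Toy undirected edges and node labels, made up for illustration.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
labels = np.array([0, 0, 0, 1, 1])

# Edge homophily ratio: fraction of edges whose endpoints share a label.
same = sum(labels[u] == labels[v] for u, v in edges)
print(f"edge homophily ratio: {same / len(edges):.2f}")  # 0.60 here
```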
p6:
supervised learning vs semi-supervised learning; graph learning (GL) fits neither label cleanly because it drops the iid assumption
relation prediction: e.g., recommender systems, predicting drug side-effects. Note that this task also requires inductive biases, and they vary across graph domains.
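A minimal sketch (numpy, random stand-in embeddings; the inner-product scoring recipe is one common choice, not the book's only method) of relation prediction as scoring a candidate edge by the inner product of node embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)
num_nodes, dim = 10, 16
Z = rng.normal(size=(num_nodes, dim))  # stand-in for learned embeddings

def edge_score(u: int, v: int) -> float:
    """Higher score => the edge (u, v) is judged more likely to exist."""
    return float(Z[u] @ Z[v])

print(edge_score(0, 1))  # e.g., rank candidate items for a user
```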
p7:
clustering and community detection
graph classification, regression, and clustering (to the audience: what is the general difference from the node-level tasks? hint: each graph is now an iid data point; see the pooling sketch below)
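A minimal sketch (numpy, toy features; mean pooling is just one common readout choice) of what makes graph-level tasks different: node features are pooled into one fixed-size vector per graph, so a standard classifier or regressor can treat whole graphs as individual samples:

```python
import numpy as np

# Two toy graphs, each given as a (num_nodes, feature_dim) matrix.
graphs = [np.random.rand(4, 8), np.random.rand(7, 8)]

# Mean-pooling readout: one fixed-size vector per graph, regardless of
# how many nodes each graph has.
graph_reprs = np.stack([X.mean(axis=0) for X in graphs])
print(graph_reprs.shape)  # (2, 8)
```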
p8: (end of Chapter 1)