Graph Neural Networks (GNN)
Table of Contents
Vertex | Edge | Type
---|---|---
People | like each other | undirected
People | is the boss of | directed
Tasks | cannot be processed at the same time | undirected
Computers | have a direct network connection | undirected
Airports | planes fly between them | directed
Cities | one can travel between them | directed
Undirected Graph vs. Directed Graph
Weighted Graph
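The examples below use unweighted, undirected graphs; for reference, a minimal networkx sketch (node names assumed for illustration) of a directed graph and a weighted graph:
import networkx as nx
# directed graph: edge direction matters
dg = nx.DiGraph()
dg.add_edge('boss', 'employee')   # edge points from 'boss' to 'employee'
# weighted graph: attach a weight attribute to each edge
wg = nx.Graph()
wg.add_edge('a', 'b', weight = 2.5)
wg.add_edge('b', 'c', weight = 0.7)
print(dg.edges())
print(wg.edges(data = True))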
Graph and Adjacency Matrix
# !pip install networkx
import networkx as nx
import matplotlib.pyplot as plt
%matplotlib inline
Graph.add_edge
g = nx.Graph()
g.add_edge('a', 'b')
g.add_edge('b', 'c')
g.add_edge('a', 'c')
g.add_edge('c', 'd')
# draw a graph with nodes and edges
nx.draw(g)
plt.show()
# draw a graph with node labels
pos = nx.spring_layout(g)
nx.draw(g, pos, node_size = 500)
nx.draw_networkx_labels(g, pos, font_size = 10)
plt.show()
Graph.add_nodes_from
Graph.add_edges_from
G = nx.Graph()
G.add_nodes_from([1, 2, 3, 4])
G.add_edges_from([(1,2), (1,3), (2,3), (3,4)])
# plot a graph
pos = nx.spring_layout(G)
nx.draw(G, pos, node_size = 500)
nx.draw_networkx_labels(G, pos, font_size = 10)
plt.show()
print(nx.number_of_nodes(G))
print(nx.number_of_edges(G))
print(G.nodes())
print(G.edges())
A = nx.adjacency_matrix(G)
print(A)
print(A.todense())
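Since G is undirected, its adjacency matrix is symmetric ($A_{ij} = A_{ji}$); a quick sanity check, reusing A from the cell above:
import numpy as np
A_dense = np.array(A.todense())
print(np.allclose(A_dense, A_dense.T))   # True for any undirected graph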
Degree of Undirected Graph
$$ d_i = \sum_{j=1}^{n} \; A_{ij} $$
$$D = \text{diag}\{d_1, d_2, \cdots \}$$
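In code, the degree vector is just the row sum of $A$, and $D$ places it on the diagonal; a minimal sketch with a toy adjacency matrix:
import numpy as np
A_toy = np.array([[0, 1, 1],
                  [1, 0, 0],
                  [1, 0, 0]])
d = A_toy.sum(axis = 1)   # d_i = sum_j A_ij
D_toy = np.diag(d)        # D = diag{d_1, d_2, ...}
print(d)
print(D_toy)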
Some nodes have many edges while others have only a few
Adding $I$ inserts self-loops, so each node keeps its own features during aggregation
Normalization weights the contribution of neighboring nodes by node degree
Normalization also prevents numerical instabilities and vanishing/exploding gradients, helping the model converge
1) (First attempt) Normalized $\tilde A$
$$\tilde A = \tilde D^{-1}(A+I)$$
2) Normalized $\tilde A$
$$\tilde A = \tilde D^{-1/2}(A+I) \tilde D^{-1/2}$$
import numpy as np
A = np.array([[0, 1, 1, 1],
              [1, 0, 0, 0],
              [1, 0, 0, 1],
              [1, 0, 1, 0]])
A_self = A + np.eye(4)
print(A_self)
D = np.array(A_self.sum(1)).flatten()
D = np.diag(D)
print(D)
1) (First attempt) Normalized $\tilde A$
$$\tilde A = \tilde D^{-1}(A+I)$$
A_norm = np.linalg.inv(D).dot(A_self)
print(A_norm)
2) Normalized $\tilde A$
$$\tilde A = \tilde D^{-1/2}(A+I) \tilde D^{-1/2}$$
Now it is symmetric.
(Skip the details)
from scipy.linalg import fractional_matrix_power
D_half_norm = fractional_matrix_power(D, -0.5)
print(D_half_norm)
A_self = np.asmatrix(A_self)
D_half_norm = np.asmatrix(D_half_norm)
A_half_norm = D_half_norm*A_self*D_half_norm
print(A_half_norm)
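A quick check on the two normalizations (a sketch, reusing A_norm and A_half_norm from the cells above): the rows of $\tilde D^{-1}(A+I)$ sum to 1, and the symmetric version is indeed symmetric.
print(A_norm.sum(axis = 1))                      # each row sums to 1
print(np.allclose(A_half_norm, A_half_norm.T))   # True: symmetric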
Convolution Layer
Weight Sharing
1) Message Aggregation from Local Neighborhood
2) Update
Adding a non-linear function: $k^{\text{th}}$ layer
$$ \begin{align*} H^{(k+1)} &= f \left( A, H^{(k)} \right) \\ & = \sigma \left( A H^{(k)} \, W \right) \end{align*} $$
1) Message Passing with Self-Loops
2) Neighborhood Normalization
Finally Graph Convolutional Networks
import networkx as nx
import matplotlib.pyplot as plt
%matplotlib inline
G = nx.Graph()
G.add_nodes_from([1, 2, 3, 4, 5, 6])
G.add_edges_from([(1, 2), (1, 3), (2, 3), (1, 4), (4, 5), (4, 6), (5, 6)])
nx.draw(G, with_labels = True, node_size = 600, font_size = 22)
plt.show()
A = nx.adjacency_matrix(G).todense()
print(A)
Assign the feature vector $H$ so that the nodes can be separated into two groups
H = np.matrix([1,0,0,-1,0,0]).T
print(H)
The product of the adjacency matrix and the node feature matrix gives, for each node, the sum of its neighbors' features
A*H
A_self = A + np.eye(6)
A_self*H
As with data pre-processing for any neural network, normalize the features to prevent numerical instabilities and vanishing/exploding gradients so that the model can converge
D = np.array(A_self.sum(1)).flatten()
D = np.diag(D)
D_half_norm = fractional_matrix_power(D, -0.5)
A_self = np.asmatrix(A_self)
D_half_norm = np.asmatrix(D_half_norm)
A_half_norm = D_half_norm*A_self*D_half_norm
A_half_norm*H
Build 2-layer GCN using ReLU as the activation function
$$ \begin{align*} H^{(2)} &= \text{ReLU} \left( \tilde A H^{(1)} \, W^{(1)}\right) \\ H^{(3)} &= \text{ReLU} \left( \tilde A H^{(2)} \, W^{(2)}\right) \end{align*} $$
np.random.seed(20)
W1 = np.random.randn(1, 4) # input: 1 -> hidden: 4
W2 = np.random.randn(4, 2) # hidden: 4 -> output: 2
def relu(x):
    return np.maximum(0, x)

def gcn(A_self, H, W):
    # one GCN layer: symmetric normalization, propagation, then nonlinearity
    D = np.diag(np.array(A_self.sum(1)).flatten())
    D_half_norm = fractional_matrix_power(D, -0.5)
    H_new = D_half_norm*A_self*D_half_norm*H*W
    return relu(H_new)
H1 = H
H2 = gcn(A_self, H1, W1)
H3 = gcn(A_self, H2, W2)
print(H3)
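Even with random, untrained weights, the normalized propagation alone already tends to pull the two communities apart; a sketch plotting the two output dimensions of H3, with group labels assumed from the graph above (nodes 1-3 vs. nodes 4-6):
H3_arr = np.asarray(H3)
groups = [0, 0, 0, 1, 1, 1]   # assumed grouping: nodes 1-3 vs. nodes 4-6
plt.scatter(H3_arr[:, 0], H3_arr[:, 1], c = groups, s = 100, cmap = 'coolwarm')
plt.xlabel('output dim 1')
plt.ylabel('output dim 2')
plt.show()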
The adjacency matrix can differ even when two graphs have the same network structure, because the node ordering is arbitrary
Therefore, for a graph-level representation, a readout layer makes the output permutation invariant by aggregating the node embeddings (e.g., by summation) before applying an MLP, as sketched below
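A minimal sketch of this invariance, reusing gcn, A_self, H1, W1, and W2 from above (P is an assumed random permutation matrix that relabels the nodes):
np.random.seed(0)
P = np.asmatrix(np.eye(6)[np.random.permutation(6)])   # random permutation matrix
H2_perm = gcn(P*A_self*P.T, P*H1, W1)   # same graph, nodes relabeled
H3_perm = gcn(P*A_self*P.T, H2_perm, W2)
# sum-pooling readout: identical regardless of node ordering
print(np.allclose(np.asarray(H3).sum(axis = 0), np.asarray(H3_perm).sum(axis = 0)))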
Task 1: Node classification
Task 2: Edge prediction
Task 3: Graph classification
# !pip install spektral==0.6.0
# !pip install tensorflow==2.2.0
# !pip install keras==2.3.0
import numpy as np
import networkx as nx
import tensorflow as tf
import matplotlib.pyplot as plt
import spektral
Download data from here
CORA dataset
This dataset is the MNIST equivalent in graph learning
The CORA dataset consists of 2708 scientific publications classified into one of seven classes.
The citation network consists of 5429 links.
Each publication in the dataset is described by a 0/1-valued word vector indicating the absence/presence of the corresponding word from the dictionary.
The dictionary consists of 1433 unique words.
nodes = np.load('./data_files/cora_nodes.npy')
edge_list = np.load('./data_files/cora_edges.npy')
labels_encoded = np.load('./data_files/cora_labels_encoded.npy')
H = np.load('./data_files/cora_features.npy')
data_mask = np.load('./data_files/cora_mask.npy')
N = H.shape[0]
F = H.shape[1]
print('H shape: ', H.shape)
print('The number of nodes (N): ', N)
print('The number of features (F) of each node: ', F)
num_classes = 7
print('The number of classes: ', num_classes)
We split the data into training and test sets at a ratio of 7:3: 1,895 nodes for training and 813 for testing.
# index of node for train model
train_mask = data_mask[0]
# index of node for test model
test_mask = data_mask[1]
print("The number of trainig data: ", np.sum(train_mask))
print("The number of test data: ", np.sum(test_mask))
G = nx.Graph(name = 'Cora')
G.add_nodes_from(nodes)
G.add_edges_from(edge_list)
print('Graph info: ', nx.info(G))
from scipy.linalg import fractional_matrix_power
A = nx.adjacency_matrix(G)
I = np.eye(A.shape[-1])
A_self = A + I
D = np.diag(np.array(A_self.sum(1)).flatten())
D_half_norm = fractional_matrix_power(D, -0.5)
A_half_norm = D_half_norm * A_self * D_half_norm
A_half_norm = np.array(A_half_norm)
H = np.array(H)
H_in = tf.keras.layers.Input(shape = (F, ))
A_in = tf.keras.layers.Input(shape = (N, ))
graph_conv_1 = spektral.layers.GraphConv(channels = 16,
                                         activation = 'relu')([H_in, A_in])
graph_conv_2 = spektral.layers.GraphConv(channels = 7,
                                         activation = 'softmax')([graph_conv_1, A_in])
model = tf.keras.models.Model(inputs = [H_in, A_in], outputs = graph_conv_2)
model.compile(optimizer = tf.keras.optimizers.Adam(learning_rate = 1e-2),
              loss = 'categorical_crossentropy',
              weighted_metrics = ['acc'])
model.summary()
model.fit([H, A_half_norm],
          labels_encoded,
          sample_weight = train_mask,
          epochs = 30,
          batch_size = N,
          shuffle = False)
y_pred = model.evaluate([H, A_half_norm],
                        labels_encoded,
                        sample_weight = test_mask,
                        batch_size = N)
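For reference, a sketch of computing the test accuracy by hand from the model's predictions (reusing the arrays above; test_mask is assumed to be a 0/1 vector, matching its use as sample_weight):
y_prob = model.predict([H, A_half_norm], batch_size = N)
y_hat = np.argmax(y_prob, axis = 1)
y_true = np.argmax(labels_encoded, axis = 1)
test_idx = test_mask.astype(bool)
print('Test accuracy: ', np.mean(y_hat[test_idx] == y_true[test_idx]))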
from sklearn.manifold import TSNE
layer_outputs = [layer.output for layer in model.layers]
activation_model = tf.keras.models.Model(inputs = model.input, outputs = layer_outputs)
activations = activation_model.predict([H,A_half_norm],batch_size = N)
x_tsne = TSNE(n_components = 2).fit_transform(activations[2])
def plot_tSNE(labels_encoded, x_tsne):
    color_map = np.argmax(labels_encoded, axis = 1)
    plt.figure(figsize = (10,10))
    for cl in range(num_classes):
        indices = np.where(color_map == cl)
        indices = indices[0]
        plt.scatter(x_tsne[indices, 0], x_tsne[indices, 1], label = cl)
    plt.legend()
    plt.show()

plot_tSNE(labels_encoded, x_tsne)
%%html
<center><iframe src="https://www.youtube.com/embed/fOctJB4kVlM?rel=0"
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
%%html
<center><iframe src="https://www.youtube.com/embed/ABCGCf8cJOE?rel=0"
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
%%html
<center><iframe src="https://www.youtube.com/embed/0YLZXjMHA-8?rel=0"
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
%%html
<center><iframe src="https://www.youtube.com/embed/ex2qllcVneY?rel=0"
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
%%html
<center><iframe src="https://www.youtube.com/embed/YL1jGgcY78U?rel=0"
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
%%html
<center><iframe src="https://www.youtube.com/embed/8owQBFAHw7E?rel=0"
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
%%html
<center><iframe src="https://www.youtube.com/embed/R67-JxtOQzg?rel=0"
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
University courses:
%%javascript
$.getScript('https://kmahelona.github.io/ipython_notebook_goodies/ipython_notebook_toc.js')