Graph Neural Networks (GNN)


By Prof. Seungchul Lee
http://iailab.kaist.ac.kr/
Industrial AI Lab at KAIST

Table of Contents

1. Graph



  • Graphs express abstract relations, topology, or connectivity
  • Graphs $G(V,E)$
    • $V$: a set of vertices (nodes)
    • $E$: a set of edges (links, relations)
    • weight (edge property)
      • distance in a road network
      • strength of connection in a personal network
  • Graphs model any situation where you have objects and pairwise relations (symmetric or asymmetric) between the objects
  Vertex      Edge                                    Type
  People      like each other                         undirected
  People      is the boss of                          directed
  Tasks       cannot be processed at the same time    undirected
  Computers   have a direct network connection        undirected
  Airports    planes fly between them                 directed
  Cities      can travel between them                 directed

1.1. Types of Graphs

Undirected Graph vs. Directed Graph

  • Undirected graph
    • Edges of an undirected graph point both ways between nodes
    • ex) Two-way road
  • Directed graph
    • A graph in which the edges are directed
    • ex) One-way road

Weighted Graph

  • A graph with edges assigned costs or weights
  • Also called 'Network'
    • ex) connections between cities, road lengths, circuit element capacities, communication network usage fees, etc. (a small sketch follows below)
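For illustration, a weighted graph can be stored, for example, as a dictionary of dictionaries; a minimal sketch with hypothetical city-to-city distances:

In [ ]:
# hypothetical weighted graph: distances (km) between cities
# keys: cities, values: {neighbor: distance}
weighted_graph = {
    'Seoul':   {'Daejeon': 140, 'Busan': 325},
    'Daejeon': {'Seoul': 140, 'Busan': 200},
    'Busan':   {'Seoul': 325, 'Daejeon': 200},
}

# look up the weight (distance) of the edge Seoul - Daejeon
print(weighted_graph['Seoul']['Daejeon'])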

1.2. Graph Representation

Graph and Adjacency Matrix

  • A simple undirected graph consists of only nodes and edges
  • A graph can be represented as an adjacency matrix $A$
    • The adjacency matrix $A$ indicates the adjacent nodes of each node
  • A (number of nodes) $\times$ (number of nodes) matrix is needed to represent the adjacency matrix of an undirected graph
    • Symmetric matrix



1.3. Adjacency Matrix

  • Undirected graph $G = (V,E)$



$$ \begin{align*}V &= \{1,2,\cdots,7\} \\ E &= \{\{1,2\},\{1,6\},\{2,3\},\{3,4\},\{3,6\},\{3,7\},\{4,7\},\{5,6\} \} \end{align*} $$


$$\text{Adjacency list} = \begin{cases} \;\; \text{adj}(1) = \{2,6\}\\ \;\; \text{adj}(2) = \{1,3\}\\ \;\; \text{adj}(3) = \{2,4,6,7\}\\ \;\; \text{adj}(4) = \{3,7\}\\ \;\; \text{adj}(5) = \{6\}\\ \;\; \text{adj}(6) = \{1,3,5\}\\ \;\; \text{adj}(7) = \{3,4\} \end{cases}$$


$$ \text{Adjacency matrix (symmetric) } A = \begin{bmatrix} 0&1&0&0&0&1&0\\ 1&0&1&0&0&0&0\\ 0&1&0&1&0&1&1\\ 0&0&1&0&0&0&1\\ 0&0&0&0&0&1&0\\ 1&0&1&0&1&0&0\\ 0&0&1&1&0&0&0\\ \end{bmatrix}$$
  • Directed graph $G = (V,E)$



$$ \begin{align*} V &= \{1,2,\cdots,7\} \\ E &= \{(1,2),(1,6),(2,3),(3,4),(3,7),(4,7),(6,3),(6,5) \} \end{align*} $$


$$\text{Adjacency list} = \begin{cases} \;\; \text{adj}(1) &= \{2,6\}\\ \;\; \text{adj}(2) &= \{3\}\\ \;\; \text{adj}(3) &= \{4,7\}\\ \;\; \text{adj}(4) &= \{7\}\\ \;\; \text{adj}(5) &= \varnothing\\ \;\; \text{adj}(6) &= \{3,5\}\\ \;\; \text{adj}(7) &= \varnothing \end{cases}$$


$$ \text{Adjacency matrix (not symmetric) } A = \begin{bmatrix} 0&1&0&0&0&1&0\\ 0&0&1&0&0&0&0\\ 0&0&0&1&0&0&1\\ 0&0&0&0&0&0&1\\ 0&0&0&0&0&0&0\\ 0&0&1&0&1&0&0\\ 0&0&0&0&0&0&0\\ \end{bmatrix}$$
In [ ]:
# !pip install networkx
In [ ]:
import networkx as nx
import matplotlib.pyplot as plt

%matplotlib inline
Graph.add_edge
In [ ]:
g = nx.Graph()
g.add_edge('a', 'b')
g.add_edge('b', 'c')
g.add_edge('a', 'c')
g.add_edge('c', 'd')
In [ ]:
# draw a graph with nodes and edges

nx.draw(g)
plt.show()
In [ ]:
# draw a graph with node labels

pos = nx.spring_layout(g)

nx.draw(g, pos, node_size = 500)
nx.draw_networkx_labels(g, pos, font_size = 10)
plt.show()
Graph.add_nodes_from
Graph.add_edges_from
In [ ]:
G = nx.Graph()

G.add_nodes_from([1, 2, 3, 4])
G.add_edges_from([(1,2), (1,3), (2,3), (3,4)])

# plot a graph
pos = nx.spring_layout(G)

nx.draw(G, pos, node_size = 500)
nx.draw_networkx_labels(G, pos, font_size = 10)
plt.show()
In [ ]:
print(nx.number_of_nodes(G))
print(nx.number_of_edges(G))
print(G.nodes())
print(G.edges())
4
4
[1, 2, 3, 4]
[(1, 2), (1, 3), (2, 3), (3, 4)]
In [ ]:
A = nx.adjacency_matrix(G)

print(A)
print(A.todense())
  (0, 1)	1
  (0, 2)	1
  (1, 0)	1
  (1, 2)	1
  (2, 0)	1
  (2, 1)	1
  (2, 3)	1
  (3, 2)	1
[[0 1 1 0]
 [1 0 1 0]
 [1 1 0 1]
 [0 0 1 0]]
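
The adjacency matrix above is for an undirected graph; the directed graph from Section 1.3 can be built the same way with nx.DiGraph (a minimal sketch), and its adjacency matrix is no longer symmetric.

In [ ]:
# directed graph from Section 1.3
DG = nx.DiGraph()

DG.add_nodes_from([1, 2, 3, 4, 5, 6, 7])
DG.add_edges_from([(1, 2), (1, 6), (2, 3), (3, 4), (3, 7), (4, 7), (6, 3), (6, 5)])

# the adjacency matrix of a directed graph is not symmetric in general
print(nx.adjacency_matrix(DG).todense())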

1.4. Degree

Degree of Undirected Graph

  • the degree of a vertex in a graph is the number of edges connected to it
  • denote the degree of vertex $i$ by $d_{i}$
  • for an undirected graph of $n$ vertices


$$ d_i = \sum_{j=1}^{n} \; A_{ij} $$

  • Degree matrix $D$ of adjacency matrix $A$


$$D = \text{diag}\{d_1, d_2, \cdots \}$$

  • Example





$$A = \begin{bmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 1 & 0 \end{bmatrix} \qquad \Rightarrow \qquad D = \begin{bmatrix} 3 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 2 \end{bmatrix} $$
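
A minimal numpy sketch of this degree computation for the adjacency matrix above:

In [ ]:
import numpy as np

A = np.array([[0, 1, 1, 1],
              [1, 0, 0, 0],
              [1, 0, 0, 1],
              [1, 0, 1, 0]])

# d_i = sum_j A_ij : degree of each vertex
d = A.sum(axis = 1)

# degree matrix D = diag(d_1, d_2, ...)
D = np.diag(d)

print(D)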

1.5. Self-connecting Edges





$$A = \begin{bmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 1 & 0 \end{bmatrix} \qquad \Rightarrow \qquad A+I = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 1 \\ 1 & 0 & 1 & 1 \end{bmatrix} \qquad \Rightarrow \qquad \tilde D = \begin{bmatrix} 4 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 3 \end{bmatrix} $$

1.6. Neighborhood Normalization

Some nodes have many edges while others have only a few

  • Adding $I$ adds self-connecting edges

  • Neighboring nodes are taken into account with normalized weights

  • Normalization prevents numerical instabilities and vanishing/exploding gradients so that the model can converge

1) (First attempt) Normalized $\tilde A$

$$\tilde A = \tilde D^{-1}(A+I)$$

2) Normalized $\tilde A$

$$\tilde A = \tilde D^{-1/2}(A+I) \tilde D^{-1/2}$$





In [ ]:
import numpy as np
In [ ]:
A = np.array([[0,1,1,1],
              [1,0,0,0],
              [1,0,0,1],
              [1,0,1,0]])

A_self = A + np.eye(4)

print(A_self)
[[1. 1. 1. 1.]
 [1. 1. 0. 0.]
 [1. 0. 1. 1.]
 [1. 0. 1. 1.]]
In [ ]:
D = np.array(A_self.sum(1)).flatten()
D = np.diag(D)

print(D)
[[4. 0. 0. 0.]
 [0. 2. 0. 0.]
 [0. 0. 3. 0.]
 [0. 0. 0. 3.]]



1) (First attempt) Normalized $\tilde A$

$$\tilde A = \tilde D^{-1}(A+I)$$
  • It is not symmetric.
In [ ]:
A_norm = np.linalg.inv(D).dot(A_self)

print(A_norm)
[[0.25       0.25       0.25       0.25      ]
 [0.5        0.5        0.         0.        ]
 [0.33333333 0.         0.33333333 0.33333333]
 [0.33333333 0.         0.33333333 0.33333333]]



2) Normalized $\tilde A$

$$\tilde A = \tilde D^{-1/2}(A+I) \tilde D^{-1/2}$$
  • Now it is symmetric.

  • (Skip the details)

In [ ]:
from scipy.linalg import fractional_matrix_power

D_half_norm = fractional_matrix_power(D, -0.5)

print(D_half_norm)
[[0.5        0.         0.         0.        ]
 [0.         0.70710678 0.         0.        ]
 [0.         0.         0.57735027 0.        ]
 [0.         0.         0.         0.57735027]]
In [ ]:
A_self = np.asmatrix(A_self)
D_half_norm = np.asmatrix(D_half_norm)

A_half_norm = D_half_norm*A_self*D_half_norm

print(A_half_norm)
[[0.25       0.35355339 0.28867513 0.28867513]
 [0.35355339 0.5        0.         0.        ]
 [0.28867513 0.         0.33333333 0.33333333]
 [0.28867513 0.         0.33333333 0.33333333]]

2. Graph Convolution Network (GCN)

2.1. Convolution

  • As discussed in the previous CNN lecture, a CNN has two key characteristics: preserving the spatial structure and weight sharing
  • To apply convolution to a graph network, the graph operation has to preserve these characteristics as well


Convolution Layer

  • In a CNN, the convolution layer preserves the spatial structure of the input
  • It convolves over all spatial locations
    • Features are extracted at each convolution layer



Weight Sharing

  • Reduce the number of parameters by weight sharing
  • Within the same layer, the same filter is used throughout the image



2.2. Connection between CNN and GCN

  • GCNs perform a similar operation: the model learns features by inspecting neighboring nodes
  • The major difference is that CNNs are specially built to operate on regular (Euclidean) structured data, while GCNs operate on graph data, where the number of node connections varies and the nodes are unordered (irregular, non-Euclidean structured data)




2.3. Basics of GCN

  • Similar to a CNN, a GCN updates each node using its adjacent nodes
  • Unlike a CNN, each node in a GCN has a different number of adjacent nodes
    • The adjacent nodes of each node are indicated by the adjacency matrix $A$
  • Basic process (or terminology) of GCN
    • Message: information passed by neighboring nodes to the central node
    • Aggregate: collect information from neighboring nodes
    • Update: embedding update by combining information from neighboring nodes and from itself






$$ \begin{align*} h_{u}^{(k+1)} &= \text{UPDATE} \left( \text{AGGREGATE} \left( \left\{ h_{v}^{(k)}, \forall v \in \mathcal{N}(u) \right\} \right) \right)\\ \end{align*} $$



1) Message Aggregation from Local Neighborhood


$$ \begin{align*} &\text{AGGREGATE} \left( \left\{ h_{v}^{(k)}, \forall v \in \mathcal{N}(u) \right\} \right)\\\\ &\Rightarrow AH^{(k)} \end{align*} $$





2) Update


Adding a non-linear function: $k^{\text{th}}$ layer

$$ \begin{align*} H^{(k+1)} &= f \left( A, H^{(k)} \right) \\ & = \sigma \left( A H^{(k)} \, W \right) \end{align*} $$



$$ \begin{align*} h_{u}^{(k+1)} &= \text{UPDATE} \left( \text{AGGREGATE} \left( \left\{ h_{v}^{(k)}, \forall v \in \mathcal{N}(u) \right\} \right) \right)\\\\ H^{(k+1)} &= \sigma \left(A H^{(k)} \, W_{\text{neigh}}^{(k)} \right) \end{align*} $$


  • $h_1^{(k)}$: feature vector of the first node in the $k^{\text{th}}$ layer
  • $W^{(k)}$: weight matrix of the $k^{\text{th}}$ layer
    • Weight sharing: the same weight matrix is shared within a layer
      • In the same layer, each node is updated in the same way, so all nodes share the same weight
      • Weight sharing reduces computational complexity and time (see the sketch below)
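
A minimal numpy sketch of a single update $H^{(k+1)} = \sigma\left(A H^{(k)} W^{(k)}\right)$ on a small hypothetical graph; note that one weight matrix $W$ is shared by all nodes:

In [ ]:
import numpy as np

np.random.seed(0)

A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]])      # adjacency matrix of a 3-node graph

H = np.random.randn(3, 4)      # node features: 3 nodes, 4 features each
W = np.random.randn(4, 2)      # weight matrix shared by all nodes: 4 -> 2

def relu(x):
    return np.maximum(0, x)

# aggregate neighbor features (AH), then transform with the shared weight W
H_next = relu(A @ H @ W)

print(H_next.shape)            # (3, 2): one updated embedding per node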





2.4. Further Improvements for GCN

1) Message Passing with Self-Loops

  • As a simplification of the neural message passing approach, it is common to add self-loops to the input graph and omit the explicit update step


$$ \begin{align*} h_{u}^{(k+1)} &= \text{UPDATE} \left( h_{u}^{(k)}, \text{AGGREGATE} \left( \left\{ h_{v}^{(k)}, \forall v \in \mathcal{N}(u) \right\} \right) \right) \\ &= \text{UPDATE} \left( \text{AGGREGATE} \left( \left\{ h_{v}^{(k)}, \forall v \in \mathcal{N}(u) \cup \{u \}\right\} \right) \right) \\ \\ H^{(k+1)} &= \sigma \left( \left(A+I \right)H^{(k)} \, W^{(k)}\right) \end{align*} $$






2) Neighborhood Normalization

  • The most basic neighborhood aggregation operation simply takes the sum of the neighbor embeddings.
  • One issue with this approach is that it can be unstable and highly sensitive to node degrees.
  • One solution to this problem is to normalize the aggregation operation based upon the degrees of the nodes involved.
  • The simplest approach is to take a weighted average rather than a sum.


$$ \begin{align*} \tilde A &= D^{-1/2}AD^{-1/2} + I \\ & \approx \tilde D^{-1/2}(A+I) \tilde D^{-1/2} \qquad \text{where } \, \tilde D \, \text{ is the degree matrix of } A+I \end{align*} $$






Finally, Graph Convolutional Networks


$$ \begin{align*} H^{(k+1)} &= \sigma \left(A H^{(k)} \, W^{(k)} \right) \\\\ &\Downarrow \\\\ H^{(k+1)} &= \sigma \left( \left(A+I \right)H^{(k)} \, W^{(k)}\right) \\\\ &\Downarrow \\\\ H^{(k+1)} &= \sigma \left( \left(\tilde D^{-1/2}(A+I)\tilde D^{-1/2} \right)H^{(k)} \, W^{(k)}\right)\\\\\\ \therefore H^{(k+1)} &= \sigma \left( \tilde A H^{(k)} \, W^{(k)}\right) \end{align*} $$


  • For each layer, the feature matrix and weight matrix are multiplied to create the next feature matrix




In [ ]:
import networkx as nx
import matplotlib.pyplot as plt

%matplotlib inline

G = nx.Graph()

G.add_nodes_from([1, 2, 3, 4, 5, 6])
G.add_edges_from([(1, 2), (1, 3), (2, 3), (1, 4), (4, 5), (4, 6), (5, 6)])

nx.draw(G, with_labels = True, node_size = 600, font_size = 22)
plt.show()
In [ ]:
A = nx.adjacency_matrix(G).todense()

print(A)
[[0 1 1 1 0 0]
 [1 0 1 0 0 0]
 [1 1 0 0 0 0]
 [1 0 0 0 1 1]
 [0 0 0 1 0 1]
 [0 0 0 1 1 0]]

Assign a feature vector $H$ so that the nodes can be separated into two groups

In [ ]:
H = np.matrix([1,0,0,-1,0,0]).T

print(H)
[[ 1]
 [ 0]
 [ 0]
 [-1]
 [ 0]
 [ 0]]

The product of the adjacency matrix and the node feature matrix gives the sum of neighboring node features

In [ ]:
A*H
Out[ ]:
matrix([[-1],
        [ 1],
        [ 1],
        [ 1],
        [-1],
        [-1]])
In [ ]:
A_self = A + np.eye(6)

A_self*H
Out[ ]:
matrix([[ 0.],
        [ 1.],
        [ 1.],
        [ 0.],
        [-1.],
        [-1.]])

Similar to data pre-processing for any neural network, normalize the neighborhood aggregation to prevent numerical instabilities and vanishing/exploding gradients so that the model can converge

In [ ]:
D = np.array(A_self.sum(1)).flatten()
D = np.diag(D)

D_half_norm = fractional_matrix_power(D, -0.5)

A_self = np.asmatrix(A_self)
D_half_norm = np.asmatrix(D_half_norm)

A_half_norm = D_half_norm*A_self*D_half_norm

A_half_norm*H
Out[ ]:
matrix([[ 0.        ],
        [ 0.28867513],
        [ 0.28867513],
        [ 0.        ],
        [-0.28867513],
        [-0.28867513]])

Build a 2-layer GCN using ReLU as the activation function


$$ \begin{align*} H^{(2)} &= \text{ReLU} \left( \tilde A H^{(1)} \, W^{(1)}\right) \\ H^{(3)} &= \text{ReLU} \left( \tilde A H^{(2)} \, W^{(2)}\right) \end{align*} $$
In [ ]:
np.random.seed(20)

W1 = np.random.randn(1, 4) # input: 1 -> hidden: 4
W2 = np.random.randn(4, 2) # hidden: 4 -> output: 2

def relu(x):
    return np.maximum(0, x)

def gcn(A_self, H, W):
    D = np.diag(np.array(A_self.sum(1)).flatten())
    D_half_norm = fractional_matrix_power(D, -0.5)
    H_new = D_half_norm*A_self*D_half_norm*H*W
    return relu(H_new)

H1 = H
H2 = gcn(A_self, H1, W1)
H3 = gcn(A_self, H2, W2)

print(H3)
[[0.         0.07472825]
 [0.         0.08628875]
 [0.         0.08628875]
 [0.12632564 0.        ]
 [0.14586829 0.        ]
 [0.14586829 0.        ]]

2.5. Readout: Permutation Invariance

  • The adjacency matrix can be different even though two graphs have the same network structure

    • Even if the edge information between all nodes is the same, the order of values in the matrix may differ due to node re-ordering (rotation and symmetry)
  • Therefore, for a graph-level representation, the readout layer makes the output permutation invariant by applying an MLP to each node and then summing





  • Node-wise summation


$$ Z_G = \sigma \left(\sum_{i \in G} \text{MLP} \left(H_i^{(L)} \right) \right) $$
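
A minimal numpy sketch of this node-wise sum readout, assuming hypothetical final node embeddings and a small hypothetical MLP; shuffling the node order does not change the result:

In [ ]:
import numpy as np

np.random.seed(0)

H_L = np.random.randn(6, 2)        # final node embeddings: 6 nodes, 2 features

# hypothetical one-hidden-layer MLP applied to each node embedding
W1, W2 = np.random.randn(2, 8), np.random.randn(8, 3)

def mlp(h):
    return np.maximum(0, h @ W1) @ W2

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# node-wise summation -> permutation-invariant graph representation
Z_G = sigmoid(sum(mlp(h) for h in H_L))

# permuting the node order gives the same graph-level output
perm = np.random.permutation(6)
print(np.allclose(Z_G, sigmoid(sum(mlp(h) for h in H_L[perm]))))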




2.6. Overall Structure of GCN





  • The graph information (feature matrix and adjacency matrix) is input to the GCN
  • Graph Convolution Layer
    • Updates the information of each node according to the adjacency matrix





  • The readout layer collects all node information with an MLP and produces a value for regression or classification

2.7. Three Types of GNN Problem

  • Task 1: Node classification

  • Task 2: Edge prediction

  • Task 3: Graph classification





3. Lab 1: Node Classification using Graph Convolutional Networks

3.0. List of GNN Python Libraries

  • Deep Graph Library (DGL)
    • Based on PyTorch, TensorFlow or Apache MXNet.
  • Graph Nets
    • DeepMind’s library for building graph networks in TensorFlow and Sonnet
  • Spektral
    • Based on the Keras API and TensorFlow 2
    • We will use this one for the demo
In [14]:
# !pip install spektral==1.3.0
# !pip install tensorflow==2.11.0
# !pip install keras==2.11.0
In [2]:
import numpy as np
import networkx as nx
import tensorflow as tf
import matplotlib.pyplot as plt

import spektral

3.1. Data Loading

Download data from here


CORA dataset

  • This dataset is the MNIST equivalent in graph learning

  • The CORA dataset consists of 2708 scientific publications classified into one of seven classes.

    • Case_Based: 298
    • Genetic_Algorithms: 418
    • Neural_Networks: 818
    • Probabilistic_Methods: 426
    • Reinforcement_Learning: 217
    • Rule_Learning: 180
    • Theory: 351
  • The citation network consists of 5429 links.

  • Each publication in the dataset is described by a 0/1-valued word vector indicating the absence/presence of the corresponding word from the dictionary.

  • The dictionary consists of 1433 unique words.

In [3]:
from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive
In [4]:
nodes = np.load('/content/drive/MyDrive/DL_Colab/DL_data/cora_nodes.npy')
edge_list = np.load('/content/drive/MyDrive/DL_Colab/DL_data/cora_edges.npy')

labels_encoded = np.load('/content/drive/MyDrive/DL_Colab/DL_data/cora_labels_encoded.npy')

H = np.load('/content/drive/MyDrive/DL_Colab/DL_data/cora_features.npy')
data_mask = np.load('/content/drive/MyDrive/DL_Colab/DL_data/cora_mask.npy')

N = H.shape[0]
F = H.shape[1]

print('H shape: ', H.shape)
print('The number of nodes (N): ', N)
print('The number of features (F) of each node: ', F)

num_classes = 7
print('The number of classes: ', num_classes)
H shape:  (2708, 1433)
The number of nodes (N):  2708
The number of features (F) of each node:  1433
The number of classes:  7





3.2 Train/Test Data Splitting

We split the dataset into train and test sets in a 7:3 ratio, so there are 1895 nodes for training and 813 for testing.

In [5]:
# mask of nodes used for training the model
train_mask = data_mask[0]

# mask of nodes used for testing the model
test_mask = data_mask[1]
In [6]:
print("The number of trainig data: ", np.sum(train_mask))
print("The number of test data: ", np.sum(test_mask))
The number of trainig data:  1895
The number of test data:  813
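
The masks above are loaded from the provided file; a minimal sketch of how such a random 7:3 boolean split could be constructed is shown below (this is an assumption about how the masks might be made, not the provided data, and it uses different variable names so the loaded masks are not overwritten).

In [ ]:
# hypothetical sketch: build a random 7:3 train/test split as boolean masks
np.random.seed(0)

idx = np.random.permutation(N)           # N = 2708 nodes
n_train = int(0.7 * N)                   # 1895 train nodes, 813 test nodes

train_mask_demo = np.zeros(N, dtype = bool)
test_mask_demo = np.zeros(N, dtype = bool)

train_mask_demo[idx[:n_train]] = True
test_mask_demo[idx[n_train:]] = True

print(train_mask_demo.sum(), test_mask_demo.sum())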

3.3 Initializing Graph G

In [7]:
G = nx.Graph(name = 'Cora')
G.add_nodes_from(nodes)
G.add_edges_from(edge_list)

3.4 Construct and Normalize Adjacency Matrix A

3.4.1. Normalizing Term $\tilde D^{-1/2} (A+I) \tilde D^{-1/2}$

In [8]:
from scipy.linalg import fractional_matrix_power

A = nx.adjacency_matrix(G)

I = np.eye(A.shape[-1])
A_self = A + I

D = np.diag(np.array(A_self.sum(1)).flatten())
D_half_norm = fractional_matrix_power(D, -0.5)

A_half_norm = D_half_norm * A_self * D_half_norm

A_half_norm = np.array(A_half_norm)
H = np.array(H)

3.5 GCN Model

In [9]:
H_in = tf.keras.layers.Input(shape = (F, ))
A_in = tf.keras.layers.Input(shape = (N, ))

graph_conv_1 = spektral.layers.GCNConv(channels = 16,
                                         activation = 'relu')([H_in, A_in])

graph_conv_2 = spektral.layers.GCNConv(channels = 7,
                                         activation = 'softmax')([graph_conv_1, A_in])

model = tf.keras.models.Model(inputs = [H_in, A_in], outputs = graph_conv_2)

model.compile(optimizer = tf.keras.optimizers.Adam(learning_rate = 1e-2),
              loss = 'categorical_crossentropy',
              weighted_metrics = ['acc'])

model.summary()
Model: "model"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
==================================================================================================
 input_1 (InputLayer)           [(None, 1433)]       0           []                               
                                                                                                  
 input_2 (InputLayer)           [(None, 2708)]       0           []                               
                                                                                                  
 gcn_conv (GCNConv)             (None, 16)           22944       ['input_1[0][0]',                
                                                                  'input_2[0][0]']                
                                                                                                  
 gcn_conv_1 (GCNConv)           (None, 7)            119         ['gcn_conv[0][0]',               
                                                                  'input_2[0][0]']                
                                                                                                  
==================================================================================================
Total params: 23,063
Trainable params: 23,063
Non-trainable params: 0
__________________________________________________________________________________________________

3.6 Train Model

In [10]:
model.fit([H, A],
          labels_encoded,
          validation_data = ([H, A], labels_encoded),
          sample_weight = train_mask,
          epochs = 30,
          batch_size = N,
          shuffle = False)
Epoch 1/30
1/1 [==============================] - 3s 3s/step - loss: 1.8581 - acc: 0.1842 - val_loss: 5.9638 - val_acc: 0.1621
Epoch 2/30
1/1 [==============================] - 0s 300ms/step - loss: 4.1739 - acc: 0.1652 - val_loss: 2.6650 - val_acc: 0.4003
Epoch 3/30
1/1 [==============================] - 0s 174ms/step - loss: 1.8839 - acc: 0.3974 - val_loss: 2.2744 - val_acc: 0.3047
Epoch 4/30
1/1 [==============================] - 0s 115ms/step - loss: 1.6057 - acc: 0.3045 - val_loss: 2.1761 - val_acc: 0.3431
Epoch 5/30
1/1 [==============================] - 0s 134ms/step - loss: 1.5363 - acc: 0.3446 - val_loss: 1.7280 - val_acc: 0.4450
Epoch 6/30
1/1 [==============================] - 0s 95ms/step - loss: 1.2217 - acc: 0.4485 - val_loss: 1.4554 - val_acc: 0.5192
Epoch 7/30
1/1 [==============================] - 0s 94ms/step - loss: 1.0330 - acc: 0.5203 - val_loss: 1.3258 - val_acc: 0.5679
Epoch 8/30
1/1 [==============================] - 0s 94ms/step - loss: 0.9443 - acc: 0.5683 - val_loss: 1.2657 - val_acc: 0.6104
Epoch 9/30
1/1 [==============================] - 0s 87ms/step - loss: 0.9011 - acc: 0.6111 - val_loss: 1.2211 - val_acc: 0.6422
Epoch 10/30
1/1 [==============================] - 0s 89ms/step - loss: 0.8702 - acc: 0.6417 - val_loss: 1.1824 - val_acc: 0.6484
Epoch 11/30
1/1 [==============================] - 0s 119ms/step - loss: 0.8439 - acc: 0.6475 - val_loss: 1.1520 - val_acc: 0.6558
Epoch 12/30
1/1 [==============================] - 0s 100ms/step - loss: 0.8239 - acc: 0.6580 - val_loss: 1.1298 - val_acc: 0.6617
Epoch 13/30
1/1 [==============================] - 0s 85ms/step - loss: 0.8084 - acc: 0.6628 - val_loss: 1.1183 - val_acc: 0.6699
Epoch 14/30
1/1 [==============================] - 0s 88ms/step - loss: 0.7998 - acc: 0.6718 - val_loss: 1.1051 - val_acc: 0.6813
Epoch 15/30
1/1 [==============================] - 0s 130ms/step - loss: 0.7903 - acc: 0.6828 - val_loss: 1.0913 - val_acc: 0.6828
Epoch 16/30
1/1 [==============================] - 0s 88ms/step - loss: 0.7801 - acc: 0.6850 - val_loss: 1.0740 - val_acc: 0.7042
Epoch 17/30
1/1 [==============================] - 0s 92ms/step - loss: 0.7672 - acc: 0.7055 - val_loss: 1.0537 - val_acc: 0.7142
Epoch 18/30
1/1 [==============================] - 0s 82ms/step - loss: 0.7519 - acc: 0.7156 - val_loss: 1.0315 - val_acc: 0.7208
Epoch 19/30
1/1 [==============================] - 0s 80ms/step - loss: 0.7350 - acc: 0.7224 - val_loss: 1.0085 - val_acc: 0.7234
Epoch 20/30
1/1 [==============================] - 0s 120ms/step - loss: 0.7175 - acc: 0.7235 - val_loss: 0.9864 - val_acc: 0.7330
Epoch 21/30
1/1 [==============================] - 0s 81ms/step - loss: 0.7006 - acc: 0.7309 - val_loss: 0.9667 - val_acc: 0.7397
Epoch 22/30
1/1 [==============================] - 0s 100ms/step - loss: 0.6853 - acc: 0.7372 - val_loss: 0.9495 - val_acc: 0.7463
Epoch 23/30
1/1 [==============================] - 0s 101ms/step - loss: 0.6716 - acc: 0.7456 - val_loss: 0.9354 - val_acc: 0.7511
Epoch 24/30
1/1 [==============================] - 0s 83ms/step - loss: 0.6601 - acc: 0.7520 - val_loss: 0.9226 - val_acc: 0.7548
Epoch 25/30
1/1 [==============================] - 0s 86ms/step - loss: 0.6497 - acc: 0.7583 - val_loss: 0.9066 - val_acc: 0.7581
Epoch 26/30
1/1 [==============================] - 0s 128ms/step - loss: 0.6374 - acc: 0.7625 - val_loss: 0.8883 - val_acc: 0.7651
Epoch 27/30
1/1 [==============================] - 0s 83ms/step - loss: 0.6235 - acc: 0.7710 - val_loss: 0.8711 - val_acc: 0.7710
Epoch 28/30
1/1 [==============================] - 0s 93ms/step - loss: 0.6102 - acc: 0.7784 - val_loss: 0.8566 - val_acc: 0.7718
Epoch 29/30
1/1 [==============================] - 0s 85ms/step - loss: 0.5988 - acc: 0.7773 - val_loss: 0.8442 - val_acc: 0.7792
Epoch 30/30
1/1 [==============================] - 0s 122ms/step - loss: 0.5891 - acc: 0.7852 - val_loss: 0.8316 - val_acc: 0.7825
Out[10]:
<keras.callbacks.History at 0x7ec0f31cf520>

3.7 Model Evaluation

In [11]:
y_pred = model.evaluate([H, A],
                        labels_encoded,
                        sample_weight = test_mask,
                        batch_size = N)
1/1 [==============================] - 0s 454ms/step - loss: 0.2523 - acc: 0.7712
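
To look at individual predictions rather than the aggregate accuracy, a minimal sketch (assuming the trained model and the masks above, with test_mask cast to boolean) that takes the argmax of the softmax outputs:

In [ ]:
# class probabilities for every node (full batch, same inputs as training)
probs = model.predict([H, A], batch_size = N)

y_pred_class = np.argmax(probs, axis = 1)
y_true_class = np.argmax(labels_encoded, axis = 1)

# accuracy computed manually over the test nodes only
mask = np.asarray(test_mask).astype(bool)
test_acc = np.mean(y_pred_class[mask] == y_true_class[mask])
print(test_acc)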

3.8 t-SNE

In [12]:
from sklearn.manifold import TSNE

layer_outputs = [layer.output for layer in model.layers]
activation_model = tf.keras.models.Model(inputs = model.input, outputs = layer_outputs)
activations = activation_model.predict([H,A_half_norm],batch_size = N)

x_tsne = TSNE(n_components = 2).fit_transform(activations[2])
1/1 [==============================] - 0s 304ms/step
In [13]:
def plot_tSNE(labels_encoded,x_tsne):
    color_map = np.argmax(labels_encoded, axis = 1)
    plt.figure(figsize = (10,10))
    for cl in range(num_classes):
        indices = np.where(color_map == cl)
        indices = indices[0]
        plt.scatter(x_tsne[indices,0], x_tsne[indices, 1], label = cl)
    plt.legend()
    plt.show()

plot_tSNE(labels_encoded,x_tsne)

4. Useful Resources for Further Study

In [ ]:
%%html
<center><iframe src="https://www.youtube.com/embed/fOctJB4kVlM?rel=0"
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
In [ ]:
%%html
<center><iframe src="https://www.youtube.com/embed/ABCGCf8cJOE?rel=0"
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
In [ ]:
%%html
<center><iframe src="https://www.youtube.com/embed/0YLZXjMHA-8?rel=0"
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
In [ ]:
%%html
<center><iframe src="https://www.youtube.com/embed/ex2qllcVneY?rel=0"
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
In [ ]:
%%html
<center><iframe src="https://www.youtube.com/embed/YL1jGgcY78U?rel=0"
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
In [ ]:
%%html
<center><iframe src="https://www.youtube.com/embed/8owQBFAHw7E?rel=0"
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
In [ ]:
%%html
<center><iframe src="https://www.youtube.com/embed/R67-JxtOQzg?rel=0"
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
In [ ]:
%%javascript
$.getScript('https://kmahelona.github.io/ipython_notebook_goodies/ipython_notebook_toc.js')