Graph Neural Networks (GNN)


By Prof. Seungchul Lee
http://iailab.kaist.ac.kr/
Industrial AI Lab at KAIST

Table of Contents

1. Graph



  • Graphs express abstract relations, topology, or connectivity
  • Graphs $G(V,E)$
    • $V$: a set of vertices (nodes)
    • $E$: a set of edges (links, relations)
    • weight (edge property)
      • distance in a road network
      • strength of connection in a personal network
  • Graphs model any situation where you have objects and pairwise relations (symmetric or asymmetric) between the objects
  Vertex      Edge                                    Type
  People      like each other                         undirected
  People      is the boss of                          directed
  Tasks       cannot be processed at the same time    undirected
  Computers   have a direct network connection        undirected
  Airports    planes fly between them                 directed
  Cities      can travel between them                 directed

1.1. Types of Graphs

Undirected Graph vs. Directed Graph

  • Undirected graph
    • Edges of an undirected graph point both ways between nodes
    • ex) Two-way road
  • Directed graph
    • A graph in which the edges are directed
    • ex) One-way road

Weighted Graph

  • A graph with edges assigned costs or weights
  • Also called 'Network'
    • ex) connections between cities, road lengths, circuit element capacities, communication network usage fees, etc. (a small sketch follows below)
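For illustration, a weighted graph can be stored, for example, as a dictionary of dictionaries; a minimal sketch with hypothetical city-to-city distances:

In [ ]:
# hypothetical weighted graph: distances (km) between cities
# keys: cities, values: {neighbor: distance}
weighted_graph = {
    'Seoul':   {'Daejeon': 140, 'Busan': 325},
    'Daejeon': {'Seoul': 140, 'Busan': 200},
    'Busan':   {'Seoul': 325, 'Daejeon': 200},
}

# look up the weight (distance) of the edge Seoul - Daejeon
print(weighted_graph['Seoul']['Daejeon'])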

1.2. Graph Representation

Graph and Adjacency Matrix

  • A simple undirected graph consists of only nodes and edges
  • A graph can be represented as an adjacency matrix $A$
    • The adjacency matrix $A$ indicates the adjacent nodes of each node
  • A (number of nodes) $\times$ (number of nodes) matrix is needed to represent the adjacency matrix of an undirected graph
    • Symmetric matrix



1.3. Adjacency Matrix

  • Undirected graph $G = (V,E)$



$$ \begin{align*}V &= \{1,2,\cdots,7\} \\ E &= \{\{1,2\},\{1,6\},\{2,3\},\{3,4\},\{3,6\},\{3,7\},\{4,7\},\{5,6\} \} \end{align*} $$


$$\text{Adjacency list} = \begin{cases} \;\; \text{adj}(1) = \{2,6\}\\ \;\; \text{adj}(2) = \{1,3\}\\ \;\; \text{adj}(3) = \{2,4,6,7\}\\ \;\; \text{adj}(4) = \{3,7\}\\ \;\; \text{adj}(5) = \{6\}\\ \;\; \text{adj}(6) = \{1,3,5\}\\ \;\; \text{adj}(7) = \{3,4\} \end{cases}$$


$$ \text{Adjacency matrix (symmetric) } A = \begin{bmatrix} 0&1&0&0&0&1&0\\ 1&0&1&0&0&0&0\\ 0&1&0&1&0&1&1\\ 0&0&1&0&0&0&1\\ 0&0&0&0&0&1&0\\ 1&0&1&0&1&0&0\\ 0&0&1&1&0&0&0\\ \end{bmatrix}$$
  • Directed graph $G = (V,E)$



$$ \begin{align*} V &= \{1,2,\cdots,7\} \\ E &= \{(1,2),(1,6),(2,3),(3,4),(3,7),(4,7),(6,3),(6,5) \} \end{align*} $$


$$\text{Adjacency list} = \begin{cases} \;\; \text{adj}(1) &= \{2,6\}\\ \;\; \text{adj}(2) &= \{3\}\\ \;\; \text{adj}(3) &= \{4,7\}\\ \;\; \text{adj}(4) &= \{7\}\\ \;\; \text{adj}(5) &= \varnothing\\ \;\; \text{adj}(6) &= \{3,5\}\\ \;\; \text{adj}(7) &= \varnothing \end{cases}$$


$$ \text{Adjacency matrix (not symmetric) } A = \begin{bmatrix} 0&1&0&0&0&1&0\\ 0&0&1&0&0&0&0\\ 0&0&0&1&0&0&1\\ 0&0&0&0&0&0&1\\ 0&0&0&0&0&0&0\\ 0&0&1&0&1&0&0\\ 0&0&0&0&0&0&0\\ \end{bmatrix}$$
In [ ]:
# !pip install networkx
In [ ]:
import networkx as nx
import matplotlib.pyplot as plt

%matplotlib inline
Graph.add_edge
In [ ]:
g = nx.Graph()
g.add_edge('a', 'b')
g.add_edge('b', 'c')
g.add_edge('a', 'c')
g.add_edge('c', 'd')
In [ ]:
# draw a graph with nodes and edges

nx.draw(g)
plt.show()
In [ ]:
# draw a graph with node labels

pos = nx.spring_layout(g)

nx.draw(g, pos, node_size = 500)
nx.draw_networkx_labels(g, pos, font_size = 10)
plt.show()
Graph.add_nodes_from
Graph.add_edges_from
In [ ]:
G = nx.Graph()

G.add_nodes_from([1, 2, 3, 4])
G.add_edges_from([(1,2), (1,3), (2,3), (3,4)])

# plot a graph
pos = nx.spring_layout(G)

nx.draw(G, pos, node_size = 500)
nx.draw_networkx_labels(G, pos, font_size = 10)
plt.show()
In [ ]:
print(nx.number_of_nodes(G))
print(nx.number_of_edges(G))
print(G.nodes())
print(G.edges())
4
4
[1, 2, 3, 4]
[(1, 2), (1, 3), (2, 3), (3, 4)]
In [ ]:
A = nx.adjacency_matrix(G)

print(A)
print(A.todense())
  (0, 1)	1
  (0, 2)	1
  (1, 0)	1
  (1, 2)	1
  (2, 0)	1
  (2, 1)	1
  (2, 3)	1
  (3, 2)	1
[[0 1 1 0]
 [1 0 1 0]
 [1 1 0 1]
 [0 0 1 0]]
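
The adjacency matrix above is for an undirected graph; the directed graph from Section 1.3 can be built the same way with nx.DiGraph (a minimal sketch), and its adjacency matrix is no longer symmetric.

In [ ]:
# directed graph from Section 1.3
DG = nx.DiGraph()

DG.add_nodes_from([1, 2, 3, 4, 5, 6, 7])
DG.add_edges_from([(1, 2), (1, 6), (2, 3), (3, 4), (3, 7), (4, 7), (6, 3), (6, 5)])

# the adjacency matrix of a directed graph is not symmetric in general
print(nx.adjacency_matrix(DG).todense())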

1.4. Degree

Degree of Undirected Graph

  • the degree of a vertex in a graph is the number of edges connected to it
  • denote the degree of vertex $i$ by $d_{i}$
  • for an undirected graph of $n$ vertices


$$ d_i = \sum_{j=1}^{n} \; A_{ij} $$

  • Degree matrix $D$ of adjacency matrix $A$


$$D = \text{diag}\{d_1, d_2, \cdots \}$$

  • Example





$$A = \begin{bmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 1 & 0 \end{bmatrix} \qquad \Rightarrow \qquad D = \begin{bmatrix} 3 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 2 \end{bmatrix} $$
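
A minimal numpy sketch of this degree computation for the adjacency matrix above:

In [ ]:
import numpy as np

A = np.array([[0, 1, 1, 1],
              [1, 0, 0, 0],
              [1, 0, 0, 1],
              [1, 0, 1, 0]])

# d_i = sum_j A_ij : degree of each vertex
d = A.sum(axis = 1)

# degree matrix D = diag(d_1, d_2, ...)
D = np.diag(d)

print(D)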

1.5. Self-connecting Edges





$$A = \begin{bmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 1 & 0 \end{bmatrix} \qquad \Rightarrow \qquad A+I = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 1 \\ 1 & 0 & 1 & 1 \end{bmatrix} \qquad \Rightarrow \qquad \tilde D = \begin{bmatrix} 4 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 3 \end{bmatrix} $$

1.6. Neighborhood Normalization

Some nodes have many edges while others have only a few

  • Adding $I$ adds self-connecting edges

  • Neighboring nodes are taken into account with normalized weights

  • Normalization prevents numerical instabilities and vanishing/exploding gradients so that the model can converge

1) (First attempt) Normalized $\tilde A$

$$\tilde A = \tilde D^{-1}(A+I)$$

2) Normalized $\tilde A$

$$\tilde A = \tilde D^{-1/2}(A+I) \tilde D^{-1/2}$$





In [ ]:
import numpy as np
In [ ]:
A = np.array([[0,1,1,1],
              [1,0,0,0],
              [1,0,0,1],
              [1,0,1,0]])

A_self = A + np.eye(4)

print(A_self)
[[1. 1. 1. 1.]
 [1. 1. 0. 0.]
 [1. 0. 1. 1.]
 [1. 0. 1. 1.]]
In [ ]:
D = np.array(A_self.sum(1)).flatten()
D = np.diag(D)

print(D)
[[4. 0. 0. 0.]
 [0. 2. 0. 0.]
 [0. 0. 3. 0.]
 [0. 0. 0. 3.]]



1) (First attempt) Normalized $\tilde A$

$$\tilde A = \tilde D^{-1}(A+I)$$
  • It is not symmetric.
In [ ]:
A_norm = np.linalg.inv(D).dot(A_self)

print(A_norm)
[[0.25       0.25       0.25       0.25      ]
 [0.5        0.5        0.         0.        ]
 [0.33333333 0.         0.33333333 0.33333333]
 [0.33333333 0.         0.33333333 0.33333333]]



2) Normalized $\tilde A$

$$\tilde A = \tilde D^{-1/2}(A+I) \tilde D^{-1/2}$$
  • Now it is symmetric.

  • (Skip the details)

In [ ]:
from scipy.linalg import fractional_matrix_power

D_half_norm = fractional_matrix_power(D, -0.5)

print(D_half_norm)
[[0.5        0.         0.         0.        ]
 [0.         0.70710678 0.         0.        ]
 [0.         0.         0.57735027 0.        ]
 [0.         0.         0.         0.57735027]]
In [ ]:
A_self = np.asmatrix(A_self)
D_half_norm = np.asmatrix(D_half_norm)

A_half_norm = D_half_norm*A_self*D_half_norm

print(A_half_norm)
[[0.25       0.35355339 0.28867513 0.28867513]
 [0.35355339 0.5        0.         0.        ]
 [0.28867513 0.         0.33333333 0.33333333]
 [0.28867513 0.         0.33333333 0.33333333]]

2. Graph Convolution Network (GCN)

2.1. Convolution

  • As discussed in the previous CNN lecture, a CNN has two key characteristics: preserving the spatial structure and weight sharing
  • To apply convolution to a graph network, the graph operation has to preserve these characteristics as well


Convolution Layer

  • In a CNN, the convolution layer preserves the spatial structure of the input
  • It convolves over all spatial locations
    • Features are extracted at each convolution layer



Weight Sharing

  • Reduce the number of parameters by weight sharing
  • Within the same layer, the same filter is used throughout the image



2.2. Connection between CNN and GCN

  • GCNs perform a similar operation: the model learns features by inspecting neighboring nodes
  • The major difference is that CNNs are specially built to operate on regular (Euclidean) structured data, while GCNs operate on graph data, where the number of node connections varies and the nodes are unordered (irregular, non-Euclidean structured data)




2.3. Basics of GCN

  • Similar to a CNN, a GCN updates each node using its adjacent nodes
  • Unlike a CNN, each node in a GCN has a different number of adjacent nodes
    • The adjacent nodes of each node are indicated by the adjacency matrix $A$
  • Basic process (or terminology) of GCN
    • Message: information passed by neighboring nodes to the central node
    • Aggregate: collect information from neighboring nodes
    • Update: embedding update by combining information from neighboring nodes and from itself






$$ \begin{align*} h_{u}^{(k+1)} &= \text{UPDATE} \left( \text{AGGREGATE} \left( \left\{ h_{v}^{(k)}, \forall v \in \mathcal{N}(u) \right\} \right) \right)\\ \end{align*} $$



1) Message Aggregation from Local Neighborhood


$$ \begin{align*} &\text{AGGREGATE} \left( \left\{ h_{v}^{(k)}, \forall v \in \mathcal{N}(u) \right\} \right)\\\\ &\Rightarrow AH^{(k)} \end{align*} $$





2) Update


Adding a non-linear function: $k^{\text{th}}$ layer

$$ \begin{align*} H^{(k+1)} &= f \left( A, H^{(k)} \right) \\ & = \sigma \left( A H^{(k)} \, W \right) \end{align*} $$



$$ \begin{align*} h_{u}^{(k+1)} &= \text{UPDATE} \left( \text{AGGREGATE} \left( \left\{ h_{v}^{(k)}, \forall v \in \mathcal{N}(u) \right\} \right) \right)\\\\ H^{(k+1)} &= \sigma \left(A H^{(k)} \, W_{\text{neigh}}^{(k)} \right) \end{align*} $$


  • $h_1^{(k)}$: feature vector of the first node in the $k^{\text{th}}$ layer
  • $W^{(k)}$: weight matrix of the $k^{\text{th}}$ layer
    • Weight sharing: the same weight matrix is shared within a layer
      • In the same layer, each node is updated in the same way, so all nodes share the same weight
      • Weight sharing reduces computational complexity and time (see the sketch below)
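
A minimal numpy sketch of a single update $H^{(k+1)} = \sigma\left(A H^{(k)} W^{(k)}\right)$ on a small hypothetical graph; note that one weight matrix $W$ is shared by all nodes:

In [ ]:
import numpy as np

np.random.seed(0)

A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]])      # adjacency matrix of a 3-node graph

H = np.random.randn(3, 4)      # node features: 3 nodes, 4 features each
W = np.random.randn(4, 2)      # weight matrix shared by all nodes: 4 -> 2

def relu(x):
    return np.maximum(0, x)

# aggregate neighbor features (AH), then transform with the shared weight W
H_next = relu(A @ H @ W)

print(H_next.shape)            # (3, 2): one updated embedding per node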





2.4. Further Improvements for GCN

1) Message Passing with Self-Loops

  • As a simplification of the neural message passing approach, it is common to add self-loops to the input graph and omit the explicit update step


$$ \begin{align*} h_{u}^{(k+1)} &= \text{UPDATE} \left( h_{u}^{(k)}, \text{AGGREGATE} \left( \left\{ h_{v}^{(k)}, \forall v \in \mathcal{N}(u) \right\} \right) \right) \\ &= \text{UPDATE} \left( \text{AGGREGATE} \left( \left\{ h_{v}^{(k)}, \forall v \in \mathcal{N}(u) \cup \{u \}\right\} \right) \right) \\ \\ H^{(k+1)} &= \sigma \left( \left(A+I \right)H^{(k)} \, W^{(k)}\right) \end{align*} $$






2) Neighborhood Normalization

  • The most basic neighborhood aggregation operation simply takes the sum of the neighbor embeddings.
  • One issue with this approach is that it can be unstable and highly sensitive to node degrees.
  • One solution to this problem is to normalize the aggregation operation based upon the degrees of the nodes involved.
  • The simplest approach is to take a weighted average rather than a sum.


$$ \begin{align*} \tilde A &= D^{-1/2}AD^{-1/2} + I \\ & \approx \tilde D^{-1/2}(A+I) \tilde D^{-1/2} \qquad \text{where } \, \tilde D \, \text{ is the degree matrix of } A+I \end{align*} $$






Finally, Graph Convolutional Networks


$$ \begin{align*} H^{(k+1)} &= \sigma \left(A H^{(k)} \, W^{(k)} \right) \\\\ &\Downarrow \\\\ H^{(k+1)} &= \sigma \left( \left(A+I \right)H^{(k)} \, W^{(k)}\right) \\\\ &\Downarrow \\\\ H^{(k+1)} &= \sigma \left( \left(\tilde D^{-1/2}(A+I)\tilde D^{-1/2} \right)H^{(k)} \, W^{(k)}\right)\\\\\\ \therefore H^{(k+1)} &= \sigma \left( \tilde A H^{(k)} \, W^{(k)}\right) \end{align*} $$


  • For each layer, the feature matrix and weight matrix are multiplied to create the next feature matrix




In [ ]:
import networkx as nx
import matplotlib.pyplot as plt

%matplotlib inline

G = nx.Graph()

G.add_nodes_from([1, 2, 3, 4, 5, 6])
G.add_edges_from([(1, 2), (1, 3), (2, 3), (1, 4), (4, 5), (4, 6), (5, 6)])

nx.draw(G, with_labels = True, node_size = 600, font_size = 22)
plt.show()
In [ ]:
A = nx.adjacency_matrix(G).todense()

print(A)
[[0 1 1 1 0 0]
 [1 0 1 0 0 0]
 [1 1 0 0 0 0]
 [1 0 0 0 1 1]
 [0 0 0 1 0 1]
 [0 0 0 1 1 0]]

Assign a feature vector $H$ so that the nodes can be separated into two groups

In [ ]:
H = np.matrix([1,0,0,-1,0,0]).T

print(H)
[[ 1]
 [ 0]
 [ 0]
 [-1]
 [ 0]
 [ 0]]

The product of the adjacency matrix and the node feature matrix gives the sum of neighboring node features

In [ ]:
A*H
Out[ ]:
matrix([[-1],
        [ 1],
        [ 1],
        [ 1],
        [-1],
        [-1]])
In [ ]:
A_self = A + np.eye(6)

A_self*H
Out[ ]:
matrix([[ 0.],
        [ 1.],
        [ 1.],
        [ 0.],
        [-1.],
        [-1.]])

Similar to data pre-processing for any neural network, normalize the neighborhood aggregation to prevent numerical instabilities and vanishing/exploding gradients so that the model can converge

In [ ]:
D = np.array(A_self.sum(1)).flatten()
D = np.diag(D)

D_half_norm = fractional_matrix_power(D, -0.5)

A_self = np.asmatrix(A_self)
D_half_norm = np.asmatrix(D_half_norm)

A_half_norm = D_half_norm*A_self*D_half_norm

A_half_norm*H
Out[ ]:
matrix([[ 0.        ],
        [ 0.28867513],
        [ 0.28867513],
        [ 0.        ],
        [-0.28867513],
        [-0.28867513]])

Build a 2-layer GCN using ReLU as the activation function


$$ \begin{align*} H^{(2)} &= \text{ReLU} \left( \tilde A H^{(1)} \, W^{(1)}\right) \\ H^{(3)} &= \text{ReLU} \left( \tilde A H^{(2)} \, W^{(2)}\right) \end{align*} $$
In [ ]:
np.random.seed(20)

W1 = np.random.randn(1, 4) # input: 1 -> hidden: 4
W2 = np.random.randn(4, 2) # hidden: 4 -> output: 2

def relu(x):
    return np.maximum(0, x)

def gcn(A_self, H, W):
    D = np.diag(np.array(A_self.sum(1)).flatten())
    D_half_norm = fractional_matrix_power(D, -0.5)
    H_new = D_half_norm*A_self*D_half_norm*H*W
    return relu(H_new)

H1 = H
H2 = gcn(A_self, H1, W1)
H3 = gcn(A_self, H2, W2)

print(H3)
[[0.         0.07472825]
 [0.         0.08628875]
 [0.         0.08628875]
 [0.12632564 0.        ]
 [0.14586829 0.        ]
 [0.14586829 0.        ]]

2.5. Readout: Permutation Invariance

  • The adjacency matrix can be different even though two graphs have the same network structure

    • Even if the edge information between all nodes is the same, the order of values in the matrix may differ due to node re-ordering (rotation and symmetry)
  • Therefore, for a graph-level representation, the readout layer makes the output permutation invariant by applying an MLP to each node and then summing





  • Node-wise summation


$$ Z_G = \sigma \left(\sum_{i \in G} \text{MLP} \left(H_i^{(L)} \right) \right) $$
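
A minimal numpy sketch of this node-wise sum readout, assuming hypothetical final node embeddings and a small hypothetical MLP; shuffling the node order does not change the result:

In [ ]:
import numpy as np

np.random.seed(0)

H_L = np.random.randn(6, 2)        # final node embeddings: 6 nodes, 2 features

# hypothetical one-hidden-layer MLP applied to each node embedding
W1, W2 = np.random.randn(2, 8), np.random.randn(8, 3)

def mlp(h):
    return np.maximum(0, h @ W1) @ W2

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# node-wise summation -> permutation-invariant graph representation
Z_G = sigmoid(sum(mlp(h) for h in H_L))

# permuting the node order gives the same graph-level output
perm = np.random.permutation(6)
print(np.allclose(Z_G, sigmoid(sum(mlp(h) for h in H_L[perm]))))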




2.6. Overall Structure of GCN





  • The graph information (feature matrix and adjacency matrix) is input to the GCN
  • Graph Convolution Layer
    • Updates the information of each node according to the adjacency matrix





  • The readout layer collects all node information with an MLP and produces a value for regression or classification

2.7. Three Types of GNN Problem

  • Task 1: Node classification

  • Task 2: Edge prediction

  • Task 3: Graph classification





3. Lab 1: Node Classification using Graph Convolutional Networks

3.0. List of GNN Python Libraries

  • Deep Graph Library (DGL)
    • Based on PyTorch, TensorFlow or Apache MXNet.
  • Graph Nets
    • DeepMind’s library for building graph networks in TensorFlow and Sonnet
  • Spektral
    • Based on the Keras API and TensorFlow 2
    • We will use this one for the demo
In [14]:
# !pip install spektral==1.3.0
# !pip install tensorflow==2.11.0
# !pip install keras==2.11.0
In [2]:
import numpy as np
import networkx as nx
import tensorflow as tf
import matplotlib.pyplot as plt

import spektral

3.1. Data Loading

Download data from here


CORA dataset

  • This dataset is the MNIST equivalent in graph learning

  • The CORA dataset consists of 2708 scientific publications classified into one of seven classes.

    • Case_Based: 298
    • Genetic_Algorithms: 418
    • Neural_Networks: 818
    • Probabilistic_Methods: 426
    • Reinforcement_Learning: 217
    • Rule_Learning: 180
    • Theory: 351
  • The citation network consists of 5429 links.

  • Each publication in the dataset is described by a 0/1-valued word vector indicating the absence/presence of the corresponding word from the dictionary.

  • The dictionary consists of 1433 unique words.

In [3]:
from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive
In [4]:
nodes = np.load('/content/drive/MyDrive/DL_Colab/DL_data/cora_nodes.npy')
edge_list = np.load('/content/drive/MyDrive/DL_Colab/DL_data/cora_edges.npy')

labels_encoded = np.load('/content/drive/MyDrive/DL_Colab/DL_data/cora_labels_encoded.npy')

H = np.load('/content/drive/MyDrive/DL_Colab/DL_data/cora_features.npy')
data_mask = np.load('/content/drive/MyDrive/DL_Colab/DL_data/cora_mask.npy')

N = H.shape[0]
F = H.shape[1]

print('H shape: ', H.shape)
print('The number of nodes (N): ', N)
print('The number of features (F) of each node: ', F)

num_classes = 7
print('The number of classes: ', num_classes)
H shape:  (2708, 1433)
The number of nodes (N):  2708
The number of features (F) of each node:  1433
The number of classes:  7





3.2 Train/Test Data Splitting

We split the dataset into train and test sets in a 7:3 ratio, so there are 1895 nodes for training and 813 for testing.

In [5]:
# mask of nodes used for training the model
train_mask = data_mask[0]

# mask of nodes used for testing the model
test_mask = data_mask[1]
In [6]:
print("The number of trainig data: ", np.sum(train_mask))
print("The number of test data: ", np.sum(test_mask))
The number of trainig data:  1895
The number of test data:  813
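
The masks above are loaded from the provided file; a minimal sketch of how such a random 7:3 boolean split could be constructed is shown below (this is an assumption about how the masks might be made, not the provided data, and it uses different variable names so the loaded masks are not overwritten).

In [ ]:
# hypothetical sketch: build a random 7:3 train/test split as boolean masks
np.random.seed(0)

idx = np.random.permutation(N)           # N = 2708 nodes
n_train = int(0.7 * N)                   # 1895 train nodes, 813 test nodes

train_mask_demo = np.zeros(N, dtype = bool)
test_mask_demo = np.zeros(N, dtype = bool)

train_mask_demo[idx[:n_train]] = True
test_mask_demo[idx[n_train:]] = True

print(train_mask_demo.sum(), test_mask_demo.sum())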

3.3 Initializing Graph G

In [7]:
G = nx.Graph(name = 'Cora')
G.add_nodes_from(nodes)
G.add_edges_from(edge_list)

3.4 Construct and Normalize Adjacency Matrix A

3.4.1. Normalizing Term $\tilde D^{-1/2} (A+I) \tilde D^{-1/2}$

In [8]:
from scipy.linalg import fractional_matrix_power

A = nx.adjacency_matrix(G)

I = np.eye(A.shape[-1])
A_self = A + I

D = np.diag(np.array(A_self.sum(1)).flatten())
D_half_norm = fractional_matrix_power(D, -0.5)

A_half_norm = D_half_norm * A_self * D_half_norm

A_half_norm = np.array(A_half_norm)
H = np.array(H)

3.5 GCN Model

In [9]:
H_in = tf.keras.layers.Input(shape = (F, ))
A_in = tf.keras.layers.Input(shape = (N, ))

graph_conv_1 = spektral.layers.GCNConv(channels = 16,
                                         activation = 'relu')([H_in, A_in])

graph_conv_2 = spektral.layers.GCNConv(channels = 7,
                                         activation = 'softmax')([graph_conv_1, A_in])

model = tf.keras.models.Model(inputs = [H_in, A_in], outputs = graph_conv_2)

model.compile(optimizer = tf.keras.optimizers.Adam(learning_rate = 1e-2),
              loss = 'categorical_crossentropy',
              weighted_metrics = ['acc'])

model.summary()
Model: "model"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
==================================================================================================
 input_1 (InputLayer)           [(None, 1433)]       0           []                               
                                                                                                  
 input_2 (InputLayer)           [(None, 2708)]       0           []                               
                                                                                                  
 gcn_conv (GCNConv)             (None, 16)           22944       ['input_1[0][0]',                
                                                                  'input_2[0][0]']                
                                                                                                  
 gcn_conv_1 (GCNConv)           (None, 7)            119         ['gcn_conv[0][0]',               
                                                                  'input_2[0][0]']                
                                                                                                  
==================================================================================================
Total params: 23,063
Trainable params: 23,063
Non-trainable params: 0
__________________________________________________________________________________________________

3.6 Train Model

In [10]:
model.fit([H, A],
          labels_encoded,
          validation_data = ([H, A], labels_encoded),
          sample_weight = train_mask,
          epochs = 30,
          batch_size = N,
          shuffle = False)
Epoch 1/30
1/1 [==============================] - 3s 3s/step - loss: 1.8581 - acc: 0.1842 - val_loss: 5.9638 - val_acc: 0.1621
Epoch 2/30
1/1 [==============================] - 0s 300ms/step - loss: 4.1739 - acc: 0.1652 - val_loss: 2.6650 - val_acc: 0.4003
Epoch 3/30
1/1 [==============================] - 0s 174ms/step - loss: 1.8839 - acc: 0.3974 - val_loss: 2.2744 - val_acc: 0.3047
Epoch 4/30
1/1 [==============================] - 0s 115ms/step - loss: 1.6057 - acc: 0.3045 - val_loss: 2.1761 - val_acc: 0.3431
Epoch 5/30
1/1 [==============================] - 0s 134ms/step - loss: 1.5363 - acc: 0.3446 - val_loss: 1.7280 - val_acc: 0.4450
Epoch 6/30
1/1 [==============================] - 0s 95ms/step - loss: 1.2217 - acc: 0.4485 - val_loss: 1.4554 - val_acc: 0.5192
Epoch 7/30
1/1 [==============================] - 0s 94ms/step - loss: 1.0330 - acc: 0.5203 - val_loss: 1.3258 - val_acc: 0.5679
Epoch 8/30
1/1 [==============================] - 0s 94ms/step - loss: 0.9443 - acc: 0.5683 - val_loss: 1.2657 - val_acc: 0.6104
Epoch 9/30
1/1 [==============================] - 0s 87ms/step - loss: 0.9011 - acc: 0.6111 - val_loss: 1.2211 - val_acc: 0.6422
Epoch 10/30
1/1 [==============================] - 0s 89ms/step - loss: 0.8702 - acc: 0.6417 - val_loss: 1.1824 - val_acc: 0.6484
Epoch 11/30
1/1 [==============================] - 0s 119ms/step - loss: 0.8439 - acc: 0.6475 - val_loss: 1.1520 - val_acc: 0.6558
Epoch 12/30
1/1 [==============================] - 0s 100ms/step - loss: 0.8239 - acc: 0.6580 - val_loss: 1.1298 - val_acc: 0.6617
Epoch 13/30
1/1 [==============================] - 0s 85ms/step - loss: 0.8084 - acc: 0.6628 - val_loss: 1.1183 - val_acc: 0.6699
Epoch 14/30
1/1 [==============================] - 0s 88ms/step - loss: 0.7998 - acc: 0.6718 - val_loss: 1.1051 - val_acc: 0.6813
Epoch 15/30
1/1 [==============================] - 0s 130ms/step - loss: 0.7903 - acc: 0.6828 - val_loss: 1.0913 - val_acc: 0.6828
Epoch 16/30
1/1 [==============================] - 0s 88ms/step - loss: 0.7801 - acc: 0.6850 - val_loss: 1.0740 - val_acc: 0.7042
Epoch 17/30
1/1 [==============================] - 0s 92ms/step - loss: 0.7672 - acc: 0.7055 - val_loss: 1.0537 - val_acc: 0.7142
Epoch 18/30
1/1 [==============================] - 0s 82ms/step - loss: 0.7519 - acc: 0.7156 - val_loss: 1.0315 - val_acc: 0.7208
Epoch 19/30
1/1 [==============================] - 0s 80ms/step - loss: 0.7350 - acc: 0.7224 - val_loss: 1.0085 - val_acc: 0.7234
Epoch 20/30
1/1 [==============================] - 0s 120ms/step - loss: 0.7175 - acc: 0.7235 - val_loss: 0.9864 - val_acc: 0.7330
Epoch 21/30
1/1 [==============================] - 0s 81ms/step - loss: 0.7006 - acc: 0.7309 - val_loss: 0.9667 - val_acc: 0.7397
Epoch 22/30
1/1 [==============================] - 0s 100ms/step - loss: 0.6853 - acc: 0.7372 - val_loss: 0.9495 - val_acc: 0.7463
Epoch 23/30
1/1 [==============================] - 0s 101ms/step - loss: 0.6716 - acc: 0.7456 - val_loss: 0.9354 - val_acc: 0.7511
Epoch 24/30
1/1 [==============================] - 0s 83ms/step - loss: 0.6601 - acc: 0.7520 - val_loss: 0.9226 - val_acc: 0.7548
Epoch 25/30
1/1 [==============================] - 0s 86ms/step - loss: 0.6497 - acc: 0.7583 - val_loss: 0.9066 - val_acc: 0.7581
Epoch 26/30
1/1 [==============================] - 0s 128ms/step - loss: 0.6374 - acc: 0.7625 - val_loss: 0.8883 - val_acc: 0.7651
Epoch 27/30
1/1 [==============================] - 0s 83ms/step - loss: 0.6235 - acc: 0.7710 - val_loss: 0.8711 - val_acc: 0.7710
Epoch 28/30
1/1 [==============================] - 0s 93ms/step - loss: 0.6102 - acc: 0.7784 - val_loss: 0.8566 - val_acc: 0.7718
Epoch 29/30
1/1 [==============================] - 0s 85ms/step - loss: 0.5988 - acc: 0.7773 - val_loss: 0.8442 - val_acc: 0.7792
Epoch 30/30
1/1 [==============================] - 0s 122ms/step - loss: 0.5891 - acc: 0.7852 - val_loss: 0.8316 - val_acc: 0.7825
Out[10]:
<keras.callbacks.History at 0x7ec0f31cf520>

3.7 Model Evaluation

In [11]:
y_pred = model.evaluate([H, A],
                        labels_encoded,
                        sample_weight = test_mask,
                        batch_size = N)
1/1 [==============================] - 0s 454ms/step - loss: 0.2523 - acc: 0.7712
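
To look at individual predictions rather than the aggregate accuracy, a minimal sketch (assuming the trained model and the masks above, with test_mask cast to boolean) that takes the argmax of the softmax outputs:

In [ ]:
# class probabilities for every node (full batch, same inputs as training)
probs = model.predict([H, A], batch_size = N)

y_pred_class = np.argmax(probs, axis = 1)
y_true_class = np.argmax(labels_encoded, axis = 1)

# accuracy computed manually over the test nodes only
mask = np.asarray(test_mask).astype(bool)
test_acc = np.mean(y_pred_class[mask] == y_true_class[mask])
print(test_acc)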

3.8 t-SNE

In [12]:
from sklearn.manifold import TSNE

layer_outputs = [layer.output for layer in model.layers]
activation_model = tf.keras.models.Model(inputs = model.input, outputs = layer_outputs)
activations = activation_model.predict([H,A_half_norm],batch_size = N)

x_tsne = TSNE(n_components = 2).fit_transform(activations[2])
1/1 [==============================] - 0s 304ms/step
In [13]:
def plot_tSNE(labels_encoded,x_tsne):
    color_map = np.argmax(labels_encoded, axis = 1)
    plt.figure(figsize = (10,10))
    for cl in range(num_classes):
        indices = np.where(color_map == cl)
        indices = indices[0]
        plt.scatter(x_tsne[indices,0], x_tsne[indices, 1], label = cl)
    plt.legend()
    plt.show()

plot_tSNE(labels_encoded,x_tsne)

4. Useful Resources for Further Study

In [ ]:
%%html
<center><iframe src="https://www.youtube.com/embed/fOctJB4kVlM?rel=0"
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
In [ ]:
%%html
<center><iframe src="https://www.youtube.com/embed/ABCGCf8cJOE?rel=0"
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
In [ ]:
%%html
<center><iframe src="https://www.youtube.com/embed/0YLZXjMHA-8?rel=0"
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
In [ ]:
%%html
<center><iframe src="https://www.youtube.com/embed/ex2qllcVneY?rel=0"
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
In [ ]:
%%html
<center><iframe src="https://www.youtube.com/embed/YL1jGgcY78U?rel=0"
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
In [ ]:
%%html
<center><iframe src="https://www.youtube.com/embed/8owQBFAHw7E?rel=0"
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
In [ ]:
%%html
<center><iframe src="https://www.youtube.com/embed/R67-JxtOQzg?rel=0"
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
In [ ]:
%%javascript
$.getScript('https://kmahelona.github.io/ipython_notebook_goodies/ipython_notebook_toc.js')