Graph Neural Networks (GNN)


By Prof. Seungchul Lee
http://iai.postech.ac.kr/
Industrial AI Lab at POSTECH

Table of Contents

1. Graph



  • Graphs capture abstract relations, topology, or connectivity
  • Graphs $G(V,E)$
    • $V$: a set of vertices (nodes)
    • $E$: a set of edges (links, relations)
    • weight (edge property)
      • distance in a road network
      • strength of connection in a personal network
  • Graphs model any situation where you have objects and pairwise relations (symmetric or asymmetric) between the objects, for example:
Vertex       Edge relation                            Directed?
People       like each other                          undirected
People       is the boss of                           directed
Tasks        cannot be processed at the same time     undirected
Computers    have a direct network connection         undirected
Airports     planes fly between them                  directed
Cities       one can travel between them              directed

1.1. Types of Graphs

Undirected Graph vs. Directed Graph

  • Undirected graph
    • Edges of an undirected graph point both ways between nodes
    • ex) Two-way road
  • Directed graph
    • A graph in which the edges are directed
    • ex) One-way road

Weighted Graph

  • A graph whose edges are assigned costs or weights
  • Also called a 'network'
    • ex) connections between cities, lengths of roads, circuit element capacities, communication network usage fees, etc. (see the short sketch below)
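A minimal networkx sketch of a weighted graph; the weight edge attribute is the standard networkx convention, while the city names and distances here are purely illustrative.

import networkx as nx

WG = nx.Graph()
WG.add_edge('Seoul', 'Busan', weight = 325)    # illustrative distance (km)
WG.add_edge('Seoul', 'Daejeon', weight = 140)  # illustrative distance (km)

print(WG.edges(data = True))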

1.2. Graph Representation

Graph and Adjacency Matrix

  • A simple undirected graph consists of only nodes and edges
  • A graph can be represented as an adjacency matrix $A$
    • The adjacency matrix $A$ indicates the adjacent nodes of each node
  • A (number of nodes) $\times$ (number of nodes) matrix is needed to represent the adjacency matrix of an undirected graph
    • Symmetric matrix



1.3. Adjacency Matrix

  • Undirected graph $G = (V,E)$



$$ \begin{align*}V &= \{1,2,\cdots,7\} \\ E &= \{\{1,2\},\{1,6\},\{2,3\},\{3,4\},\{3,6\},\{3,7\},\{4,7\},\{5,6\} \} \end{align*} $$


$$\text{Adjacency list} = \begin{cases} \;\; \text{adj}(1) = \{2,6\}\\ \;\; \text{adj}(2) = \{1,3\}\\ \;\; \text{adj}(3) = \{2,4,6,7\}\\ \;\; \text{adj}(4) = \{3,7\}\\ \;\; \text{adj}(5) = \{6\}\\ \;\; \text{adj}(6) = \{1,3,5\}\\ \;\; \text{adj}(7) = \{3,4\} \end{cases}$$


$$ \text{Adjacency matrix (symmetric) } A = \begin{bmatrix} 0&1&0&0&0&1&0\\ 1&0&1&0&0&0&0\\ 0&1&0&1&0&1&1\\ 0&0&1&0&0&0&1\\ 0&0&0&0&0&1&0\\ 1&0&1&0&1&0&0\\ 0&0&1&1&0&0&0\\ \end{bmatrix}$$
  • Directed graph $G = (V,E)$



$$ \begin{align*} V &= \{1,2,\cdots,7\} \\ E &= \{(1,2),(1,6),(2,3),(3,4),(3,7),(4,7),(6,3),(6,5) \} \end{align*} $$


$$\text{Adjacency list} = \begin{cases} \;\; \text{adj}(1) &= \{2,6\}\\ \;\; \text{adj}(2) &= \{3\}\\ \;\; \text{adj}(3) &= \{4,7\}\\ \;\; \text{adj}(4) &= \{7\}\\ \;\; \text{adj}(5) &= \emptyset\\ \;\; \text{adj}(6) &= \{3,5\}\\ \;\; \text{adj}(7) &= \emptyset \end{cases}$$


$$ \text{Adjacency matrix (asymmetric) } A = \begin{bmatrix} 0&1&0&0&0&1&0\\ 0&0&1&0&0&0&0\\ 0&0&0&1&0&0&1\\ 0&0&0&0&0&0&1\\ 0&0&0&0&0&0&0\\ 0&0&1&0&1&0&0\\ 0&0&0&0&0&0&0\\ \end{bmatrix}$$
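As a quick check, a minimal sketch with networkx (building an nx.DiGraph from the edge list above and fixing the node order) reproduces this matrix:

import networkx as nx

DG = nx.DiGraph()
DG.add_edges_from([(1, 2), (1, 6), (2, 3), (3, 4), (3, 7), (4, 7), (6, 3), (6, 5)])

# fix the node order so that rows/columns follow 1, ..., 7
print(nx.adjacency_matrix(DG, nodelist = [1, 2, 3, 4, 5, 6, 7]).todense())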
In [1]:
# !pip install networkx
In [2]:
import networkx as nx
import matplotlib.pyplot as plt

%matplotlib inline
Graph.add_edge
In [3]:
g = nx.Graph()
g.add_edge('a', 'b')
g.add_edge('b', 'c')
g.add_edge('a', 'c')
g.add_edge('c', 'd')
In [4]:
# draw a graph with nodes and edges

nx.draw(g)
plt.show()
In [5]:
# draw a graph with node labels 

pos = nx.spring_layout(g)

nx.draw(g, pos, node_size = 500)
nx.draw_networkx_labels(g, pos, font_size = 10)
plt.show()
Graph.add_nodes_from
Graph.add_edges_from
In [6]:
G = nx.Graph()

G.add_nodes_from([1, 2, 3, 4])
G.add_edges_from([(1,2), (1,3), (2,3), (3,4)])  

# plot a graph 
pos = nx.spring_layout(G)

nx.draw(G, pos, node_size = 500)
nx.draw_networkx_labels(G, pos, font_size = 10)
plt.show()
In [7]:
print(nx.number_of_nodes(G))
print(nx.number_of_edges(G))
print(G.nodes())
print(G.edges())
4
4
[1, 2, 3, 4]
[(1, 2), (1, 3), (2, 3), (3, 4)]
In [8]:
A = nx.adjacency_matrix(G)

print(A)
print(A.todense())
  (0, 1)	1
  (0, 2)	1
  (1, 0)	1
  (1, 2)	1
  (2, 0)	1
  (2, 1)	1
  (2, 3)	1
  (3, 2)	1
[[0 1 1 0]
 [1 0 1 0]
 [1 1 0 1]
 [0 0 1 0]]

1.4. Degree

Degree of Undirected Graph

  • The degree of a vertex in a graph is the number of edges connected to it
  • Denote the degree of vertex $i$ by $d_{i}$
  • For an undirected graph of $n$ vertices


$$ d_i = \sum_{j=1}^{n} \; A_{ij} $$

  • Degree matrix $D$ of the adjacency matrix $A$


$$D = \text{diag}\{d_1, d_2, \cdots \}$$

  • Example





$$A = \begin{bmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 1 & 0 \end{bmatrix} \qquad \Rightarrow \qquad D = \begin{bmatrix} 3 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 2 \end{bmatrix} $$
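A minimal numpy sketch of this computation, assuming the example $A$ above:

import numpy as np

A = np.array([[0, 1, 1, 1],
              [1, 0, 0, 0],
              [1, 0, 0, 1],
              [1, 0, 1, 0]])

d = A.sum(axis = 1)  # d_i = sum_j A_ij
D = np.diag(d)
print(D)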

1.5. Self-connecting Edges





$$A = \begin{bmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 1 & 0 \end{bmatrix} \qquad \Rightarrow \qquad A+I = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 1 \\ 1 & 0 & 1 & 1 \end{bmatrix} \qquad \Rightarrow \qquad \tilde D = \begin{bmatrix} 4 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 3 \end{bmatrix} $$

1.6. Neighborhood Normalization

Some nodes have many edges, while others have few

  • Adding $I$ adds self-connecting edges

  • Neighboring nodes are aggregated with normalized weights

  • This prevents numerical instabilities and vanishing/exploding gradients so that the model can converge

1) (First attempt) Normalized $\tilde A$

$$\tilde A = \tilde D^{-1}(A+I)$$

2) Symmetrically Normalized $\tilde A$

$$\tilde A = \tilde D^{-1/2}(A+I) \tilde D^{-1/2}$$





In [9]:
import numpy as np
In [10]:
A = np.array([[0,1,1,1],
              [1,0,0,0],
              [1,0,0,1],
              [1,0,1,0]])

A_self = A + np.eye(4)

print(A_self)
[[1. 1. 1. 1.]
 [1. 1. 0. 0.]
 [1. 0. 1. 1.]
 [1. 0. 1. 1.]]
In [11]:
D = np.array(A_self.sum(1)).flatten()
D = np.diag(D)

print(D)
[[4. 0. 0. 0.]
 [0. 2. 0. 0.]
 [0. 0. 3. 0.]
 [0. 0. 0. 3.]]



1) (First attempt) Normalized $\tilde A$

$$\tilde A = \tilde D^{-1}(A+I)$$
  • It is not symmetric.
In [12]:
A_norm = np.linalg.inv(D).dot(A_self)

print(A_norm)
[[0.25       0.25       0.25       0.25      ]
 [0.5        0.5        0.         0.        ]
 [0.33333333 0.         0.33333333 0.33333333]
 [0.33333333 0.         0.33333333 0.33333333]]
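A quick numpy check confirms the asymmetry:

print(np.allclose(A_norm, A_norm.T))  # False: row normalization breaks symmetry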



2) Symmetrically Normalized $\tilde A$

$$\tilde A = \tilde D^{-1/2}(A+I) \tilde D^{-1/2}$$
  • Now it is symmetric.

  • (Skip the details)

In [13]:
from scipy.linalg import fractional_matrix_power

D_half_norm = fractional_matrix_power(D, -0.5)

print(D_half_norm)
[[0.5        0.         0.         0.        ]
 [0.         0.70710678 0.         0.        ]
 [0.         0.         0.57735027 0.        ]
 [0.         0.         0.         0.57735027]]
In [14]:
A_self = np.asmatrix(A_self)
D_half_norm = np.asmatrix(D_half_norm)

A_half_norm = D_half_norm*A_self*D_half_norm

print(A_half_norm)
[[0.25       0.35355339 0.28867513 0.28867513]
 [0.35355339 0.5        0.         0.        ]
 [0.28867513 0.         0.33333333 0.33333333]
 [0.28867513 0.         0.33333333 0.33333333]]
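And a quick numpy check confirms that the symmetric normalization indeed yields a symmetric matrix:

print(np.allclose(A_half_norm, A_half_norm.T))  # True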

2. Graph Convolution Network (GCN)

2.1. Convolution

  • In the previous CNN lecture, we saw that a CNN has two key characteristics: preserving the spatial structure and weight sharing
  • To apply convolution to a graph network, the graph model has to provide these characteristics as well

Convolution Layer

  • In a CNN, the convolution layer preserves the spatial structure of the input
  • It convolves over all spatial locations
    • Extracting features at each convolution layer



Weight Sharing

  • Weight sharing reduces the number of parameters
  • Within the same layer, the same filter is used throughout the image



2.2. Connection between CNN and GCN

  • GCNs perform similar operations: the model learns features by inspecting neighboring nodes
  • The major difference between CNNs and GCNs is that CNNs are built to operate on regular (Euclidean) structured data, while GCNs operate on graph data, where the number of node connections varies and the nodes are unordered (irregular, non-Euclidean structured data)




2.3. Basics of GCN

  • Similar to a CNN, a GCN updates each node using its adjacent nodes
  • Unlike a CNN, each node of a GCN has a different number of adjacent nodes
    • The adjacent nodes of each node are indicated by the adjacency matrix $A$
  • Basic process (and terminology) of a GCN
    • Message: information passed by neighboring nodes to the central node
    • Aggregate: collect information from neighboring nodes
    • Update: update the embedding by combining information from neighboring nodes and from the node itself






$$ \begin{align*} h_{u}^{(k+1)} &= \text{UPDATE} \left( \text{AGGREGATE} \left( \left\{ h_{v}^{(k)}, \forall v \in \mathcal{N}(u) \right\} \right) \right)\\ \end{align*} $$
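A per-node view of one message-passing step, as a minimal sketch with sum aggregation and an identity update (the small adjacency list and scalar features are hypothetical):

adj = {1: [2, 3], 2: [1], 3: [1]}  # hypothetical adjacency list
h = {1: 1.0, 2: -1.0, 3: 0.5}      # hypothetical scalar node features

# AGGREGATE = sum over neighbors, UPDATE = identity
h_new = {u: sum(h[v] for v in adj[u]) for u in adj}
print(h_new)                       # {1: -0.5, 2: 1.0, 3: 1.0}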



1) Message Aggregation from Local Neighborhood


$$ \begin{align*} &\text{AGGREGATE} \left( \left\{ h_{v}^{(k)}, \forall v \in \mathcal{N}(u) \right\} \right)\\\\ &\Rightarrow AH^{(k)} \end{align*} $$





2) Update


Adding a non-linear function: $k^{\text{th}}$ layer

$$ \begin{align*} H^{(k+1)} &= f \left( A, H^{(k)} \right) \\ & = \sigma \left( A H^{(k)} \, W \right) \end{align*} $$



$$ \begin{align*} h_{u}^{(k+1)} &= \text{UPDATE} \left( \text{AGGREGATE} \left( \left\{ h_{v}^{(k)}, \forall v \in \mathcal{N}(u) \right\} \right) \right)\\\\ H^{(k+1)} &= \sigma \left(A H^{(k)} \, W_{\text{neigh}}^{(k)} \right) \end{align*} $$


  • $h_1^{(k)}$: feature vector of the first node in the $k^{\text{th}}$ layer
  • $W^{(k)}$: weight of the $k^{\text{th}}$ layer
    • Weight sharing: the same weight is shared within each layer
      • Within the same layer, each node is updated in the same way, so the weight can be shared
      • Weight sharing reduces computational complexity and time (see the short sketch below)
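One layer of this update as a minimal numpy sketch, with ReLU standing in for $\sigma$; the feature and weight values are hypothetical:

import numpy as np

A = np.array([[0, 1, 1, 1],
              [1, 0, 0, 0],
              [1, 0, 0, 1],
              [1, 0, 1, 0]])              # example adjacency matrix from above

H = np.array([[1.], [0.], [0.], [-1.]])   # hypothetical node features (4 nodes, 1 feature)
W = np.array([[0.5, -0.5]])               # hypothetical shared weights (1 -> 2 features)

H_next = np.maximum(0, A @ H @ W)         # sigma(A H W) with ReLU
print(H_next)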





2.4. Further Improvements for GCN

1) Message Passing with Self-Loops

  • As a simplification of the neural message passing approach, it is common to add self-loops to the input graph and omit the explicit update step


$$ \begin{align*} h_{u}^{(k+1)} &= \text{UPDATE} \left( h_{u}^{(k)}, \text{AGGREGATE} \left( \left\{ h_{v}^{(k)}, \forall v \in \mathcal{N}(u) \right\} \right) \right) \\ &= \text{UPDATE} \left( \text{AGGREGATE} \left( \left\{ h_{v}^{(k)}, \forall v \in \mathcal{N}(u) \cup \{u \}\right\} \right) \right) \\ \\ H^{(k+1)} &= \sigma \left( \left(A+I \right)H^{(k)} \, W^{(k)}\right) \end{align*} $$






2) Neighborhood Normalization

  • The most basic neighborhood aggregation operation simply takes the sum of the neighbor embeddings.
  • One issue with this approach is that it can be unstable and highly sensitive to node degrees.
  • One solution to this problem is to simply normalize the aggregation operation based upon the degrees of the nodes involved.
  • The simplest approach is to just take a weighted average rather than sum.


$$ \begin{align*} \tilde A &= D^{-1/2}AD^{-1/2} + I \\ & \approx \tilde D^{-1/2}(A+I) \tilde D^{-1/2} \qquad \text{where } \, \tilde D \, \text{ is the degree matrix of } A+I \end{align*} $$






Finally, Graph Convolutional Networks


$$ \begin{align*} H^{(k+1)} &= \sigma \left(A H^{(k)} \, W^{(k)} \right) \\\\ &\Downarrow \\\\ H^{(k+1)} &= \sigma \left( \left(A+I \right)H^{(k)} \, W^{(k)}\right) \\\\ &\Downarrow \\\\ H^{(k+1)} &= \sigma \left( \left(\tilde D^{-1/2}(A+I)\tilde D^{-1/2} \right)H^{(k)} \, W^{(k)}\right)\\\\\\ \therefore H^{(k+1)} &= \sigma \left( \tilde A H^{(k)} \, W^{(k)}\right) \end{align*} $$


  • For each layer, the normalized adjacency matrix, the feature matrix, and the weight matrix are multiplied to create the next feature matrix




In [15]:
import networkx as nx
import matplotlib.pyplot as plt

%matplotlib inline

G = nx.Graph()

G.add_nodes_from([1, 2, 3, 4, 5, 6])
G.add_edges_from([(1, 2), (1, 3), (2, 3), (1, 4), (4, 5), (4, 6), (5, 6)])

nx.draw(G, with_labels = True, node_size = 600, font_size = 22)
plt.show()
In [16]:
A = nx.adjacency_matrix(G).todense()

print(A)
[[0 1 1 1 0 0]
 [1 0 1 0 0 0]
 [1 1 0 0 0 0]
 [1 0 0 0 1 1]
 [0 0 0 1 0 1]
 [0 0 0 1 1 0]]

Assign a feature vector $H$ so that the nodes can be separated into two groups

In [17]:
H = np.matrix([1,0,0,-1,0,0]).T

print(H)
[[ 1]
 [ 0]
 [ 0]
 [-1]
 [ 0]
 [ 0]]

The product of the adjacency matrix and the node feature matrix gives, for each node, the sum of its neighboring node features

In [18]:
A*H
Out[18]:
matrix([[-1],
        [ 1],
        [ 1],
        [ 1],
        [-1],
        [-1]])
In [19]:
A_self = A + np.eye(6)

A_self*H
Out[19]:
matrix([[ 0.],
        [ 1.],
        [ 1.],
        [ 0.],
        [-1.],
        [-1.]])

Similar to data pre-processing for any neural network, normalize the features to prevent numerical instabilities and vanishing/exploding gradients so that the model can converge

In [20]:
D = np.array(A_self.sum(1)).flatten()
D = np.diag(D)

D_half_norm = fractional_matrix_power(D, -0.5)

A_self = np.asmatrix(A_self)
D_half_norm = np.asmatrix(D_half_norm)

A_half_norm = D_half_norm*A_self*D_half_norm

A_half_norm*H
Out[20]:
matrix([[ 0.        ],
        [ 0.28867513],
        [ 0.28867513],
        [ 0.        ],
        [-0.28867513],
        [-0.28867513]])

Build a 2-layer GCN using ReLU as the activation function

$$ \begin{align*} H^{(2)} &= \text{ReLU} \left( \tilde A H^{(1)} \, W^{(1)}\right) \\ H^{(3)} &= \text{ReLU} \left( \tilde A H^{(2)} \, W^{(2)}\right) \end{align*} $$
In [21]:
np.random.seed(20)

W1 = np.random.randn(1, 4) # input: 1 -> hidden: 4
W2 = np.random.randn(4, 2) # hidden: 4 -> output: 2

def relu(x):
    return np.maximum(0, x)

def gcn(A_self, H, W):
    # symmetric normalization D^{-1/2} (A+I) D^{-1/2}, then sigma(A_tilde H W)
    D = np.diag(np.array(A_self.sum(1)).flatten())
    D_half_norm = fractional_matrix_power(D, -0.5)
    H_new = D_half_norm*A_self*D_half_norm*H*W
    return relu(H_new)

H1 = H
H2 = gcn(A_self, H1, W1)
H3 = gcn(A_self, H2, W2)

print(H3)
[[0.         0.07472825]
 [0.         0.08628875]
 [0.         0.08628875]
 [0.12632564 0.        ]
 [0.14586829 0.        ]
 [0.14586829 0.        ]]

2.5. Readout: Permutation Invariance

  • The adjacency matrix can differ even though two graphs have the same network structure

    • Even if the edge information between all nodes is the same, the order of values in the matrix may differ with the node ordering (e.g., due to rotation and symmetry)
  • Therefore, for a graph-level representation, the readout layer makes the result permutation invariant by applying an MLP to each node and summing





  • Node-wise summation


$$ Z_G = \sigma \left(\sum_{i \in G} \text{MLP} \left(H_i^{(L)} \right) \right) $$
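A minimal numpy sketch of this readout, assuming the $6 \times 2$ node embedding H3 from the 2-layer GCN above; the one-layer MLP weight is hypothetical (a real readout MLP would usually be deeper):

W_out = np.random.randn(2, 2)                       # hypothetical MLP weight
Z = 1 / (1 + np.exp(-(H3 * W_out).sum(axis = 0)))   # node-wise MLP, sum over nodes, sigmoid
print(Z)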




2.6. Overall Structure of GCN





  • Graph information, i.e., the feature matrix and the adjacency matrix, is input to the GCN
  • Graph Convolution Layer
    • Updates the information of each node according to the adjacency matrix





  • The readout layer collects all node information with an MLP and produces a value for regression or classification

2.7. Three Types of GNN Problem

  • Task 1: Node classification

  • Task 2: Edge prediction

  • Task 3: Graph classification





3. Lab 1: Node Classification using Graph Convolutional Networks

3.0. List of GNN Python Libraries

  • Deep Graph Library (DGL)
    • Based on PyTorch, TensorFlow or Apache MXNet.
  • Graph Nets
    • DeepMind’s library for building graph networks in TensorFlow and Sonnet
  • Spektral
    • Based on the Keras API and TensorFlow 2
    • We will use this one for demo
In [22]:
# !pip install spektral==0.6.0
# !pip install tensorflow==2.2.0
# !pip install keras==2.3.0
In [23]:
import numpy as np
import networkx as nx
import tensorflow as tf
import matplotlib.pyplot as plt

import spektral

3.1. Data Loading

Download data from here

CORA dataset

  • This dataset is the MNIST equivalent in graph learning

  • The CORA dataset consists of 2708 scientific publications classified into one of seven classes.

    • Case_Based: 298
    • Genetic_Algorithms: 418
    • Neural_Networks: 818
    • Probabilistic_Methods: 426
    • Reinforcement_Learning: 217
    • Rule_Learning: 180
    • Theory: 351
  • The citation network consists of 5429 links.

  • Each publication in the dataset is described by a 0/1-valued word vector indicating the absence/presence of the corresponding word from the dictionary.

  • The dictionary consists of 1433 unique words.

In [24]:
nodes = np.load('./data_files/cora_nodes.npy')
edge_list = np.load('./data_files/cora_edges.npy')

labels_encoded = np.load('./data_files/cora_labels_encoded.npy')

H = np.load('./data_files/cora_features.npy')
data_mask = np.load('./data_files/cora_mask.npy')

N = H.shape[0]
F = H.shape[1]

print('H shape: ', H.shape)
print('The number of nodes (N): ', N)
print('The number of features (F) of each node: ', F)

num_classes = 7
print('The number of classes: ', num_classes)
H shape:  (2708, 1433)
The number of nodes (N):  2708
The number of features (F) of each node:  1433
The number of classes:  7





3.2 Train/Test Data Splitting

We split the data into training and test sets at a ratio of 7:3, giving 1895 nodes for training and 813 for testing.

In [25]:
# index of node for train model
train_mask = data_mask[0]

# index of node for test model
test_mask = data_mask[1]
In [26]:
print("The number of trainig data: ", np.sum(train_mask))
print("The number of test data: ", np.sum(test_mask))
The number of trainig data:  1895
The number of test data:  813

3.3 Initializing Graph G

In [27]:
G = nx.Graph(name = 'Cora')
G.add_nodes_from(nodes)
G.add_edges_from(edge_list)

print('Graph info: ', nx.info(G))
Graph info:  Graph named 'Cora' with 2708 nodes and 5278 edges

3.4 Construct and Normalize Adjacency Matrix A

3.4.1. Normalizing Term $\tilde D^{-1/2} (A+I) \tilde D^{-1/2}$

In [28]:
from scipy.linalg import fractional_matrix_power

A = nx.adjacency_matrix(G)

I = np.eye(A.shape[-1])
A_self = A + I

D = np.diag(np.array(A_self.sum(1)).flatten())
D_half_norm = fractional_matrix_power(D, -0.5)
    
A_half_norm = D_half_norm * A_self * D_half_norm

A_half_norm = np.array(A_half_norm)
H = np.array(H)

3.5 GCN Model

In [29]:
H_in = tf.keras.layers.Input(shape = (F, ))
A_in = tf.keras.layers.Input(shape = (N, ))

graph_conv_1 = spektral.layers.GraphConv(channels = 16,
                                         activation = 'relu')([H_in, A_in])

graph_conv_2 = spektral.layers.GraphConv(channels = 7,
                                         activation = 'softmax')([graph_conv_1, A_in])

model = tf.keras.models.Model(inputs = [H_in, A_in], outputs = graph_conv_2)

model.compile(optimizer = tf.keras.optimizers.Adam(learning_rate = 1e-2),
              loss = 'categorical_crossentropy',
              weighted_metrics = ['acc'])

model.summary()
Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            [(None, 1433)]       0                                            
__________________________________________________________________________________________________
input_2 (InputLayer)            [(None, 2708)]       0                                            
__________________________________________________________________________________________________
graph_conv (GraphConv)          (None, 16)           22944       input_1[0][0]                    
                                                                 input_2[0][0]                    
__________________________________________________________________________________________________
graph_conv_1 (GraphConv)        (None, 7)            119         graph_conv[0][0]                 
                                                                 input_2[0][0]                    
==================================================================================================
Total params: 23,063
Trainable params: 23,063
Non-trainable params: 0
__________________________________________________________________________________________________

3.6 Train Model

In [30]:
model.fit([H, A_half_norm],
          labels_encoded,
          sample_weight = train_mask,
          epochs = 30,
          batch_size = N,
          shuffle = False)
Epoch 1/30
1/1 [==============================] - 0s 997us/step - loss: 1.3631 - acc: 0.1430
Epoch 2/30
1/1 [==============================] - 0s 994us/step - loss: 1.2854 - acc: 0.4153
Epoch 3/30
1/1 [==============================] - 0s 0s/step - loss: 1.1984 - acc: 0.5894
Epoch 4/30
1/1 [==============================] - 0s 998us/step - loss: 1.1052 - acc: 0.6570
Epoch 5/30
1/1 [==============================] - 0s 998us/step - loss: 1.0199 - acc: 0.6902
Epoch 6/30
1/1 [==============================] - 0s 1ms/step - loss: 0.9398 - acc: 0.7103
Epoch 7/30
1/1 [==============================] - 0s 998us/step - loss: 0.8625 - acc: 0.7193
Epoch 8/30
1/1 [==============================] - 0s 0s/step - loss: 0.7886 - acc: 0.7303
Epoch 9/30
1/1 [==============================] - 0s 998us/step - loss: 0.7194 - acc: 0.7499
Epoch 10/30
1/1 [==============================] - 0s 997us/step - loss: 0.6549 - acc: 0.7683
Epoch 11/30
1/1 [==============================] - 0s 997us/step - loss: 0.5948 - acc: 0.7963
Epoch 12/30
1/1 [==============================] - 0s 997us/step - loss: 0.5394 - acc: 0.8301
Epoch 13/30
1/1 [==============================] - 0s 998us/step - loss: 0.4892 - acc: 0.8586
Epoch 14/30
1/1 [==============================] - 0s 997us/step - loss: 0.4444 - acc: 0.8786
Epoch 15/30
1/1 [==============================] - 0s 987us/step - loss: 0.4049 - acc: 0.8860
Epoch 16/30
1/1 [==============================] - 0s 997us/step - loss: 0.3703 - acc: 0.8950
Epoch 17/30
1/1 [==============================] - 0s 996us/step - loss: 0.3398 - acc: 0.8992
Epoch 18/30
1/1 [==============================] - 0s 998us/step - loss: 0.3131 - acc: 0.9018
Epoch 19/30
1/1 [==============================] - 0s 0s/step - loss: 0.2898 - acc: 0.9055
Epoch 20/30
1/1 [==============================] - 0s 0s/step - loss: 0.2696 - acc: 0.9119
Epoch 21/30
1/1 [==============================] - 0s 0s/step - loss: 0.2521 - acc: 0.9161
Epoch 22/30
1/1 [==============================] - 0s 998us/step - loss: 0.2369 - acc: 0.9187
Epoch 23/30
1/1 [==============================] - 0s 0s/step - loss: 0.2235 - acc: 0.9193
Epoch 24/30
1/1 [==============================] - 0s 997us/step - loss: 0.2118 - acc: 0.9203
Epoch 25/30
1/1 [==============================] - 0s 997us/step - loss: 0.2015 - acc: 0.9208
Epoch 26/30
1/1 [==============================] - 0s 0s/step - loss: 0.1923 - acc: 0.9219
Epoch 27/30
1/1 [==============================] - 0s 998us/step - loss: 0.1841 - acc: 0.9214
Epoch 28/30
1/1 [==============================] - 0s 0s/step - loss: 0.1767 - acc: 0.9230
Epoch 29/30
1/1 [==============================] - 0s 0s/step - loss: 0.1701 - acc: 0.9240
Epoch 30/30
1/1 [==============================] - 0s 997us/step - loss: 0.1640 - acc: 0.9266
Out[30]:
<tensorflow.python.keras.callbacks.History at 0x18edc562708>

3.7 Model Evaluation

In [31]:
y_pred = model.evaluate([H, A_half_norm],
                        labels_encoded,
                        sample_weight = test_mask,
                        batch_size = N)
1/1 [==============================] - 0s 997us/step - loss: 0.0963 - acc: 0.8954
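To inspect per-node predictions (a sketch using the trained model above; predict returns class probabilities for every node):

probs = model.predict([H, A_half_norm], batch_size = N)
pred_classes = np.argmax(probs, axis = 1)
print(pred_classes[:10])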

3.8 t-SNE

In [32]:
from sklearn.manifold import TSNE

layer_outputs = [layer.output for layer in model.layers]
activation_model = tf.keras.models.Model(inputs = model.input, outputs = layer_outputs)
activations = activation_model.predict([H,A_half_norm],batch_size = N)

x_tsne = TSNE(n_components = 2).fit_transform(activations[2]) 
In [33]:
def plot_tSNE(labels_encoded,x_tsne):
    color_map = np.argmax(labels_encoded, axis = 1)
    plt.figure(figsize = (10,10))
    for cl in range(num_classes):
        indices = np.where(color_map == cl)
        indices = indices[0]
        plt.scatter(x_tsne[indices,0], x_tsne[indices, 1], label = cl)
    plt.legend()
    plt.show()
    
plot_tSNE(labels_encoded,x_tsne)

4. Useful Resources for Further Study

In [34]:
%%html 
<center><iframe src="https://www.youtube.com/embed/fOctJB4kVlM?rel=0" 
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
In [35]:
%%html 
<center><iframe src="https://www.youtube.com/embed/ABCGCf8cJOE?rel=0" 
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
In [36]:
%%html 
<center><iframe src="https://www.youtube.com/embed/0YLZXjMHA-8?rel=0" 
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
In [37]:
%%html 
<center><iframe src="https://www.youtube.com/embed/ex2qllcVneY?rel=0" 
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
In [38]:
%%html 
<center><iframe src="https://www.youtube.com/embed/YL1jGgcY78U?rel=0" 
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
In [39]:
%%html 
<center><iframe src="https://www.youtube.com/embed/8owQBFAHw7E?rel=0" 
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
In [40]:
%%html 
<center><iframe src="https://www.youtube.com/embed/R67-JxtOQzg?rel=0" 
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
In [41]:
%%javascript
$.getScript('https://kmahelona.github.io/ipython_notebook_goodies/ipython_notebook_toc.js')