By Prof. Seungchul Lee
http://iai.postech.ac.kr/
Industrial AI Lab at POSTECH

# 1. Degree and Importance of Network¶

## 1.1. Degree¶

Degree of Undirected Graph

• The degree of a vertex in a graph is the number of edges connected to it
• denote the degree of vertex $i$ by $k_{i}$
• for an undirected graph of $n$ vertices

$$k_i = \sum_{j=1}^{n} \; A_{ij}$$

• Every edge in an undirected graph has two ends, so if there are $m$ edges

$$2m = \sum_{i=1}^{n} \; k_i$$

• the mean degree $c$ of a vertex

$$c = \frac{1}{n} \sum_{i=1}^{n} \; k_i$$

• the maximum possible number of edges in a graph is $\left( \begin{array}{c} n\\2 \end{array} \right) = \frac{n(n-1)}{2}$

• density $\rho$ of a graph is the fraction of these edges that are actually present

$$\rho = \frac{m}{\left( \begin{array}{c} n\\2 \end{array} \right)}$$
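These degree, mean-degree, and density formulas can be checked numerically. Below is a small sketch on a hypothetical 4-vertex undirected graph (the adjacency matrix is an illustrative assumption, not an example from the text):

```python
import numpy as np

# hypothetical undirected graph on n = 4 vertices:
# symmetric adjacency matrix, no self-loops
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]])

n = A.shape[0]
k = A.sum(axis=1)        # degree k_i = sum_j A_ij
m = A.sum()/2            # each edge has two ends: 2m = sum_i k_i
c = k.mean()             # mean degree c
rho = m/(n*(n - 1)/2)    # density: fraction of possible edges present

print(k)    # [2 3 2 1]
print(m)    # 4.0
print(c)    # 2.0
print(rho)  # 0.666...
```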

Degree of Directed Graph

• in-degree: the number of incoming edges connected to a vertex

$$k_i^{\text{in}} = \sum_{j=1}^{n} \; A_{ij}$$

• out-degree: the number of outgoing edges from a vertex

$$k_j^{\text{out}} = \sum_{i=1}^{n} \; A_{ij}$$
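In NumPy these are just row and column sums, using this subsection's convention that $A_{ij} = 1$ when there is an edge from $j$ to $i$ (the matrix below is an illustrative assumption):

```python
import numpy as np

# hypothetical directed graph; convention here:
# A[i, j] = 1 if there is an edge from vertex j to vertex i
A = np.array([[0, 1, 0],
              [0, 0, 1],
              [1, 1, 0]])

k_in  = A.sum(axis=1)   # in-degree:  k_i^in  = sum_j A_ij  (row sums)
k_out = A.sum(axis=0)   # out-degree: k_j^out = sum_i A_ij  (column sums)

print(k_in)   # [1 1 2]
print(k_out)  # [1 2 1]
print(k_in.sum() == k_out.sum())  # True: both equal the number of edges m
```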

## 1.2. Centrality¶

• Which are the most important or central vertices in a network?
• degree centrality is a simple centrality measure
• in a social network, it seems reasonable to suppose that individuals who have connections to many others might have more influence and more access to information

Eigenvector centrality

• not all neighbors are equivalent
• increase importance by having connections to other vertices that are themselves important

• sum of the centrality scores (ranks) of its incoming neighbors

• iteratively update

\begin{align*} r_i \leftarrow \sum_{j \, \in \, N(i)} r_j = \sum_{j} A_{ji}r_j \end{align*}

• in matrix form, with $r$ a row vector

\begin{align*} &1) \quad r \leftarrow r\,A &\text{update}\\ &2) \quad r \leftarrow \frac{r}{\parallel r \parallel_1}& \text{normalize} \end{align*}

• left eigenvector of the adjacency matrix $A$ associated with the largest eigenvalue

$$\lambda \, r = r\,A$$
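A minimal sketch of this update-and-normalize iteration, on a hypothetical strongly connected 4-node directed graph (the adjacency matrix is an illustrative assumption):

```python
import numpy as np

# power iteration for eigenvector centrality, row-vector convention r <- rA
# (A[i, j] = 1 if there is an edge from i to j, so (rA)_i = sum_j A_ji r_j)
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)

r = np.ones(4)/4                     # uniform initial scores
for _ in range(100):
    r = r @ A                        # update: r_i <- sum_j A_ji r_j
    r = r/np.linalg.norm(r, 1)       # normalize

print(r)   # [0.22222222 0.27777778 0.33333333 0.16666667]
```

The result is the dominant left eigenvector of $A$ (here with $\lambda = 2$), normalized so the scores sum to 1.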

# 2. Google PageRank¶

• Source
• Networks: Friends, Money, and Bytes
• by Prof. Mung Chiang at Princeton University
• Consider the World Wide Web as a graph (or network)

• Question: which page is more important?

• For Google, which page among google search results should be placed on top?

## 2.1. PageRank¶

$$A = \begin{bmatrix} 0 & 0 & 1 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1\\ 1 & 1 & 1 & 0 \end{bmatrix}$$

• In-degree
• What makes a page important?
• Possible measure
• In-degree: measure of the number of incoming links a node has ($1,1,3,1$)
• Does this tell the whole story?

• Importance score
$$\pi = \left[\pi_1,\, \pi_2,\, \pi_3,\, \pi_4 \right]$$
• Importance score, $\pi_i$ for node $i$

• each node...
• has its own importance score
• spreads an equal amount of its importance to each outgoing link
• row vector by convention

• Out-degree: the number of outgoing links from a node $\to (1,1,1,3)$

• Markov chain with transition probability matrix $H$

$$H = D^{-1}A = \begin{bmatrix} 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & \frac{1}{3} \end{bmatrix} \begin{bmatrix} 0 & 0 & 1 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1\\ 1 & 1 & 1 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 0 & \frac{1}{1} & 0\\ 0 & 0 & \frac{1}{1} & 0\\ 0 & 0 & 0 & \frac{1}{1}\\ \frac{1}{3} & \frac{1}{3} & \frac{1}{3} & 0 \end{bmatrix}$$

• Simultaneous equations
• We can write each node's score in terms of its incoming links
• Recursion: seemingly circular logic

\begin{align*} \pi_1 &= \frac{\pi_4}{3}\\ \pi_2 &= \frac{\pi_4}{3}\\ \pi_3 &= \pi_1 + \pi_2 + \frac{\pi_4}{3}\\ \pi_4 &= \pi_3 \end{align*}




$$H = \begin{bmatrix} 0 & 0 & \frac{1}{1} & 0\\ 0 & 0 & \frac{1}{1} & 0\\ 0 & 0 & 0 & \frac{1}{1}\\ \frac{1}{3} & \frac{1}{3} & \frac{1}{3} & 0 \end{bmatrix}$$

• or as an eigenvalue problem: $\pi$ is a left eigenvector of $H$ associated with eigenvalue $1$

$$\left[\pi_1,\, \pi_2,\, \pi_3,\, \pi_4 \right] \times \begin{bmatrix} 0 & 0 & \frac{1}{1} & 0\\ 0 & 0 & \frac{1}{1} & 0\\ 0 & 0 & 0 & \frac{1}{1}\\ \frac{1}{3} & \frac{1}{3} & \frac{1}{3} & 0 \end{bmatrix} = 1\left[\pi_1,\, \pi_2,\, \pi_3,\, \pi_4 \right]$$

In [1]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
A = np.matrix([[0, 0, 1, 0],
               [0, 0, 1, 0],
               [0, 0, 0, 1],
               [1, 1, 1, 0]])

d = np.diag([1, 1, 1, 3])
d = np.asmatrix(d)

H = d.I*A

In [3]:
# eigenvalue

[D, V] = np.linalg.eig(H.T)

print(D) # check that one of the eigenvalues = 1
print(V[:,0]/np.sum(V[:,0]))

[ 1.0000000e+00+0.j         -5.0000000e-01+0.64549722j
-5.0000000e-01-0.64549722j  6.2307862e-35+0.j        ]
[[0.125-0.j]
[0.125-0.j]
[0.375-0.j]
[0.375-0.j]]

In [4]:
# iterative

# random initialization
r = np.random.rand(1,4)
r = r/np.sum(r)
print(r)

[[0.30686635 0.39529553 0.10359321 0.1942449 ]]

In [5]:
for _ in range(100):
    r = r*H

print(r)

[[0.125 0.125 0.375 0.375]]

In [6]:
r = np.random.rand(1,4)
r = r/np.sum(r)

r_pre = 1/4*np.ones([1,4])

while np.linalg.norm(r_pre - r) > 1e-10:
    r_pre = r
    r = r*H

print(r)

[[0.125 0.125 0.375 0.375]]

• Dangling nodes
• Does what we just presented always lead to a unique solution?
• Not quite yet!

• Dangling nodes
• nodes that do not point to any others
• lead to no meaningful solution
• Solution: assume dangling node points to every node in the graph ($\rightarrow$ random surfer)
• The Random Surfer
• Surfing the web at random
• Two possible actions
• pick a hyperlink randomly from the current page
• enter a URL randomly
• Stochastic matrix $H'$

$$H' = H + \frac{1}{N}d \mathbf{1}^T \qquad$$

$\qquad \qquad$where $d$ is the indicator column vector of dangling (absorbing) nodes and $\mathbf{1}$ is the all-ones column vector

$$H' = \begin{bmatrix} 0 & 0 & 0 & 1\\ \frac{1}{3} & 0 & \frac{1}{3} & \frac{1}{3}\\ \frac{1}{3} & \frac{1}{3} & 0 & \frac{1}{3}\\ 0 & 0 & 0 & 0 \end{bmatrix} + \begin{bmatrix} 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0\\ \frac{1}{4} & \frac{1}{4} & \frac{1}{4} & \frac{1}{4} \end{bmatrix} = H + \frac{1}{4} \begin{bmatrix} 0\\0\\0\\1\end{bmatrix} \begin{bmatrix} 1 & 1 & 1 &1 \end{bmatrix}$$

In [7]:
A = np.matrix([[0, 0, 0, 1],
               [1, 0, 1, 1],
               [1, 1, 0, 1],
               [0, 0, 0, 0]])

s = np.sum(A, axis = 1)

H2 = np.zeros(A.shape)

for i in range(4):
    if s[i,0] != 0:
        H2[i,:] = A[i,:]/s[i,0]
    else:
        H2[i,:] = 1/4*np.ones([1,4])

H2 = np.asmatrix(H2)
print(H2)

[[0.         0.         0.         1.        ]
[0.33333333 0.         0.33333333 0.33333333]
[0.33333333 0.33333333 0.         0.33333333]
[0.25       0.25       0.25       0.25      ]]

In [8]:
r = np.random.rand(1,4)
r = r/np.sum(r)

r_pre = 1/4*np.ones([1,4])

while np.linalg.norm(r_pre - r) > 1e-10:
    r_pre = r
    r = r*H2

print(r)

[[0.22222222 0.16666667 0.16666667 0.44444444]]

• Disconnected graph

• Connected component - a group of nodes that can reach each other, but none outside the group
• More than one connected component

• Infinitely many solutions
• Stuck in subgraph forever
• Solution: add the purely random surfing into the mix
• you get bored, enter a URL randomly
• PageRank: what is the chance a page is selected at any time?

• Google matrix $G$

$$G = \alpha \begin{bmatrix} \frac{1}{2} & \frac{1}{2} & 0 & 0\\ \frac{1}{2} & \frac{1}{2} & 0 & 0\\ 0 & 0 & \frac{1}{2} & \frac{1}{2}\\ 0 & 0 & \frac{1}{2} & \frac{1}{2} \end{bmatrix} + (1-\alpha) \frac{1}{4}\begin{bmatrix} 1 & 1 & 1 & 1\\ 1 & 1 & 1 & 1\\ 1 & 1 & 1 & 1\\ 1 & 1 & 1 & 1 \end{bmatrix}$$

$$G = \alpha H' + (1-\alpha)\frac{1}{N}\mathbf{1}\mathbf{1}^T$$

In [9]:
A = np.matrix([[1, 1, 0, 0],
               [1, 1, 0, 0],
               [0, 0, 1, 1],
               [0, 0, 1, 1]])

s = np.sum(A, axis = 1)

H2 = np.zeros(A.shape)

for i in range(4):
    if s[i,0] != 0:
        H2[i,:] = A[i,:]/s[i,0]
    else:
        H2[i,:] = 1/4*np.ones([1,4])

H2 = np.asmatrix(H2)
print(H2, '\n')

alpha = 0.85
G = alpha*H2 + (1-alpha)*1/4*np.ones([4,4])
G = np.asmatrix(G)

print(G)

[[0.5 0.5 0.  0. ]
[0.5 0.5 0.  0. ]
[0.  0.  0.5 0.5]
[0.  0.  0.5 0.5]]

[[0.4625 0.4625 0.0375 0.0375]
[0.4625 0.4625 0.0375 0.0375]
[0.0375 0.0375 0.4625 0.4625]
[0.0375 0.0375 0.4625 0.4625]]

In [10]:
r = np.random.rand(1,4)
r = r/np.sum(r)

r_pre = 1/4*np.ones([1,4])

while np.linalg.norm(r_pre - r) > 1e-10:
    r_pre = r
    r = r*G

print(r)

[[0.25 0.25 0.25 0.25]]


## 2.2. PageRank Computations¶

1. Eigenvalue problem (choose left eigenvector $\pi$ with $\lambda = 1$)

$$\pi G = 1 \pi \implies \pi = \alpha \pi H' + (1-\alpha)\frac{1}{N}\mathbf{1}^T$$

where $\pi$ is the stationary distribution of the Markov chain, a row vector by convention (the second equality uses $\pi \mathbf{1} = 1$)

2. Power iterations:

$$\pi \leftarrow \pi \left(\alpha H' + (1-\alpha)\frac{1}{N}\mathbf{1}\mathbf{1}^T \right) = \alpha \pi H' + (1-\alpha)\frac{1}{N}\mathbf{1}^T$$
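The power iteration can be run without ever forming the dense Google matrix $G$, using the equivalent form above. A sketch with NumPy, reusing the dangling-corrected $H'$ from Section 2.1 ($\alpha = 0.85$ as in the earlier cells):

```python
import numpy as np

# power iteration in the equivalent "sparse" form:
# pi <- alpha*pi*H' + (1 - alpha)/N * 1^T   (no dense G is built)
Hp = np.array([[0,   0,   0,   1  ],
               [1/3, 0,   1/3, 1/3],
               [1/3, 1/3, 0,   1/3],
               [1/4, 1/4, 1/4, 1/4]])   # H' from Section 2.1

N = 4
alpha = 0.85

pi = np.ones(N)/N                       # uniform start, sums to 1
for _ in range(100):
    pi = alpha*(pi @ Hp) + (1 - alpha)/N*np.ones(N)

print(pi)   # stationary distribution; entries sum to 1
```

Since $H'$ is row-stochastic, each update preserves $\pi \mathbf{1} = 1$, and the result equals the left eigenvector of $G = \alpha H' + (1-\alpha)\frac{1}{N}\mathbf{1}\mathbf{1}^T$ with $\lambda = 1$.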

## 2.3. Summary¶

\begin{align*} H & &\text{from webgraph}\\ H'& = H + \frac{1}{N}d \mathbf{1}^T &\text{to overcome dangling issue}\\ G &= \alpha H' + (1-\alpha)\frac{1}{N}\mathbf{1}\mathbf{1}^T= \alpha \left( H + \frac{1}{N}d \mathbf{1}^T\right) + (1-\alpha)\frac{1}{N}\mathbf{1}\mathbf{1}^T &\text{to overcome disconnected subgraph} \end{align*}

$$\pi [k+1] = \pi [k] G$$
