Graph Partition: Spectral Partitioning

By Prof. Seungchul Lee
http://iai.postech.ac.kr/
Industrial AI Lab at POSTECH

# 1. Graph Partitioning Algorithms

• Network communities are groups of vertices similar to each other
• similarity-based clustering ($\to$ inner product)
• example: you are defined by your connections (friends)
• Network communities are groups of vertices such that vertices inside a group are connected by many more edges than vertices in different groups
• graph partitioning (or community detection)

## 1.1. Graph partitioning

• spectral partitioning
• bisection problem (dividing a graph into two parts of specified sizes)

Graph $G(V,E)$ partition: $V = V_1 \cup V_2$, $\; V_1 \cap V_2 = \emptyset$

• graph cut

$$Q=\text{cut}\left(V_1,V_2 \right) = \sum_{i\in V_1,\; j\in V_2}{e_{ij}}$$

• ratio cut

$$Q=\frac{\text{cut}\left(V_1,V_2 \right)}{\lvert V_1 \rvert} + \frac{\text{cut}\left(V_1,V_2 \right)}{\lvert V_2 \rvert}$$
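As a concrete check of these definitions, here is a small NumPy sketch; the 6-node graph (two triangles joined by a single bridge edge) and the helper names `cut` / `ratio_cut` are hypothetical illustrations, not part of the lecture.

```python
import numpy as np

# Hypothetical 6-node graph: triangles {0,1,2} and {3,4,5}
# joined by the single bridge edge (2,3).
A = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]])

def cut(A, V1, V2):
    # sum of edge weights with one endpoint in V1 and the other in V2
    return A[np.ix_(V1, V2)].sum()

def ratio_cut(A, V1, V2):
    c = cut(A, V1, V2)
    return c / len(V1) + c / len(V2)

V1, V2 = [0, 1, 2], [3, 4, 5]
print(cut(A, V1, V2))        # 1: only the bridge edge crosses
print(ratio_cut(A, V1, V2))  # 1/3 + 1/3
```

Normalizing by the group sizes penalizes very unbalanced splits: cutting off a single vertex may have a small cut value but a large ratio cut.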



## 1.2. Spectral partitioning

### 1.2.1. Graph Laplacian and Optimization

• indicator vector

$$s_i = \begin{cases} +1 & \quad \text{if } u_i \in V_1\\ -1 & \quad \text{if } u_i \in V_2 \end{cases}$$

• number of edges connecting $V_1$ and $V_2$

\begin{align*} \text{cut}\left(V_1,V_2 \right) &= \frac{1}{4} \sum_{e(i,\,j)}(s_i - s_j)^2 = \frac{1}{8} \sum_{i,\,j} A_{ij}(s_i - s_j)^2 \\&= \frac{1}{8} \sum_{i,\,j} A_{ij}(s_i^2 - 2 s_i s_j + s_j^2) = \frac{1}{4} \sum_{i,\,j} A_{ij}(s_i^2 - s_i s_j) \\ & = \frac{1}{4} \sum_{i} \sum_{j} A_{ij} s_i^2 - \frac{1}{4} \sum_{i,\,j}A_{ij}s_i s_j = \frac{1}{4} \sum_{i} \left(\sum_{j} A_{ij}\right) s_i^2 - \frac{1}{4} \sum_{i,\,j}A_{ij}s_i s_j \\&= \frac{1}{4} \sum_{i} k_i s_i^2 - \frac{1}{4} \sum_{i,\,j}A_{ij}s_i s_j\\ & = \frac{1}{4} \sum_{i,\,j} \left(k_i \delta_{ij}s_i^2 - A_{ij}s_i s_j\right) = \frac{1}{4} \sum_{i,\,j} \left( k_i \delta_{ij} - A_{ij}\right)s_i s_j = \frac{1}{4} \sum_{i,\,j} \left( D_{ij} - A_{ij}\right)s_i s_j \end{align*}

• Graph Laplacian: $L_{ij} = D_{ij} - A_{ij}$, where $D_{ij} = \text{diag}(k_i)$

$$L_{ij} = \begin{cases} \;\; k_i &\quad \text{if } i = j\\ -1 &\quad \text{if } \exists \; e(i,j)\\ \;\; 0 &\quad \text{otherwise}\end{cases}$$
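A quick NumPy sanity check of this definition on a small hypothetical 6-node graph (two triangles joined by a bridge edge): since each row of $L$ contains $k_i$ on the diagonal and $-1$ for each neighbor, every row sums to zero, i.e. $L\mathbb{1}=0$.

```python
import numpy as np

# Hypothetical 6-node graph: triangles {0,1,2} and {3,4,5} joined by edge (2,3).
A = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]])

k = A.sum(axis=1)      # vertex degrees k_i
L = np.diag(k) - A     # graph Laplacian L = D - A

print(L)
print(L.sum(axis=1))   # all zeros: L @ 1 = 0
```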

• matrix form

$$Q(s) = \frac{1}{4} s^T L s$$

• optimization

\begin{align*} \min_s \quad & Q(s) \\ \text{subject to } & \sum_i s_i = 0 \qquad (\Rightarrow \text{balanced cut constraint})\\ & s_i \in \{+1,-1\} \end{align*}

• integer minimization problem; finding the exact solution is NP-hard
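The identity $\text{cut}(V_1,V_2) = \frac{1}{4}s^T L s$ from the derivation above can be verified numerically; a sketch on a small hypothetical 6-node graph (two triangles joined by the bridge edge $(2,3)$):

```python
import numpy as np

# Hypothetical 6-node graph: triangles {0,1,2} and {3,4,5} joined by edge (2,3).
A = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]])
L = np.diag(A.sum(axis=1)) - A

# indicator vector: +1 for V1 = {0,1,2}, -1 for V2 = {3,4,5}
s = np.array([1, 1, 1, -1, -1, -1])

Q = 0.25 * s @ L @ s
# count the crossing edges directly for comparison
direct = sum(A[i, j] for i in range(6) for j in range(6)
             if s[i] == 1 and s[j] == -1)
print(Q, direct)  # both equal 1: only the bridge edge is cut
```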

### 1.2.2. Relaxed spectral method

• $\text{discrete } s_i \rightarrow \text{continuous } x_i$

 \begin{align*} \min_x \quad & \;\frac{1}{4} x^T L x \quad = \min_x \; x^T L x\\ \text{subject to } & \sum_i x_i = 0\\ & \sum_i x_i^2 = n \end{align*} $\implies$ \begin{align*} \min_x \quad & \; x^T L x\\ \text{subject to } & x^T \mathbb{1} = 0\\ & x^Tx = n \end{align*} $\implies$ \begin{align*} \min_x \quad & \; \left(\frac{x}{\sqrt{n}}\right)^T L \left(\frac{x}{\sqrt{n}}\right)\\ \text{subject to } & \left(\frac{x}{\sqrt{n}}\right)^T \mathbb{1} = 0\\ & \left(\frac{x}{\sqrt{n}}\right)^T\left(\frac{x}{\sqrt{n}}\right) = 1 \end{align*}

 $$\frac{x}{\sqrt{n}} \rightarrow x$$ $\implies$ \begin{align*} \min_x \quad & \; x^T L x\\ \text{subject to } & x^T \mathbb{1} = 0\\ & x^Tx = 1 \end{align*} $\implies$ \begin{align*} \min_x \quad & \; \frac{x^T L x}{x^Tx}\\ \text{subject to } & x^T \mathbb{1} = 0\\ \end{align*}

• given the relaxed solution $x_i$, recover a discrete partition by rounding: $s_i = \text{sign}(x_i)$

Solution

• Eigenvalue problem

$$Lx = \lambda x, \quad x \perp \mathbb{1}$$

\begin{align*} \min_x \quad & \; \frac{x^T L x}{x^Tx} = \min_x \; \lambda\\ \\ \text{subject to } & x^T \mathbb{1} = 0\\ \end{align*}

• First (smallest) eigenvector $=\mathbb{1}$

$$L\mathbb{1} = 0 = 0\mathbb{1} \; \implies \; \lambda = 0, \quad x_1 = \mathbb{1}$$

• Looking for the second smallest eigenvalue/eigenvector $\lambda_2$ and $x_2$

$$x_2 \perp x_1 = \mathbb{1}$$
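Putting the pieces together, a minimal NumPy sketch of spectral bisection on a hypothetical 6-node graph (two triangles joined by a bridge edge); `numpy.linalg.eigh` returns eigenvalues of a symmetric matrix in ascending order, so column 1 of the eigenvector matrix is the second-smallest (Fiedler) eigenvector:

```python
import numpy as np

# Hypothetical 6-node graph: triangles {0,1,2} and {3,4,5} joined by edge (2,3).
A = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]])
L = np.diag(A.sum(axis=1)) - A

w, V = np.linalg.eigh(L)   # ascending eigenvalues; w[0] is ~0 with x1 = 1
x2 = V[:, 1]               # second smallest: the Fiedler vector
s = np.sign(x2)            # round back to a discrete partition

V1 = np.where(s > 0)[0]
V2 = np.where(s < 0)[0]
print(V1, V2)              # recovers {0,1,2} vs {3,4,5} (up to sign)
```

The overall sign of an eigenvector is arbitrary, so which group gets $+1$ and which gets $-1$ can flip from run to run; only the partition itself is meaningful.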

Solution using Lagrange multipliers (optional)

$$Q(x) = \frac{1}{4} x^T L x - \lambda (x^Tx-n), \quad x^T \mathbb{1} = 0$$

• Eigenvalue problem

$$Lx = \lambda x, \quad x \perp \mathbb{1}$$

• Solution

$$Q(x) = \frac{n}{4} \lambda$$

• First (smallest) eigenvector $=\mathbb{1}$

$$L\mathbb{1} = 0, \quad \lambda = 0, \quad x_1 = \mathbb{1}$$

• Looking for the second smallest eigenvalue/eigenvector $\lambda_2$ and $x_2$

$$x_2 \perp x_1 = \mathbb{1}$$

# 2. Example 1

In [1]:
A = [0 0 0 1 1 0 1 0;
0 0 0 1 0 1 0 1;
0 0 0 0 1 0 1 0;
1 1 0 0 0 1 0 0;
1 0 1 0 0 0 1 0;
0 1 0 1 0 0 0 1;
1 0 1 0 1 0 0 0;
0 1 0 0 0 1 0 0];

L = diag(sum(A,1)) - A;
[V,D] = eig(L);
diag(D)
[ds,Y] = sort(diag(D));
u2 = V(:,Y(2))
[v2,I] = sort(u2,'ascend');

Out[1]:
ans =

-0.0000
0.2907
2.0000
2.8061
4.0000
4.0000
4.0000
4.9032

u2 =

-0.1993
0.3696
-0.4325
0.1993
-0.3696
0.3696
-0.3696
0.4325
In [5]:
%plot -s 700,300
subplot(1,2,1), plot(1:8,u2,'.'), xlim([0 9])
subplot(1,2,2), plot(1:8,v2,'.'), xlim([0 9])

Out[5]: (figure: scatter plots of u2 and its sorted values)
In [6]:
A
B = A(I,I)

I

Out[6]:
A =

0     0     0     1     1     0     1     0
0     0     0     1     0     1     0     1
0     0     0     0     1     0     1     0
1     1     0     0     0     1     0     0
1     0     1     0     0     0     1     0
0     1     0     1     0     0     0     1
1     0     1     0     1     0     0     0
0     1     0     0     0     1     0     0

B =

0     1     1     0     0     0     0     0
1     0     1     1     0     0     0     0
1     1     0     1     0     0     0     0
0     1     1     0     1     0     0     0
0     0     0     1     0     1     1     0
0     0     0     0     1     0     1     1
0     0     0     0     1     1     0     1
0     0     0     0     0     1     1     0

I =

3
5
7
1
4
6
2
8

In [7]:
%plot -s 700,300
subplot(1,2,1), spy(A)
subplot(1,2,2), spy(B)

Out[7]: (figure: sparsity patterns of A and the reordered B)

# 3. Example 2

In [8]:
%plot -s 700,300
A = [0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0;
0 0 1 1 1 0 0 0 1 0 0 0 0 0 0 0;
0 1 0 1 1 1 0 1 0 0 0 0 0 0 0 0;
0 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0;
1 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0;
0 0 1 0 1 0 1 0 0 1 0 0 0 0 0 0;
1 0 0 1 0 1 0 1 0 0 0 0 0 0 0 0;
0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 0;
0 1 0 0 0 0 0 0 0 1 1 1 1 0 1 0;
0 0 0 0 0 1 0 1 1 0 1 0 0 0 1 0;
0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0;
0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0;
0 0 0 0 0 0 0 0 1 0 0 0 0 1 1 0;
0 0 0 0 0 0 0 0 0 0 0 1 1 0 1 0;
0 0 0 0 0 0 0 0 1 1 0 0 1 1 0 1;
0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0;
];

L = diag(sum(A,1)) - A;
[V,D] = eig(L);

u2 = V(:,2);
[v2,I] = sort(u2,'ascend');

subplot(1,2,1), plot(1:16,u2,'.'), xlim([0 17])
subplot(1,2,2), plot(1:16,v2,'.'), xlim([0 17])

Out[8]: (figure: scatter plots of u2 and its sorted values)
In [32]:
spy(A(I,I))

S = 1:16;
S(u2<0)

Out[32]:
ans =

9    10    11    12    13    14    15    16

# 4. Example 3: Zachary Karate Club

• Zachary's Karate Club is a well-known social network of a university karate club, described in the paper "An Information Flow Model for Conflict and Fission in Small Groups" by Wayne W. Zachary.
• Wayne W. Zachary studied the social network of the karate club over a period of three years, from 1970 to 1972. The network captures 34 members of the club and documents 78 pairwise links between members who interacted outside the club. During the study a conflict arose between the administrator "John A" and the instructor "Mr. Hi" (pseudonyms), which led to a split of the club into two: half of the members formed a new club around Mr. Hi, while members of the other part found a new instructor or gave up karate. Based on the collected data, Zachary correctly assigned all but one member of the club to the group they actually joined after the split.

• note: node labels start from 0
In [11]:
%plot -s 560,420
spy(A)

Out[11]: (figure: sparsity pattern of A)
In [12]:
L = diag(sum(A,1)) - A;
[V,D] = eig(L);
plot(1:34,diag(D),'.'), title('eigenvalues')

Out[12]: (figure: eigenvalues)
In [13]:
%plot -s 700,300
u2 = V(:,2);
[v2,I] = sort(u2,'ascend');

subplot(1,2,1), plot(1:34,u2,'.'), title('2nd smallest eigenvector','fontsize',8), xlim([0 35])
subplot(1,2,2), plot(1:34,v2,'.'), title('sorted','fontsize',8), xlim([0 35])

Out[13]: (figure: 2nd smallest eigenvector and its sorted values)
In [15]:
%plot -s 700,300
subplot(1,2,1), spy(A)
subplot(1,