/
homework_5_GraphPartitioning.Rmd
79 lines (54 loc) · 4.08 KB
/
homework_5_GraphPartitioning.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
---
title: 'Homework #5'
subtitle: 'Clustering of Network data'
author: 'MAP 573'
date: "11/03/2020"
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(
cache = TRUE,
echo = TRUE,
rows.print = 5)
```
## Preliminaries
### Information
Homework is due Sunday 11/08 23:59 in Rmd (see assignment in Moodle).
### Package requirements
Check that the following packages are available on your computer:
```{r packages, message = FALSE, warning = FALSE}
library(igraph)
library(missSBM)
library(corrplot)
library(aricode)
```
## Introduction to **igraph**
### Tutorials
Have a glance at these two tutorials
- [an **igraph** tutorial](https://rstudio-pubs-static.s3.amazonaws.com/74248_3bd99f966ed94a91b36d39d8f21e3dc3.html) for graphs manipulation.
- [network analysis with **igraph**](http://kateto.net/networks-r-igraph) that gives an overview of the standard features of **igraph** to perform basic statistical analysis on network graphs.
## Analysis of the French political Blogs in 2006
Load the data set from the missSBM package
```{r}
missSBM::frenchblog2007
```
Plot this graph and quickly comment.
### Graph Partitioning
It is time to find some subtle structure in this network by performing graph clustering of the nodes. A commonly sought structure is the presence of _communities_, like clusters of friends in social networks.
The two methods explored today for community detection do not assume any underlying model on the graph. The first one is based on a generalization of hierarchical clustering for graphs using the concept of _modularity_ to define an appropriate dissimilarity measure between clusters. The second is the normalized spectral clustering.
#### Hierarchical clustering with modularity
Just like in usual hierarchical clustering, we need some cost function to choose which clusters are fused during construction of the hierarchy. Let $\mathcal{C} = \{C_1,\dots,C_K\}$ be a candidate partition and define $f_{ij}(\mathcal{C})$ to be the fraction of edges in the original network that connect vertices in $C_i$ with vertices in $C_j$. The modularity of $\mathcal{C}$ is the value
\begin{equation*}
\mathrm{modularity}(\mathcal{C}) = \sum_{k=1}^K \left(f_{kk}(\mathcal{C}) - f_{kk}^\star\right)^2
\end{equation*}
where $f_{kk}^\star$ is the expected value of $f_{kk}$ under some model of random edge assignment. The `igraph::fastgreedy.community` function performs and approximated optimization of the modularity measure.
Use this function to extract a possible clustering of the nodes for the French blogosophere. Use the plot function for object with class `communities` outputing from `igraph::fastgreedy.community`. Compare this clustering with the political labels of the nodes (use for instance confusion tables with `table` or adjusted Rand-Index with `aricode::ARI`).
Explore the results offered by other clustering methods for community detection (e.g. `igraph::cluster_edge_betweenness`), or the ones obtained by a simple hierarchical clustering on dissimilarity measured on the adjacency matrix. Compare the clustering to the "ground-truth". Comment.
#### Spectral Analysis
We first used the algorithm introduced by Ng, Jordan and Weiss (2002). Check the course/slides!
1. Compute the graph Laplacian with the `igraph::graph.laplacian` function.
2. Compute its eigen values and represent the scree plot (eigen values by increasing order). Comment.
3. Compute its eigen vectors and represent the pairs plot between the first 10 eigen vectors with non-null eigen values. Add colors associated to the Political labels of the nodes. Comment.
4. Implement the Spectral clustering algorithm and apply it to the French Blogosphere for various numbers of clusters.
5. Compare your clustering to the political labels and to the one obtained by hierarchical clustering. Comment, make plots changing node colors, etc.
6. Redo the analysis with the absolute spectral clustering of Rohe et al (2011).
**Remark:** Another fancy alternative to the **igraph** package is the `corrplot` methods on the adjacency matrix of a graph once rows and columns reordered according to the clustering to represent the clustered network.