saghiles/dcc

Directional Co-clustering with a Conscience (DCC)

The code in this repository implements the DCC co-clustering algorithm presented in the paper:

Usage example

The following example shows how to fit DCC to a real-world dataset and assess the quality of the resulting clustering. It assumes that the scripts dcc.R and utils.R have already been sourced. The following packages are required:

  • R.matlab, for data reading.
  • mclust, for Adjusted Rand Index (ARI) computation.
  • Matrix, for sparse matrix representation and manipulation.
# Load the required packages
library(R.matlab)  # readMat()
library(mclust)    # adjustedRandIndex()
library(Matrix)    # sparse matrix support

# Load the NG2 dataset
ng2 <- readMat("./data/NG2.mat")
ng2_mat <- ng2$mat
ng2_class_labels <- as.vector(ng2$class.labels)

# TF-IDF representation
ng2_tfidf <- tf_idf(ng2_mat)

# Fit DCC to the NG2 data
res <- dcc(X = ng2_tfidf, k = 2, iter.max = 100, n_init = 5, stoch_iter.max = 70)

# Compare the DCC row clustering with the ground truth
NMI(res$rowcluster, ng2_class_labels, nrow(ng2_tfidf))
adjustedRandIndex(res$rowcluster, ng2_class_labels)

Output:

NMI = 0.714    ARI = 0.810
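
For readers unfamiliar with the first score, here is a minimal sketch of how a normalized mutual information could be computed from two label vectors. The helper nmi_sketch is hypothetical and is not the NMI() function from utils.R; it assumes normalization by the geometric mean of the two entropies, which may differ from the variant used in this repository.

```r
# Hypothetical helper, NOT the NMI() from utils.R.
# Assumption: NMI = I(a; b) / sqrt(H(a) * H(b)), with natural logarithms.
nmi_sketch <- function(a, b) {
  stopifnot(length(a) == length(b))
  joint <- unclass(table(a, b)) / length(a)  # joint distribution of the two labelings
  pa <- rowSums(joint)                       # marginal distribution of a
  pb <- colSums(joint)                       # marginal distribution of b
  nz <- joint > 0                            # skip empty cells to avoid log(0)
  mi <- sum(joint[nz] * log(joint[nz] / outer(pa, pb)[nz]))
  ha <- -sum(pa * log(pa))                   # entropy of a (all marginals are > 0)
  hb <- -sum(pb * log(pb))                   # entropy of b
  if (ha == 0 || hb == 0) return(0)          # a single-cluster labeling carries no information
  mi / sqrt(ha * hb)
}

# Same partition with swapped label names: the score is 1 (up to rounding)
nmi_sketch(c(1, 1, 2, 2), c(2, 2, 1, 1))
# Statistically independent partitions: the score is 0 (up to rounding)
nmi_sketch(c(1, 1, 2, 2), c(1, 2, 1, 2))
```

Like the ARI, this score is invariant to how clusters are numbered, which is why cluster labels can be compared with class labels directly.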

Results may vary slightly from one run to another because of the random initialization and the stochastic assignments in early iterations. For more details, please refer to Section 6.3 of the DCC paper.
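
One common way to handle this kind of run-to-run variation is to fix the random seed for a repeatable run and to report a summary over several seeds. The sketch below illustrates the idea with base R's kmeans() as a stand-in for dcc(), on made-up toy data. It assumes that the randomness comes from R's own random number generator, which has not been verified for dcc(); the function fit_once is hypothetical.

```r
# kmeans() is a stand-in for dcc() here: both start from a random
# initialization. The toy data are made up for illustration only.
set.seed(1)
toy <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
             matrix(rnorm(100, mean = 4), ncol = 2))

# Hypothetical wrapper: re-seed R's RNG before each fit so a run is repeatable
fit_once <- function(seed) {
  set.seed(seed)
  kmeans(toy, centers = 2, nstart = 1)
}

# Two fits with the same seed yield the same partition
fit_a <- fit_once(42)
fit_b <- fit_once(42)
identical(fit_a$cluster, fit_b$cluster)

# Summarize the variation of a quality score over several seeds
scores <- sapply(1:10, function(s) fit_once(s)$tot.withinss)
c(mean = mean(scores), sd = sd(scores))
```

With DCC, the same pattern would apply to the NMI and ARI values, reporting their mean and standard deviation over runs rather than a single figure.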