- ELBMcoclust Overview
- Datasets
- Models
- Confusion Matrices
- Visualization
- Word Cloud of PoissonSELBM for Classic3
- Main Contributions
- Cite
- Highlights
- Supplementary Materials
- Data Availability
- Presentation Video
- References
Sparse and Non-Sparse Exponential Family Latent Block Model for Co-clustering
The goal of the statistical approach is to analyze the behavior of the data by modeling its probability distribution. The complete-data log-likelihood functions for the three versions of the latent block model (LBM, Exponential LBM, and Sparse Exponential LBM) are as follows:
- LBM
- ELBM
- SELBM
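For orientation, the complete-data log-likelihood of the plain LBM can be written in the standard form below (following Govaert and Nadif; the notation is a sketch and may differ from the paper's):

```latex
\mathcal{L}_C(\mathbf{z}, \mathbf{w}; \theta)
  = \sum_{i,k} z_{ik} \log \pi_k
  + \sum_{j,\ell} w_{j\ell} \log \rho_{\ell}
  + \sum_{i,j,k,\ell} z_{ik} w_{j\ell} \log \varphi(x_{ij}; \alpha_{k\ell})
```

Here $z_{ik}$ and $w_{j\ell}$ are the binary row- and column-cluster assignments, $\pi_k$ and $\rho_{\ell}$ the mixing proportions, and $\varphi$ the block density; in ELBM, $\varphi$ is taken from the exponential family (Bernoulli, Poisson, Gaussian, ...), and SELBM adds a sparsity constraint on the block parameters.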
Datasets | Topics | #Classes | (#Documents, #Words) | Sparsity (%) | Balance |
---|---|---|---|---|---|
Classic3 | Medical, Information retrieval, Aeronautical systems | 3 | (3891, 4303) | 98.95 | 0.71 |
CSTR | Robotics/Vision, Systems, Natural Language Processing, Theory | 4 | (475, 1000) | 96.60 | 0.399 |
WebACE | 20 different topics from WebACE project | 20 | (2340, 1000) | 91.83 | 0.169 |
Reviews | Food, Music, Movies, Radio, Restaurants | 5 | (4069, 18483) | 98.99 | 0.099 |
Sports | Baseball, Basketball, Bicycling, Boxing, Football, Golfing, Hockey | 7 | (8580, 14870) | 99.14 | 0.036 |
TDT2 | 30 different topics | 30 | (9394, 36771) | 99.64 | 0.028 |
- Balance: (#documents in the smallest class)/(#documents in the largest class)
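The sparsity and balance statistics in the table can be reproduced with a short sketch (the matrix and labels below are toy values, not one of the datasets above):

```python
import numpy as np

def sparsity_and_balance(X, labels):
    """Sparsity: percentage of zero entries; balance: smallest/largest class size."""
    X = np.asarray(X)
    sparsity = 100.0 * (X == 0).sum() / X.size
    counts = np.bincount(np.asarray(labels))
    counts = counts[counts > 0]          # ignore unused label ids
    return sparsity, counts.min() / counts.max()

# Toy 4x5 document-word count matrix with two document classes.
X = np.array([[3, 0, 0, 0, 1],
              [2, 1, 0, 0, 0],
              [0, 0, 4, 0, 0],
              [0, 0, 0, 5, 0]])
labels = [0, 0, 0, 1]
s, b = sparsity_and_balance(X, labels)   # 70.0% sparsity, balance 1/3
```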
from ELBMcoclust.Models.coclust_ELBMcem import CoclustELBMcem
from ELBMcoclust.Models.coclust_SELBMcem import CoclustSELBMcem
from NMTFcoclust.Evaluation.EV import Process_EV

# Fit the Poisson ELBM and SELBM with 4 row and 4 column clusters on CSTR.
ELBM = CoclustELBMcem(n_row_clusters=4, n_col_clusters=4, model="Poisson")
ELBM.fit(X_CSTR)

SELBM = CoclustSELBMcem(n_row_clusters=4, n_col_clusters=4, model="Poisson")
SELBM.fit(X_CSTR)

# Evaluate the fitted model against the ground-truth document labels.
process_ev = Process_EV(true_labels, X_CSTR, ELBM)
import numpy as np
from sklearn.metrics import confusion_matrix

confusion_matrix(true_labels, np.sort(ELBM.row_labels_))
array([[101,   0,   0,   0],
       [  4,  52,  15,   0],
       [  0,   0, 178,   0],
       [  0,   0,  34,  91]], dtype=int64)
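Cluster labels are only defined up to a permutation, so a common way to make the confusion matrix readable is to remap the predicted ids to the best-matching true ids with the Hungarian algorithm. This is a generic sketch, not necessarily the remapping the authors used:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import confusion_matrix

def align_labels(true_labels, pred_labels):
    """Remap predicted cluster ids (0..K-1) to the true ids maximizing agreement."""
    cm = confusion_matrix(true_labels, pred_labels)
    row_ind, col_ind = linear_sum_assignment(-cm)   # maximize diagonal mass
    mapping = {pred: true for true, pred in zip(row_ind, col_ind)}
    return np.array([mapping[p] for p in pred_labels])

# Toy example: predicted ids 0/1 are swapped relative to the true ids.
true_labels = np.array([0, 0, 1, 1, 1])
pred_labels = np.array([1, 1, 0, 0, 0])
aligned = align_labels(true_labels, pred_labels)    # -> [0, 0, 1, 1, 1]
```

After alignment, `confusion_matrix(true_labels, aligned)` concentrates the mass on the diagonal, which makes per-class errors easy to read off.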
The main contributions of this paper are summarized below:
- Exponential family Latent Block Model (ELBM) and its sparse version (SELBM): we propose these models, which unify many leading algorithms suited to various data types.
- Classification expectation-maximization (CEM) approach: our proposed algorithms use this approach within a general matrix-based framework.
- Focus on document-word matrices: while the proposed matrix formalism is flexible enough to cover different distributions, we focus on document-word matrices in this work, evaluating ELBM and SELBM on six real document-word matrices and three synthetic datasets.
Please cite the following paper in your publication if you use ELBMcoclust in your research:
@article{ELBMcoclust,
  title={Sparse Exponential Family Latent Block Model for Co-clustering},
  author={Saeid Hoseinipour and Mina Aminghafari and Adel Mohammadpour and Mohamed Nadif},
  journal={Submitted},
  year={2024}
}
- The Exponential family Latent Block Model (ELBM) and its sparse version (SELBM) are proposed, unifying many models for various data types.
- The proposed algorithms, based on the classification expectation-maximization approach, share a general matrix-based framework.
- ELBM and SELBM are compared on six real document-word matrices and three synthetic datasets (Bernoulli, Poisson, Gaussian).
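Synthetic data of the kind mentioned above can be drawn by planting row and column blocks and sampling each entry from the block's distribution. A minimal Poisson sketch (the sizes and means are hypothetical, not the paper's settings):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical block structure: 2 row clusters, 2 column clusters.
row_clusters = np.repeat([0, 1], [30, 20])    # 50 "documents"
col_clusters = np.repeat([0, 1], [40, 60])    # 100 "words"

# Poisson mean per (row cluster, column cluster) block; diagonal blocks denser.
alpha = np.array([[5.0, 0.5],
                  [0.5, 4.0]])

# Draw each entry x_ij ~ Poisson(alpha[z_i, w_j]).
X = rng.poisson(alpha[np.ix_(row_clusters, col_clusters)])
```

Replacing `rng.poisson` with `rng.binomial(1, ...)` or `rng.normal(...)` gives the Bernoulli and Gaussian variants of the same planted-block design.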
- All datasets and algorithm code are available on GitHub in the ELBMcoclust repository.
- More details about the Classic3 real-text dataset are available here.
- For additional visualization, see here.
The algorithm code, all datasets, additional visualizations, and supplementary materials are available in the ELBMcoclust repository. Our experiments were performed on a PC (Intel(R) Core(TM) i7-10510U, 2.30 GHz), and all figures were produced in Python using the Seaborn and Matplotlib libraries.
[1] Govaert and Nadif, Clustering with block mixture models, Pattern Recognition (2003).