
We unify several latent block models by proposing a flexible exponential family latent block model (ELBM), which we extend to a sparse version (SELBM) that addresses sparse data by revealing a diagonal co-cluster structure. This yields more homogeneous co-clusters and therefore produces useful, ready-to-use, and easy-to-interpret results.

Table of Contents

  1. ELBMcoclust Overview
  2. Datasets
  3. Models
  4. Confusion Matrices
  5. Visualization
  6. Word Cloud of PoissonSELBM for Classic3
  7. Main Contributions
  8. Cite
  9. Highlights
  10. Supplementary Materials
  11. Data Availability
  12. Presentation Video
  13. References

ELBMcoclust and SELBMcoclust

Sparse and Non-Sparse Exponential Family Latent Block Model for Co-clustering

The goal of the statistical approach is to analyze the behavior of the data through its probability distribution. The complete log-likelihood functions for the three versions, LBM, exponential family LBM (ELBM), and sparse exponential family LBM (SELBM), are as follows (a small NumPy sketch of the matrix form follows the equations):

  • LBM
$$L^{\text{LBM}}(\mathbf{r},\mathbf{c},\boldsymbol{\gamma})= \sum\limits_{i,k}r_{ik} \log\pi_{k} +\sum\limits_{j,h} c_{jh}\log\rho_{h} + \sum\limits_{i,j,k,h} r_{ik}\,c_{jh}\log \varphi(x_{ij};\alpha_{kh}).$$
  • ELBM
$$L^{\text{ELBM}}(\mathbf{r},\mathbf{c},\boldsymbol{\gamma}) \propto \sum\limits_{k} r_{.k} \log\pi_{k} + \sum\limits_{h} c_{.h} \log\rho_{h} + \text{Tr}\left( (\mathbf{R}^{\top} (\mathbf{S_{x}}\odot \hat{\boldsymbol{\beta}}) \mathbf{C})^{\top} \mathbf{A}_{\boldsymbol{\alpha}} \right) - \text{Tr}\left( (\mathbf{R}^{\top} (\mathbf{E}_{mn}\odot \hat{\boldsymbol{\beta}}) \mathbf{C})^{\top} \mathbf{F}_{\boldsymbol{\alpha}} \right).$$
  • SELBM
$$\begin{align*} L^{\text{SELBM}}(\mathbf{r},\mathbf{c},\boldsymbol{\gamma}) \propto& \sum\limits_{k} r_{.k} \log\pi_{k} + \sum\limits_{h} c_{.h}\log\rho_{h} + \sum\limits_{k} \left[ \mathbf{R}^{\top}(\mathbf{S_{x}}\odot \hat{\boldsymbol{\beta}})\mathbf{C} \right]_{kk} \left( A(\alpha_{kk}) - A(\alpha) \right)\\ &- \sum\limits_{k} \left[\mathbf{R}^{\top} (\mathbf{E}_{mn} \odot \hat{\boldsymbol{\beta}} )\mathbf{C}\right]_{kk} \left( F(A(\alpha_{kk})) -F(A(\alpha)) \right). \end{align*}$$
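For concreteness, the matrix-form ELBM objective above can be transcribed almost literally into NumPy. This is a minimal sketch, assuming the sufficient-statistic matrix $\mathbf{S_x}$, the weights $\hat{\boldsymbol{\beta}}$, and the $(g \times s)$ matrices $\mathbf{A}_{\boldsymbol{\alpha}}$ and $\mathbf{F}_{\boldsymbol{\alpha}}$ are precomputed for the chosen distribution; the function name and signature are illustrative, not the repository's API.

```python
import numpy as np

def elbm_complete_loglik(S_x, beta_hat, R, C, pi, rho, A_alpha, F_alpha):
    """ELBM complete log-likelihood in matrix form (up to additive constants).

    R (m, g) and C (n, s) are binary partition matrices, pi and rho the
    mixing proportions, and A_alpha, F_alpha hold A(alpha_kh) and
    F(A(alpha_kh)) for the chosen exponential family.
    """
    E_mn = np.ones(S_x.shape)
    block = R.T @ (S_x * beta_hat) @ C    # R^T (S_x ⊙ β̂) C
    norm = R.T @ (E_mn * beta_hat) @ C    # R^T (E_mn ⊙ β̂) C
    return (R.sum(axis=0) @ np.log(pi)
            + C.sum(axis=0) @ np.log(rho)
            + np.trace(block.T @ A_alpha)   # Tr((R^T(S_x ⊙ β̂)C)^T A_α)
            - np.trace(norm.T @ F_alpha))   # Tr((R^T(E ⊙ β̂)C)^T F_α)
```

The SELBM objective keeps only the diagonal entries `block[k, k]` and `norm[k, k]`, contrasting each diagonal parameter $\alpha_{kk}$ with a single shared off-diagonal parameter $\alpha$.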

Datasets

| Dataset | Topics | #Classes | (#Documents, #Words) | Sparsity (%) | Balance |
|---------|--------|----------|-----------------------|--------------|---------|
| Classic3 | Medical, Information retrieval, Aeronautical systems | 3 | (3891, 4303) | 98.95 | 0.71 |
| CSTR | Robotics/Vision, Systems, Natural Language Processing, Theory | 4 | (475, 1000) | 96.60 | 0.399 |
| WebACE | 20 different topics from the WebACE project | 20 | (2340, 1000) | 91.83 | 0.169 |
| Reviews | Food, Music, Movies, Radio, Restaurants | 5 | (4069, 18483) | 98.99 | 0.099 |
| Sports | Baseball, Basketball, Bicycling, Boxing, Football, Golfing, Hockey | 7 | (8580, 14870) | 99.14 | 0.036 |
| TDT2 | 30 different topics | 30 | (9394, 36771) | 99.64 | 0.028 |

  • Balance: (#documents in the smallest class) / (#documents in the largest class)
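The two rightmost statistics of the table are easy to recompute. A minimal sketch, assuming `X` is a dense document-word count matrix and `labels` an integer class vector (both names are illustrative):

```python
import numpy as np

def sparsity(X):
    # Percentage of zero entries in the document-word matrix
    return 100.0 * np.mean(X == 0)

def balance(labels):
    # (#documents in the smallest class) / (#documents in the largest class)
    counts = np.bincount(labels)
    return counts.min() / counts.max()
```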
```python
import numpy as np
from sklearn.metrics import confusion_matrix

from ELBMcoclust.Models.coclust_ELBMcem import CoclustELBMcem
from ELBMcoclust.Models.coclust_SELBMcem import CoclustSELBMcem
from NMTFcoclust.Evaluation.EV import Process_EV

# Fit the Poisson ELBM and SELBM with 4 row and 4 column clusters on CSTR
ELBM = CoclustELBMcem(n_row_clusters=4, n_col_clusters=4, model="Poisson")
ELBM.fit(X_CSTR)

SELBM = CoclustSELBMcem(n_row_clusters=4, n_col_clusters=4, model="Poisson")
SELBM.fit(X_CSTR)

# Evaluation report (accuracy, NMI, ARI, ...) against the true labels
process_ev = Process_EV(true_labels, X_CSTR, ELBM)

# Confusion matrix between the true classes and the (sorted) inferred row clusters
confusion_matrix(true_labels, np.sort(ELBM.row_labels_))
```


```
array([[101,   0,   0,   0],
       [  4,  52,  15,   0],
       [  0,   0, 178,   0],
       [  0,   0,  34,  91]], dtype=int64)
```
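A raw confusion matrix depends on the arbitrary ordering of the cluster indices. A common way to turn it into an accuracy score, sketched below (this is not `Process_EV`'s actual code), is to align clusters to classes with the Hungarian method:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import confusion_matrix

def clustering_accuracy(true_labels, pred_labels):
    # Align cluster indices to class indices (Hungarian method),
    # then take the fraction of documents on the matched diagonal.
    cm = confusion_matrix(true_labels, pred_labels)
    row_ind, col_ind = linear_sum_assignment(-cm)  # maximize matched counts
    return cm[row_ind, col_ind].sum() / cm.sum()
```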

Confusion Matrices


Visualization


Word cloud of PoissonSELBM for Classic3

Word clouds of the top 60 words per co-cluster in the Classic3 dataset obtained by PoissonSELBM.

Bar charts of the top 50 words per co-cluster in the Classic3 dataset obtained by PoissonSELBM.

Main Contributions

The main contributions of this paper are summarized as follows:

  • Exponential family Latent Block Model (ELBM) and its sparse version (SELBM): We propose these models, which unify many leading algorithms suited to various data types.

  • Classification Expectation Maximization (CEM) approach: Our proposed algorithms use this approach within a general matrix-based framework (a schematic row update is sketched after this list).

  • Focus on document-word matrices: Although the matrix formalism is flexible enough to cover different distributions, this work focuses on document-word matrices. We evaluate ELBM and SELBM on six real document-word matrices and three synthetic datasets.
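As a rough illustration of the classification EM idea, here is a schematic of one hard row-assignment step under simplifying assumptions; it is not the repository's exact implementation, and `log_dens` is assumed to be computed from the current column partition and block parameters:

```python
import numpy as np

def cem_row_step(log_pi, log_dens):
    """One CEM update of the row partition.

    log_pi holds the log mixing proportions; log_dens[i, k] is the
    log-density of row i under row-cluster k, aggregated over the
    current column partition.
    """
    scores = log_pi[None, :] + log_dens   # E-step: unnormalized log-posteriors
    labels = scores.argmax(axis=1)        # C-step: hard classification
    R = np.eye(log_pi.size)[labels]       # binary partition matrix R
    return labels, R
```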

Cite

Please cite the following paper if you use ELBMcoclust in your research:

```bibtex
@article{ELBMcoclust,
  title   = {Sparse Exponential Family Latent Block Model for Co-clustering},
  author  = {Hoseinipour, Saeid and Aminghafari, Mina and Mohammadpour, Adel and Nadif, Mohamed},
  journal = {Submitted},
  year    = {2024}
}
```

Highlights

  • The exponential family Latent Block Model (ELBM) and its sparse version (SELBM) were proposed, unifying many models across various data types.
  • The proposed algorithms, based on the classification expectation maximization approach, share a general matrix-based framework.
  • We compared ELBM with SELBM on six real document-word matrices and three synthetic datasets (Bernoulli, Poisson, Gaussian).
  • All datasets and algorithm code are available on GitHub in the ELBMcoclust repository.

Supplementary Materials

  • More details about the Classic3 real-text dataset are available here.
  • For additional visualization, see here.

Data Availability

The algorithm code, all datasets, additional visualizations, and other materials are available in the ELBMcoclust repository. Our experiments were run on a PC (Intel(R) Core(TM) i7-10510U, 2.30 GHz), and all figures were produced in Python with the Seaborn and Matplotlib libraries.

Presentation Video

Presentation video for OPNMTF (see [6]).

References

[1] Govaert, G. and Nadif, M., Clustering with block mixture models, Pattern Recognition (2003).

[2] Govaert, G. and Nadif, M., Block clustering with Bernoulli mixture models: Comparison of different approaches, Computational Statistics and Data Analysis (2008).

[3] Priam, R. et al., Topographic Bernoulli block mixture mapping for binary tables, Pattern Analysis and Applications (2014).

[4] Ailem, M. et al., Sparse Poisson latent block model for document clustering, IEEE Transactions on Knowledge and Data Engineering (2017).

[5] Riverain, P. et al., Semi-supervised Latent Block Model with pairwise constraints, Machine Learning (2022).

[6] Hoseinipour, S. et al., Orthogonal parametric non-negative matrix tri-factorization with $\alpha$-Divergence for co-clustering, Expert Systems with Applications (2023).
