HLloyd

This is the R package for the paper "Exact Clustering in Tensor Block Model: Statistical Optimality and Computational Limit" by Rungang Han, Yuetian Luo, Miaoyan Wang and Anru Zhang (2020).

Update (2023.9): The kmeanspp() function used in HOLloyd.R has been removed from the current version of the "pracma" package. To replicate the results, download version 1.5.5 of "pracma" from the CRAN archive (https://cran.r-project.org/src/contrib/Archive/pracma/) and install it from the local file, for example: install.packages("/path/to/pracma_1.5.5.tar.gz", repos = NULL, type = "source", lib = "/your-R-package-lib")
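
Before rerunning the experiments, it may help to check whether the installed copy of "pracma" still exports kmeanspp(). The sketch below is one way to do that in base R; has_export is a hypothetical helper name, not part of HLloyd or pracma.

```r
# Hypothetical helper (not part of HLloyd): returns TRUE when package `pkg`
# is installed and exports a function or object named `fn`.
has_export <- function(pkg, fn) {
  requireNamespace(pkg, quietly = TRUE) && fn %in% getNamespaceExports(pkg)
}

# Sanity checks against a package that ships with every R installation.
has_export("stats", "kmeans")
has_export("stats", "kmeanspp")

# Intended use, assuming "pracma" is the package that should provide kmeanspp():
# has_export("pracma", "kmeanspp")
# packageVersion("pracma")
```

If the check returns FALSE for "pracma", the archived version mentioned in the update note is the one to install.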

Instructions

HLloyd requires the following packages for full functionality: 'rTensor', 'gtools', 'LICORS', 'mclust'. Use the following commands for installation:

library(devtools)
devtools::install_github("Rungang/HLloyd")
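
The dependencies listed above can be checked before installing. This is a minimal sketch: missing_packages is a hypothetical helper, and whether each dependency is currently available from your configured repository is an assumption you should verify.

```r
# Packages the README lists as required for full functionality.
deps <- c("rTensor", "gtools", "LICORS", "mclust")

# Hypothetical helper: returns the subset of `pkgs` that is not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(deps)

# Uncomment to install whatever is missing (assumes the packages can be
# found in your configured repositories):
# if (length(to_install) > 0) install.packages(to_install)
```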

See example.R for illustration.

Please contact hrg.stat@gmail.com with any questions or comments.

Citation

@article{han2020exact, title={Exact Clustering in Tensor Block Model: Statistical Optimality and Computational Limit}, author={Han, Rungang and Luo, Yuetian and Wang, Miaoyan and Zhang, Anru R}, journal={arXiv preprint arXiv:2012.09996}, year={2020} }

Reproducibility

The code for obtaining the numerical results is collected in the /experiment subfolder.

  • All main functions are in HOLloyd.R

  • Simulation code for Section 6.1:

    • matrix_phase_transition.R and tensor_phase_transition.R: Statistical and Computational Phase Transition.
    • different_delta.R: Impact of Initialization to HLloyd.
    • tensor_recovery.R: Clustering via HLloyd + HSC algorithms.
    • tensor_recovery.R: Tensor estimation via HLloyd + HSC algorithms.
    • bernoulli_model_clustering_property.R: Simulations on Stochastic Tensor Block Models.
  • Simulation code for Section 6.2:

    • TC_methods_compare.R, TC_methods_compare_diffr.R, TC_methods_compare_imbalance.R: Three comparisons with baseline methods.
  • Real data in Section 7.1 (flight route network)

    • Data files: flight_route.RData, US_flight_route.RData
    • Code files: airline_clustering.R and US_airline_clustering.R
    • Notice: In the simulation code, the 'delta' parameter corresponds to the 'gamma' parameter in the paper through the relation gamma = -2 delta.
  • Real data in Section 7.2 (Online Click-through Data)

    • Data file: Click_through.csv
    • Code file: click_through.R
    • Notice: In the data file, the four columns represent the user ID, the item ID, the time point (hour of day, 1 to 24) and the date (day, 1 to 8).
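
To illustrate the four-column layout described for the click-through data, the sketch below turns records of that shape into a user x item x hour count array. The toy records, the column names and the choice to aggregate over days are all assumptions made for illustration; this is not the preprocessing used in click_through.R.

```r
# Made-up toy records in the described layout:
# user ID, item ID, time point (1 to 24), date (1 to 8).
clicks <- data.frame(
  user = c(1, 1, 2, 3, 3),
  item = c(1, 2, 2, 1, 1),
  hour = c(1, 24, 12, 5, 5),
  day  = c(1, 1, 8, 3, 3)
)

# Count clicks per (user, item, hour) cell, summing over days.
counts <- array(0, dim = c(max(clicks$user), max(clicks$item), 24))
for (i in seq_len(nrow(clicks))) {
  u <- clicks$user[i]
  it <- clicks$item[i]
  h <- clicks$hour[i]
  counts[u, it, h] <- counts[u, it, h] + 1
}

dim(counts)
```

With real data, the data frame would come from reading Click_through.csv; whether that file has a header row is not stated here, so check it before choosing the read.csv arguments.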

About

High-order Lloyd algorithm for multiway clustering of tensor data.
