BRACIS 2023

Repository of the paper "Community Detection Methods for Multi-Label Classification" published in BRACIS 2023.

Paper

Gatto, E.C., Valejo, A.D.B., Ferrandin, M., Cerri, R. (2023). Community Detection for Multi-label Classification. In: Naldi, M.C., Bianchi, R.A.C. (eds) Intelligent Systems. BRACIS 2023. Lecture Notes in Computer Science, vol 14195. Springer, Cham. https://doi.org/10.1007/978-3-031-45368-7_6

Abstract

Exploring label correlations is one of the main challenges in multi-label classification. The literature shows that prediction performances can be improved when classifiers learn these correlations. On the other hand, some works also argue that multi-label classification methods cannot explore label correlations. The traditional multi-label local approach uses only information from individual labels, which makes it impractical to find relationships between them. In contrast, the multi-label global approach uses information from all labels simultaneously and may miss more specific relationships that are relevant. To overcome these limitations and verify whether it is possible to improve the prediction performance of multi-label classifiers, we propose using Community Detection Methods to model label correlations and partition the label space into partitions that lie between the local and global ones. These partitions, here named hybrid partitions, are formed of disjoint clusters of correlated labels, which are then used to build multi-label datasets and train multi-label classifiers. Since our proposal can generate several hybrid partitions, we validate all of them and choose the one considered the best. We compared our hybrid partitions with the local and global approaches and with an approach that generates random partitions. Although our proposal improved the predictive performance of the classifier in some datasets compared with the other partitions, it also showed that, in general, independent of the approach used, the classifier still has difficulty learning several labels and predicting them correctly.

CONCEPTS

Partitions Simple Version

Partitions Detailed Version

Hybrid Partitions Flow

Random Partitions Flow

RESULTS

Resulting Files

Here you can download the similarity matrices, label graphs, and partitions generated in our experiments. If you want to generate them from scratch for your own experiments, please check the source code section.

Analysis

Here you will find all the documents (tables, plots, etc.) with an analysis of the results.

INSTRUCTIONS TO REPRODUCE THE EXPERIMENTS

Environments to run experiments

Conda

You can run this experiment in a Conda environment. The name of our Conda environment is "AmbienteTeste". To use this env, you must first install Conda on your computer or cluster and then create the env with the following command: conda env create --file AmbienteTeste.yaml. Click here to download the files.
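The full setup can be sketched as the commands below. Only the environment file name (AmbienteTeste.yaml) comes from this repository; the rest are standard Conda commands:

```shell
# Create the environment from the YAML file shipped with the repository
conda env create --file AmbienteTeste.yaml

# Activate it before running any experiment script
conda activate AmbienteTeste

# Optionally verify the environment and inspect its packages
conda env list
conda list
```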

Singularity/AppTainer

You can also run this experiment in a Singularity/Apptainer container. We do not provide a container for this experiment, but you can build one; here you can find a short tutorial on how to do that for our experiments. Using Singularity is preferable to a Conda environment when you must execute all the experiments from a temporary folder (scratch or /dev/shm).

Be careful here: running the Conda environment directly from /home can generate heavy disk I/O that degrades the hard disks and harms all users. In some situations, copying your Singularity container to the server's temporary folder and running absolutely everything from there is the best solution for everyone. Talk to the administrator about this before trying to reproduce the experiments.
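A minimal sketch of that workflow is shown below. The definition file (experiment.def), image name, and script name are placeholders, since the repository does not ship a container recipe:

```shell
# Build a container image from a definition file
# (experiment.def is an assumption; write your own following the tutorial)
sudo singularity build experiment.sif experiment.def

# Copy the image to the node's temporary storage and run everything there,
# so heavy I/O does not hit /home
cp experiment.sif /tmp/
cd /tmp

# Execute an experiment script inside the container
# (run_step.R is a placeholder for one of the repository's R scripts)
singularity exec /tmp/experiment.sif Rscript run_step.R
```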

SOURCE CODE

Our code is completely modular because of the constraints of our servers, mainly the job queue, time, and memory limits. This way, we can run many jobs in parallel across the different steps of the methodology. In the future, a package that executes the whole flow will be developed and made available to the scientific community.

Each source-code repository includes instructions on how to run the code. You can also adjust the main script to save the results on your machine or in your cloud storage using rclone (there are examples in the R scripts).

Attention: before using rclone, check with your institution's network administrator whether uploading files and folders from the cluster to the cloud is allowed. If you use a university Google account, specific steps are needed to configure communication between Google Cloud and the server.
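The rclone usage referred to above boils down to two standard commands. The remote name ("myremote") and paths are placeholders, not values from this repository:

```shell
# One-time interactive setup of a cloud remote (e.g. a Google account)
rclone config

# Copy a results folder from the cluster to the configured remote;
# "myremote" and both paths are placeholders for your own setup
rclone copy ~/results myremote:Bracis2023/results --progress
```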

Step 1: Pre-processing

  • Code to create the 10-Fold Cross Validation for each dataset. You can download our data splits here.
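For clarity, the cross-validation step can be sketched as below. This is an illustrative pure-Python version, not the repository's R code; the official splits used in the paper are the ones available for download:

```python
import random

def kfold_indices(n_samples, k=10, seed=42):
    """Split sample indices into k disjoint folds (a simple sketch;
    the paper's official data splits are provided in the repository)."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def train_test_splits(folds):
    """Yield (train, test) index lists, one pair per fold."""
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

folds = kfold_indices(100, k=10)
splits = list(train_test_splits(folds))
```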

Step 2: Modeling Label Correlation
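This step builds a label similarity matrix from label co-occurrence. As a hedged illustration (the paper may use a different similarity measure), a Jaccard similarity between binary label columns can be computed like this:

```python
def jaccard_matrix(Y):
    """Pairwise Jaccard similarity between binary label columns.
    Y is a list of instances, each a list of 0/1 label assignments.
    A sketch of one common co-occurrence measure, not necessarily
    the exact measure used in the paper."""
    n_labels = len(Y[0])
    cols = [[row[j] for row in Y] for j in range(n_labels)]
    S = [[0.0] * n_labels for _ in range(n_labels)]
    for a in range(n_labels):
        for b in range(n_labels):
            inter = sum(1 for x, y in zip(cols[a], cols[b]) if x and y)
            union = sum(1 for x, y in zip(cols[a], cols[b]) if x or y)
            S[a][b] = inter / union if union else 0.0
    return S
```

The resulting matrix is what gets sparsified (by KNN or a threshold) into a label graph in the next step.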

Step 3: Applying Community Detection Methods
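The idea of this step, finding disjoint clusters of correlated labels in the label graph, can be illustrated with a simple label-propagation community detection sketch in pure Python. This is a generic stand-in, not one of the specific methods evaluated in the paper:

```python
def label_propagation_communities(adj, max_iter=100):
    """Detect communities by label propagation on an adjacency dict
    {node: set_of_neighbors}. Nodes are visited in sorted order and
    ties broken by the smallest label, so the result is deterministic."""
    labels = {v: v for v in adj}
    for _ in range(max_iter):
        changed = False
        for v in sorted(adj):
            if not adj[v]:
                continue
            # adopt the most frequent label among neighbors
            counts = {}
            for u in adj[v]:
                counts[labels[u]] = counts.get(labels[u], 0) + 1
            best = min(counts, key=lambda l: (-counts[l], l))
            if labels[v] != best:
                labels[v] = best
                changed = True
        if not changed:
            break
    # group nodes by their final label into disjoint communities
    groups = {}
    for v, l in labels.items():
        groups.setdefault(l, set()).add(v)
    return list(groups.values())
```

Each community returned here corresponds to one cluster of labels in a hybrid partition.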

Steps 4, 5, and 6: Build, validate, choose, and test hybrid partitions

  • Code for Sparsification with KNN + Hierarchical Methods
  • Code for Sparsification with KNN + Non-Hierarchical Methods
  • Code for Sparsification with Threshold + Hierarchical Methods
  • Code for Sparsification with Threshold + Non-Hierarchical Methods
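The "validate and choose" part of these steps reduces to scoring every candidate hybrid partition and keeping the best one. A minimal sketch, where the scoring function (e.g. a validation macro-F1 of classifiers trained per cluster) is a user-supplied placeholder:

```python
def choose_best_partition(partitions, score):
    """Return the partition with the highest validation score.
    `partitions` is a list of candidate label partitions (each a list
    of disjoint label clusters); `score` is a placeholder for the
    evaluation used in the paper, e.g. a validation measure computed
    from classifiers trained on each cluster's dataset."""
    best, best_score = None, float("-inf")
    for p in partitions:
        s = score(p)
        if s > best_score:
            best, best_score = p, s
    return best, best_score
```

The chosen partition is the one later compared against the global, local, and random baselines on the test folds.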

Code for Global, Local, and Random Partitions
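The three baselines differ only in how the label space is split: global uses one cluster with all labels, local uses one cluster per label, and random scatters labels into clusters at random. A hedged sketch of the random baseline (cluster count and seed are illustrative parameters, not values from the paper):

```python
import random

def random_partition(n_labels, n_clusters, seed=0):
    """Randomly assign labels to n_clusters disjoint clusters,
    keeping every cluster non-empty (a sketch of the random baseline).
    n_clusters=1 yields the global partition; n_clusters=n_labels
    yields the local partition."""
    if n_clusters > n_labels:
        raise ValueError("more clusters than labels")
    rng = random.Random(seed)
    labels = list(range(n_labels))
    rng.shuffle(labels)
    clusters = [set() for _ in range(n_clusters)]
    # seed each cluster with one label, then scatter the rest
    for c, l in zip(clusters, labels[:n_clusters]):
        c.add(l)
    for l in labels[n_clusters:]:
        clusters[rng.randrange(n_clusters)].add(l)
    return clusters
```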

Acknowledgment

  • This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001.
  • This study was financed in part by the Conselho Nacional de Desenvolvimento Científico e Tecnológico - Brasil (CNPq) - Process number 200371/2022-3.
  • The authors also thank the Brazilian research agency FAPESP for its financial support.

Contact

elainececiliagatto@gmail.com

Links

| Site | Post-Graduate Program in Computer Science | Computer Department | Biomal | CNPq | KU Leuven | Embarcados | Read Prensa | LinkedIn Company | LinkedIn Profile | Instagram | Facebook | Twitter | Twitch | YouTube |