gabrielecola/Imbalance_classification_problem

The Imbalance Problem in Classification

Final dissertation for the BSc in Statistics at Università degli Studi di Napoli Federico II.

Abstract

The document presents several approaches to dealing with class imbalance. First, it analyzes the pros and cons of pre-processing methods, which can be divided into:
Undersampling methods
Oversampling methods
Hybrid methods
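The three families above can be illustrated with their simplest random variants. The following is a minimal pure-Python sketch, not the dissertation's code, which may use other methods and another language. The function names and the midpoint target used by the hybrid variant are illustrative assumptions.

```python
import random
from collections import Counter

# Illustrative sketch: function names and the hybrid "midpoint" target are
# assumptions, not taken from the dissertation.

def group_by_class(X, y):
    """Map each class label to the list of rows that carry it."""
    groups = {}
    for xi, yi in zip(X, y):
        groups.setdefault(yi, []).append(xi)
    return groups

def resample_to(X, y, target, seed=0):
    """Bring every class to exactly `target` rows.

    Classes larger than `target` lose random rows (sampling without
    replacement); smaller classes gain duplicated random rows (sampling
    with replacement).
    """
    rng = random.Random(seed)
    X_out, y_out = [], []
    for label, rows in group_by_class(X, y).items():
        if len(rows) >= target:
            kept = rng.sample(rows, target)
        else:
            kept = rows + [rng.choice(rows) for _ in range(target - len(rows))]
        X_out.extend(kept)
        y_out.extend([label] * target)
    return X_out, y_out

def random_undersample(X, y, seed=0):
    """Undersampling: shrink every class to the size of the smallest one."""
    return resample_to(X, y, min(Counter(y).values()), seed)

def random_oversample(X, y, seed=0):
    """Oversampling: grow every class to the size of the largest one."""
    return resample_to(X, y, max(Counter(y).values()), seed)

def hybrid_resample(X, y, seed=0):
    """Hybrid: undersample the majority and oversample the minority,
    meeting at the midpoint between the smallest and largest class sizes."""
    counts = Counter(y).values()
    return resample_to(X, y, (min(counts) + max(counts)) // 2, seed)

if __name__ == "__main__":
    X = [[float(i)] for i in range(100)]
    y = [0] * 90 + [1] * 10          # 9:1 imbalance
    for method in (random_undersample, random_oversample, hybrid_resample):
        _, y_new = method(X, y)
        print(method.__name__, dict(Counter(y_new)))
```

On this toy input the three methods yield 10/10, 90/90 and 50/50 rows per class respectively. Undersampling discards majority rows, while random oversampling only repeats existing minority rows; these trade-offs are one reason more elaborate variants exist.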

Second, it analyzes cost-sensitive solutions, which modify existing algorithms (e.g. decision trees, SVMs, ensemble methods) so that each class is assigned a different weight.
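A cost-sensitive method leaves the data unchanged and instead changes how much each class's errors count in the training objective. As an illustration only (the dissertation may use different weights or a different SVM solver), the sketch below trains a linear SVM by subgradient descent on a class-weighted hinge loss, using the inverse-frequency weights w_c = n / (k · n_c) that scikit-learn applies for `class_weight="balanced"`. The function names and the hyperparameters `lam`, `epochs` and `lr` are assumptions.

```python
from collections import Counter

# Illustrative sketch: names and hyperparameters are assumptions, not the
# dissertation's implementation.

def balanced_class_weights(y):
    """Inverse-frequency weights w_c = n / (k * n_c), as in scikit-learn's
    class_weight="balanced"."""
    counts = Counter(y)
    n, k = len(y), len(counts)
    return {c: n / (k * n_c) for c, n_c in counts.items()}

def train_weighted_linear_svm(X, y, class_weight, lam=0.01, epochs=200, lr=0.01):
    """Minimise the class-weighted, L2-regularised hinge loss

        lam/2 * ||w||^2 + (1/n) * sum_i c[y_i] * max(0, 1 - y_i * (w . x_i + b))

    by full-batch subgradient descent. Labels must be -1 or +1.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]   # gradient of the regulariser
        grad_b = 0.0
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1:            # hinge term is active for this point
                c = class_weight[yi]      # cost of misclassifying class yi
                for j in range(d):
                    grad_w[j] -= c * yi * xi[j] / n
                grad_b -= c * yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, X):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return [1 if sum(wj * xj for wj, xj in zip(w, xi)) + b >= 0 else -1 for xi in X]

if __name__ == "__main__":
    X = [[-2.0]] * 9 + [[2.0]]
    y = [-1] * 9 + [1]                   # one minority point out of ten
    weights = balanced_class_weights(y)  # about 0.556 for class -1, 5.0 for class +1
    w, b = train_weighted_linear_svm(X, y, weights)
    print(weights, predict(w, b, [[-2.0], [2.0]]))
```

With these weights every class contributes the same total weight n/k to the loss (here 9 × 0.556 ≈ 1 × 5.0), so the single minority point is not outweighed by the majority. In scikit-learn the same idea is exposed as `SVC(class_weight="balanced")`.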

Finally, all of these methods are applied to two datasets. On the first, a churn dataset, all the pre-processing methods are applied and tested with an SVM classifier; on the second, a spam dataset, all the cost-sensitive methods are applied instead.
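When pre-processing is combined with a classifier, as in the churn experiment, one common protocol is to split the data first and resample only the training part. The test set then keeps the real class proportions, and no duplicated row leaks from training into testing. The sketch below assumes that protocol (the exact one used in the dissertation is not stated here) and uses random oversampling as a stand-in for any of the pre-processing methods; the function names and the 30% test fraction are illustrative assumptions.

```python
import random
from collections import Counter

# Illustrative sketch: names and the 30% test fraction are assumptions.

def stratified_split(X, y, test_frac=0.3, seed=0):
    """Split (x, y) pairs so that train and test keep the original class proportions."""
    rng = random.Random(seed)
    groups = {}
    for xi, yi in zip(X, y):
        groups.setdefault(yi, []).append((xi, yi))
    train, test = [], []
    for rows in groups.values():
        rows = rows[:]                    # copy before shuffling
        rng.shuffle(rows)
        n_test = round(len(rows) * test_frac)
        test.extend(rows[:n_test])
        train.extend(rows[n_test:])
    return train, test

def oversample(pairs, seed=0):
    """Random oversampling on (x, y) pairs: duplicate minority rows up to the majority size."""
    rng = random.Random(seed)
    groups = {}
    for xi, yi in pairs:
        groups.setdefault(yi, []).append((xi, yi))
    n_max = max(len(g) for g in groups.values())
    out = []
    for g in groups.values():
        out.extend(g + [rng.choice(g) for _ in range(n_max - len(g))])
    return out

def prepare_experiment(X, y, test_frac=0.3, seed=0):
    """Resample only the training split; the test split keeps the real imbalance."""
    train, test = stratified_split(X, y, test_frac, seed)
    return oversample(train, seed), test

if __name__ == "__main__":
    X = [[float(i)] for i in range(100)]
    y = [0] * 90 + [1] * 10
    train, test = prepare_experiment(X, y)
    print("train", dict(Counter(lab for _, lab in train)))
    print("test ", dict(Counter(lab for _, lab in test)))
```

The resampled `train` list would then be passed to the SVM, and the untouched `test` list used for evaluation.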