Imbalanced Data

Battling the imbalanced dataset problem using different data augmentation methods

The network models in this project use the area under the ROC curve (AUC)[1] as the metric for assessing prediction performance. Overall accuracy is not a suitable metric because it says little about separation power on imbalanced datasets[2]: a model that always predicts the majority class can still reach a high accuracy. AUC, on the other hand, is computed from the true positive rate (recall) and the false positive rate across all decision thresholds, so it draws on the confusion matrix[2] of the model and gives a more suitable measurement for models working on imbalanced datasets.
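To make the accuracy-versus-AUC argument concrete, here is a minimal, dependency-free sketch. The function names `roc_auc` and `accuracy` and the toy data are assumptions for illustration and are not taken from the project code. The AUC is computed through its rank interpretation: the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half.

```python
def roc_auc(y_true, y_score):
    """ROC AUC via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def accuracy(y_true, y_score, threshold=0.5):
    """Fraction of correct predictions after thresholding the scores."""
    preds = [1 if s >= threshold else 0 for s in y_score]
    return sum(p == y for p, y in zip(preds, y_true)) / len(y_true)


# Hypothetical toy data: 95 negatives and 5 positives.
y_true = [0] * 95 + [1] * 5
# A degenerate model that assigns the same low score to every sample.
constant_scores = [0.1] * 100

acc = accuracy(y_true, constant_scores)  # high, despite no separation power
auc = roc_auc(y_true, constant_scores)   # chance level
print(f"accuracy={acc:.2f}  auc={auc:.2f}")
```

The constant model gets 95 of 100 labels right but cannot rank any positive above any negative, which is exactly what AUC exposes. The pairwise loop is quadratic in the number of samples, so for real datasets a library routine such as scikit-learn's `roc_auc_score` would be the more practical choice.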

Model selection (cross-validation) using AutoKeras[5] and some popular network models. Best performer: LENET 300

Random Undersampling
Oversampling through standard duplication
Oversampling through duplication with small noise
Oversampling using SMOTE [3]
Oversampling using mixup [4]
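The five techniques listed above can be sketched in a few lines each. This is a simplified illustration using only the Python standard library, not the project's implementation: all function names, the binary-label assumption, and the default parameters (`noise_std`, `k`, `alpha`) are hypothetical. The SMOTE variant picks base samples at random rather than iterating over every minority sample as in the original paper[3], and the mixup variant draws one mixing weight per pair from Beta(alpha, alpha) as described in [4].

```python
import random
from collections import Counter


def split_by_class(X, y):
    """Return (minority_rows, majority_rows, minority_label, majority_label)."""
    (maj_label, _), (min_label, _) = Counter(y).most_common(2)
    minority = [x for x, label in zip(X, y) if label == min_label]
    majority = [x for x, label in zip(X, y) if label == maj_label]
    return minority, majority, min_label, maj_label


def rebuild(minority, majority, min_label, maj_label):
    """Concatenate both classes back into one (X, y) pair."""
    X = majority + minority
    y = [maj_label] * len(majority) + [min_label] * len(minority)
    return X, y


def random_undersample(X, y, rng):
    """Keep a random subset of the majority class, as large as the minority."""
    minority, majority, lo, hi = split_by_class(X, y)
    kept = rng.sample(majority, len(minority))
    return rebuild(minority, kept, lo, hi)


def oversample_duplicate(X, y, rng, noise_std=0.0):
    """Duplicate random minority rows until balanced.

    With noise_std > 0, Gaussian jitter is added to every duplicated feature.
    """
    minority, majority, lo, hi = split_by_class(X, y)
    extra = []
    for _ in range(len(majority) - len(minority)):
        row = rng.choice(minority)
        extra.append([v + rng.gauss(0.0, noise_std) for v in row])
    return rebuild(minority + extra, majority, lo, hi)


def smote(X, y, rng, k=5):
    """Create synthetic minority rows on the segment between a minority row
    and one of its k nearest minority neighbours (needs >= 2 minority rows)."""
    minority, majority, lo, hi = split_by_class(X, y)
    synthetic = []
    for _ in range(len(majority) - len(minority)):
        base = rng.choice(minority)
        others = [m for m in minority if m is not base]
        # Sort the remaining minority rows by squared Euclidean distance.
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return rebuild(minority + synthetic, majority, lo, hi)


def mixup(X, y, rng, alpha=0.2):
    """Blend every sample with a random partner; numeric labels are blended
    with the same weight, which yields soft labels."""
    partners = list(range(len(X)))
    rng.shuffle(partners)
    X_mix, y_mix = [], []
    for i, j in enumerate(partners):
        lam = rng.betavariate(alpha, alpha)
        X_mix.append([lam * a + (1 - lam) * b for a, b in zip(X[i], X[j])])
        y_mix.append(lam * y[i] + (1 - lam) * y[j])
    return X_mix, y_mix


# Hypothetical toy dataset: 16 majority rows (label 0), 4 minority rows (label 1).
rng = random.Random(0)
X = [[float(i), float(i % 3)] for i in range(20)]
y = [0] * 16 + [1] * 4

X_under, y_under = random_undersample(X, y, rng)
X_dup, y_dup = oversample_duplicate(X, y, rng)
X_noise, y_noise = oversample_duplicate(X, y, rng, noise_std=0.01)
X_smote, y_smote = smote(X, y, rng, k=3)
X_mix, y_mix = mixup(X, y, rng)
print(Counter(y_under), Counter(y_dup), Counter(y_smote))
```

Undersampling discards information from the majority class, while the oversampling variants differ in how much new variation they introduce: plain duplication adds none, noise adds local jitter, and SMOTE stays on line segments between existing minority samples. Resampling should be applied to the training split only, so that validation AUC is measured on the original class distribution.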

References

[1] Andrew P. Bradley - 'The Use of the Area Under the ROC Curve in the Evaluation of Machine Learning Algorithms' - https://linkinghub.elsevier.com/retrieve/pii/S0031320396001422
[2] Sofia Visa, Brian Ramsay, Anca Ralescu - 'Confusion Matrix-based Feature Selection' - http://ceur-ws.org/Vol-710/paper37.pdf
[3] Nitesh V. Chawla, Kevin W. Bowyer, Lawrence O. Hall, W. Philip Kegelmeyer - 'SMOTE: Synthetic Minority Over-sampling Technique' - https://arxiv.org/pdf/1106.1813.pdf
[4] Hongyi Zhang, Moustapha Cisse, Yann N. Dauphin, David Lopez-Paz - 'mixup: Beyond Empirical Risk Minimization' - http://arxiv.org/abs/1710.09412
[5] Haifeng Jin, Qingquan Song, Xia Hu - 'Auto-Keras: An Efficient Neural Architecture Search System' - https://dl.acm.org/doi/10.1145/3292500.3330648
