Landscape of ML/DL evaluation metrics

The purpose of this repository is to provide a curated list of state-of-the-art works in the field of ML/DL evaluation metrics.

In general, metrics are measures of quantitative assessment commonly used for comparing and tracking performance or production.

In ML/DL development, too, performance evaluation is an important step of the machine learning process. Evaluation metrics measure the quality (including performance) of a machine learning or deep learning model during that step. With them, various characteristics and quality (and performance) factors of an ML/DL model can be quantified.

Most evaluation metrics are tied to a specific machine learning task. The choice of metric depends on the type of ML/DL model and on its implementation plan. Different metrics exist for classification, regression, ranking, clustering, topic modeling, and other tasks. Some metrics, such as precision and recall, are useful for multiple tasks.

Contributions and comments are always welcome. Please contact us at hollobit@etri.re.kr or send a pull request. You can add links through pull requests, or create an issue to point out something I missed or to start a discussion.

1. General

[PDF] Proben1: A set of neural network benchmark problems and benchmarking rules L Prechelt - 1994 - Citeseer (Scholar) (Semantic) (Connected)
[BOOK] Combining pattern classifiers: methods and algorithms LI Kuncheva - 2014 - books.google.com (Scholar) (Semantic) (Connected)
20 Popular Machine Learning Metrics. Part 1: Classification & Regression Evaluation Metrics
20 Popular Machine Learning Metrics. Part 2: Ranking, & Statistical Metrics
How to Choose Right Metric for Evaluating ML Model

2. Classification

(accuracy, precision, recall, F1-score, ROC, AUC, …)
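As a rough illustration of how the metrics named above relate to one another, the sketch below derives accuracy, precision, recall and F1 from confusion-matrix counts, and computes ROC AUC as the fraction of positive/negative pairs that are ranked correctly. The function names and sample data are illustrative assumptions, not taken from any of the listed works, and only the binary case is covered.

```python
def binary_classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    total = tp + tn + fp + fn
    # Undefined ratios (zero denominators) are reported as 0.0 by convention here.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """ROC AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. This is O(n^2) and meant for clarity.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels, hard predictions and scores for eight samples.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
scores = [0.9, 0.2, 0.4, 0.8, 0.1, 0.6, 0.7, 0.3]

print(binary_classification_metrics(y_true, y_pred))
print(roc_auc(y_true, scores))
```

Precision and recall diverge when false positives and false negatives are unbalanced, which is one reason the works below compare several measures rather than relying on accuracy alone.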

ISO/IEC 4213:2022 Information technology — Artificial intelligence — Assessment of machine learning classification performance
24 Evaluation Metrics for Binary Classification (And When to Use Them)
An experimental comparison of performance measures for classification C Ferri, J Hernández-Orallo, R Modroiu Pattern Recognition Letters 30 (1), 27-38 (Scholar) (Semantic) (Connected)
[BOOK] Evaluating learning algorithms: a classification perspective N Japkowicz, M Shah - 2011 - books.google.com (Scholar) (Semantic) (Connected)
[HTML] A systematic analysis of performance measures for classification tasks M Sokolova, G Lapalme - Information processing & management, 2009 - Elsevier (Scholar) (Semantic) (Connected)
[PDF] A review on evaluation metrics for data classification evaluations M Hossin, MN Sulaiman - International Journal of Data Mining & …, 2015 - academia.edu (Scholar) (Semantic) (Connected)
Evaluation of performance measures for classifiers comparison V Labatut, H Cherifi - arXiv preprint arXiv:1112.4133, 2011 - arxiv.org - (Scholar) (Semantic) (Connected)
A survey of predictive modeling on imbalanced domains P Branco, L Torgo, RP Ribeiro - ACM Computing Surveys (CSUR), 2016 - dl.acm.org - (Scholar) (Semantic) (Connected)
Multi-label learning by exploiting label dependency ML Zhang, K Zhang - Proceedings of the 16th ACM SIGKDD …, 2010 - dl.acm.org - (Scholar) (Semantic) (Connected)
[PDF] Classifier chains for multi-label classification J Read, B Pfahringer, G Holmes, E Frank - Machine learning, 2011 - Springer (Scholar) (Semantic) (Connected)
A review on multi-label learning algorithms ML Zhang, ZH Zhou - IEEE transactions on knowledge and …, 2013 - ieeexplore.ieee.org (Scholar) (Semantic) (Connected)

3. Prediction

A comparison of MCC and CEN error measures in multi-class prediction G Jurman, S Riccadonna, C Furlanello - PloS one, 2012 - journals.plos.org - (Scholar) (Semantic) (Connected)
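For intuition, the binary special case of the Matthews correlation coefficient (MCC) can be sketched as below. The paper above treats the multi-class case and CEN, which this sketch does not cover; the function name and the sample counts are illustrative assumptions.

```python
import math


def binary_mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Returns a value in [-1, 1]; 0.0 is returned when any marginal sum is
    zero, a common convention for the otherwise undefined case.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts: 3 true positives, 3 true negatives, 1 of each error.
print(binary_mcc(tp=3, tn=3, fp=1, fn=1))
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when the classes are imbalanced.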

4. Segmentation

ISO/IEC DIS 16466 Information Technology - 3D Printing and scanning - Assessment methods of 3D scanned data for 3D printing model
http://www.visceral.eu/resources/evaluatesegmentation-software/
[HTML] Metrics for evaluating 3D medical image segmentation: analysis, selection, and tool AA Taha, A Hanbury - BMC medical imaging, 2015 - Springer - (Scholar) (Semantic) (Connected)
Review of Evaluation Metrics for 3D Medical Image Segmentation, Jangwoo Kim (김장우), Jonghyo Kim (김종효) - Journal of the Korean Society of Imaging Informatics in Medicine (대한의학영상정보학회지), 2017, vol. 23, no. 1, pp. 14-20
A review of recent evaluation methods for image segmentation YJ Zhang - Proceedings of the Sixth International Symposium …, 2001 - ieeexplore.ieee.org - (Scholar) (Semantic) (Connected)
[PDF] An overview of current evaluation methods used in medical image segmentation V Yeghiazaryan, I Voiculescu - Department of Computer Science …, 2015 - cs.ox.ac.uk - (Scholar) (Semantic) (Connected)
[HTML] Blood vessel segmentation algorithms—review of methods, datasets and evaluation metrics S Moccia, E De Momi, S El Hadji, LS Mattos - Computer methods and …, 2018 - Elsevier - (Scholar) (Semantic) (Connected)
Current methods in medical image segmentation DL Pham, C Xu, JL Prince - Annual review of biomedical …, 2000 - annualreviews.org - (Scholar) (Semantic) (Connected)
[HTML] Image segmentation evaluation: A survey of unsupervised methods H Zhang, JE Fritts, SA Goldman - computer vision and image understanding, 2008 - Elsevier - (Scholar) (Semantic) (Connected)
A benchmark for 3D mesh segmentation X Chen, A Golovinskiy, T Funkhouser - Acm transactions on graphics …, 2009 - dl.acm.org - (Scholar) (Semantic) (Connected)
A review on deep learning techniques applied to semantic segmentation A Garcia-Garcia, S Orts-Escolano, S Oprea… - arXiv preprint arXiv …, 2017 - arxiv.org - (Scholar) (Semantic) (Connected)
[HTML] Unsupervised image segmentation evaluation and refinement using a multi-scale approach B Johnson, Z Xie - ISPRS Journal of Photogrammetry and Remote …, 2011 - Elsevier - (Scholar) (Semantic) (Connected)
[HTML] A comparative evaluation of interactive segmentation algorithms K McGuinness, NE O'connor - Pattern Recognition, 2010 - Elsevier - (Scholar: https://scholar.google.com/scholar?cites=12481616241604244476&as_sdt=2005&sciodt=0,5&hl=de) (Semantic) (Connected)
Comparison and evaluation of methods for liver segmentation from CT datasets T Heimann, B Van Ginneken, MA Styner… - … on medical imaging, 2009 - ieeexplore.ieee.org - (Scholar) (Semantic) (Connected)
Medical image segmentation methods, algorithms, and applications A Norouzi, MSM Rahim, A Altameem, T Saba… - IETE Technical …, 2014 - Taylor & Francis - (Scholar) (Semantic) (Connected)

5. Deep Generative Model

(Inception score, Frechet Inception distance)

Progressive growing of gans for improved quality, stability, and variation T Karras, T Aila, S Laine, J Lehtinen - arXiv preprint arXiv:1710.10196, 2017 - arxiv.org - (Scholar) (Semantic) (Connected)
Analyzing and improving the image quality of stylegan T Karras, S Laine, M Aittala, J Hellsten… - Proceedings of the …, 2020 - openaccess.thecvf.com - (Scholar) (Semantic) (Connected)
How good is my GAN? K Shmelkov, C Schmid… - Proceedings of the …, 2018 - openaccess.thecvf.com - (Scholar) (Semantic) (Connected)
[HTML] Pros and cons of gan evaluation measures A Borji - Computer Vision and Image Understanding, 2019 - Elsevier - (Scholar) (Semantic) (Connected)
A note on the inception score S Barratt, R Sharma - arXiv preprint arXiv:1801.01973, 2018 - arxiv.org - (Scholar) (Semantic) (Connected)
An empirical study on evaluation metrics of generative adversarial networks Q Xu, G Huang, Y Yuan, C Guo, Y Sun, F Wu… - arXiv preprint arXiv …, 2018 - arxiv.org - (Scholar) (Semantic) (Connected)
Metrics for deep generative models N Chen, A Klushyn, R Kurle, X Jiang… - International …, 2018 - proceedings.mlr.press - (Scholar) (Semantic) (Connected)
Assessing generative models via precision and recall MSM Sajjadi, O Bachem, M Lucic… - Advances in Neural …, 2018 - papers.nips.cc - (Scholar) (Semantic) (Connected)
Improved precision and recall metric for assessing generative models T Kynkäänniemi, T Karras, S Laine… - Advances in Neural …, 2019 - papers.nips.cc - (Scholar) (Semantic) (Connected)

6. Detection

Evaluation Metrics for Object Detection
What makes for effective detection proposals? J Hosang, R Benenson, P Dollár… - IEEE transactions on …, 2015 - ieeexplore.ieee.org - (Scholar) (Semantic) (Connected)
A survey on performance metrics for object-detection algorithms R Padilla, SL Netto, EAB da Silva - … International Conference on …, 2020 - ieeexplore.ieee.org - (Scholar) (Semantic) (Connected)

7. Regression Metrics

(MSE, MAE)
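A minimal sketch of the two metrics named above, assuming plain Python lists of equal length; the function names and sample values are illustrative.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences; penalizes large errors more heavily."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average of absolute differences; less sensitive to outliers than MSE."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical targets and predictions.
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

print(mean_squared_error(y_true, y_pred))   # 0.375
print(mean_absolute_error(y_true, y_pred))  # 0.5
```

MSE is expressed in squared units of the target, so its square root (RMSE) is often reported when the error should be read on the same scale as the data.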

8. Ranking Metrics

(MRR, Precision@K, DCG & NDCG, MAP, Kendall’s tau, Spearman’s rho)
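Three of the metrics above can be sketched in a few lines, assuming each ranking is given as a list of relevance values in ranked order (position 1 first). The function names and sample data are illustrative assumptions. DCG here uses the linear-gain form `rel / log2(rank + 1)`; another common variant uses `2**rel - 1` as the gain and gives different numbers.

```python
import math


def precision_at_k(relevance, k):
    """Fraction of the top-k results that are relevant (relevance is 0/1)."""
    return sum(relevance[:k]) / k


def mean_reciprocal_rank(rankings):
    """Mean over queries of 1 / rank of the first relevant result (0 if none)."""
    def reciprocal_rank(relevance):
        for rank, rel in enumerate(relevance, start=1):
            if rel:
                return 1.0 / rank
        return 0.0
    return sum(reciprocal_rank(r) for r in rankings) / len(rankings)


def dcg_at_k(gains, k):
    """Discounted cumulative gain with a log2 position discount."""
    return sum(g / math.log2(rank + 1)
               for rank, g in enumerate(gains[:k], start=1))


def ndcg_at_k(gains, k):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal else 0.0


# Hypothetical data: binary relevance for three queries, graded for one.
print(precision_at_k([1, 0, 1, 0], k=2))
print(mean_reciprocal_rank([[0, 1, 0], [1, 0, 0], [0, 0, 1]]))
print(ndcg_at_k([3, 2, 3, 0, 1, 2], k=6))
```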

A short introduction to learning to rank H Li - IEICE TRANSACTIONS on Information and Systems, 2011 - search.ieice.org - (Scholar) (Semantic) (Connected)

9. Statistical Metrics

(Correlation)
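As one concrete instance of a correlation metric, the Pearson correlation coefficient can be sketched as follows. This assumes two equal-length numeric sequences; the function name and sample values are illustrative.

```python
import math


def pearson_correlation(x, y):
    """Pearson's r: covariance of x and y divided by the product of their spreads."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / (spread_x * spread_y)


# Hypothetical data: a perfectly linear and a perfectly inverse relationship.
print(pearson_correlation([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))
print(pearson_correlation([1, 2, 3, 4, 5], [10, 8, 6, 4, 2]))
```

Pearson's r only captures linear association; rank-based measures such as Spearman's rho or Kendall's tau are the usual alternatives for monotonic but non-linear relationships.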

10. Computer Vision Metrics

(PSNR, SSIM, IoU)
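Two of the metrics above are simple enough to sketch directly. The code below assumes images are flat sequences of pixel values and boxes are `(x1, y1, x2, y2)` tuples with `x1 <= x2` and `y1 <= y2`; the function names and sample inputs are illustrative. SSIM is omitted because it involves windowed local statistics that do not fit a short sketch.

```python
import math


def psnr(image_a, image_b, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two same-sized images."""
    mse = sum((a - b) ** 2 for a, b in zip(image_a, image_b)) / len(image_a)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)


def box_iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    inter_w = max(0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union else 0.0


# Hypothetical inputs: a two-pixel image pair and two overlapping boxes.
print(psnr([100, 100], [110, 90]))
print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

The same intersection-over-union idea applies to segmentation masks by counting pixels instead of computing box areas.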

All image quality metrics you need in one package
A Quick Overview of Methods to Measure the Similarity Between Images
Image quality assessment: from error visibility to structural similarity Z Wang, AC Bovik, HR Sheikh… - IEEE transactions on …, 2004 - ieeexplore.ieee.org - (Scholar) (Semantic) (Connected)
Image quality metrics: PSNR vs. SSIM A Hore, D Ziou - 2010 20th international conference on pattern …, 2010 - ieeexplore.ieee.org - (Scholar) (Semantic) (Connected)
Seven challenges in image quality assessment: past, present, and future research DM Chandler - International Scholarly Research Notices, 2013 - hindawi.com - (Scholar) (Semantic) (Connected)
Full reference image quality assessment based on saliency map analysis Y Tong, H Konik, F Cheikh… - Journal of Imaging …, 2010 - ingentaconnect.com - (Scholar) (Semantic) (Connected)
[PDF] Metrics performance comparison for color image database N Ponomarenko, F Battisti, K Egiazarian… - … workshop on video …, 2009 - comlab.uniroma3.it - (Scholar) (Semantic) (Connected)
[PDF] IEM: a new image enhancement metric for contrast and sharpness measurements VL Jaya, R Gopikakumari - International Journal of Computer Applications, 2013 - Citeseer - (Scholar) (Semantic) (Connected)
Predicting deeper into the future of semantic segmentation P Luc, N Neverova, C Couprie… - Proceedings of the …, 2017 - openaccess.thecvf.com - (Scholar) (Semantic) (Connected)
A survey on deep learning techniques for image and video semantic segmentation A Garcia-Garcia, S Orts-Escolano, S Oprea… - Applied Soft …, 2018 - Elsevier - (Scholar) (Semantic) (Connected)
Fsrnet: End-to-end learning face super-resolution with facial priors Y Chen, Y Tai, X Liu, C Shen… - Proceedings of the IEEE …, 2018 - openaccess.thecvf.com - (Scholar) (Semantic) (Connected)

11. NLP Metrics

(Perplexity, BLEU score)
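Perplexity is the exponential of the average negative log-probability a model assigns to each token. The sketch below assumes those per-token probabilities are already available as a list; the function name and sample values are illustrative. BLEU is not sketched here because it involves n-gram clipping and a brevity penalty that need more space.

```python
import math


def perplexity(token_probabilities):
    """exp of the mean negative log-likelihood over the tokens of a sequence.

    Each probability must be in (0, 1]; lower perplexity means the model
    was less "surprised" by the sequence.
    """
    n = len(token_probabilities)
    mean_nll = -sum(math.log(p) for p in token_probabilities) / n
    return math.exp(mean_nll)


# Hypothetical model that assigns probability 0.25 to every observed token:
# it is as uncertain as a uniform choice among four options.
print(perplexity([0.25, 0.25, 0.25, 0.25]))
```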

ISO/IEC AWI 23282 Artificial Intelligence - Evaluation methods for accurate natural language processing systems

12. Super resolution

Deep learning for image super-resolution: A survey Z Wang, J Chen, SCH Hoi - IEEE Transactions on Pattern …, 2020 - ieeexplore.ieee.org - (Scholar) (Semantic) (Connected)
Single-image super-resolution: A benchmark CY Yang, C Ma, MH Yang - European Conference on Computer Vision, 2014 - Springer - (Scholar) (Semantic) (Connected)
Image super-resolution using deep convolutional networks C Dong, CC Loy, K He, X Tang - IEEE transactions on pattern …, 2015 - ieeexplore.ieee.org - (Scholar) (Semantic) (Connected)

Appendix : Bias

[PDF] On over-fitting in model selection and subsequent selection bias in performance evaluation GC Cawley, NLC Talbot - The Journal of Machine Learning Research, 2010 - jmlr.org - (Scholar) (Semantic) (Connected)
