Comparison of self-supervised learning methods for Medieval Handwriting in the Latin Script dataset reveals SimCLR as the top performer, highlighting potential for historical document classification.

vishnu-dev/icdar-self-supervised-learning

Comparison of Self-Supervised Learning models for ICDAR CLaMM Challenge

This project compares self-supervised learning methods on a downstream task for the Medieval Handwriting in the Latin Script dataset. Self-supervised learning has shown promise in computer vision and natural language processing applications, but its effectiveness on historical scripts has not been extensively explored.

Three self-supervised learning methods are compared in this work: SimCLR, MAE, and BYOL.

Performance was evaluated on one downstream task, script type classification. The results indicate that SimCLR outperforms the other methods on this task for the Medieval Handwriting in the Latin Script dataset (71.8 % top-1 accuracy, versus 45.2 % for BYOL and 36.1 % for MAE). The study also offers insights into the factors that influence the performance of self-supervised learning methods in this context, including the selection of pre-training data and the size of the pre-training dataset. In conclusion, this study showcases the potential of self-supervised learning for historical handwritten document classification and emphasizes the importance of selecting a suitable method for the specific downstream task.

Dataset

The ICDAR CLaMM Challenge dataset is used for this project. The dataset can be found here.

Documentation

API Documentation is available at DOCUMENTATION.md

Running the code

Prerequisites

pip install -r requirements.txt

Training

SSL Model Training

cd src/
python train.py +experiment=simclr_bolts
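For readers unfamiliar with SimCLR: as described in its original paper, the method is trained with the NT-Xent contrastive loss, which pulls together the embeddings of two augmented views of the same image and pushes apart all other pairs in the batch. The following is a minimal pure-Python sketch of that loss, not the code that `train.py` runs. The function names and the temperature value of 0.5 are assumptions made for illustration.

```python
import math


def l2_normalize(vector):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def nt_xent_loss(view_a, view_b, temperature=0.5):
    """NT-Xent loss for a batch of paired embeddings (illustrative sketch).

    view_a[i] and view_b[i] are embeddings of two augmentations of image i.
    The temperature default is an assumption, not a value from the repository.
    """
    n = len(view_a)
    z = [l2_normalize(v) for v in view_a + view_b]
    total = 0.0
    for i in range(2 * n):
        positive = (i + n) % (2 * n)  # index of the other view of the same image
        # Temperature-scaled cosine similarity to every embedding in the batch.
        sims = [sum(a * b for a, b in zip(z[i], z[k])) / temperature
                for k in range(2 * n)]
        # Log-sum-exp over all embeddings except i itself.
        others = [s for k, s in enumerate(sims) if k != i]
        peak = max(others)
        log_denominator = peak + math.log(sum(math.exp(s - peak) for s in others))
        total += log_denominator - sims[positive]
    return total / (2 * n)


# Toy batch: two images, two views each, 2-dimensional embeddings.
aligned = nt_xent_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.1], [0.1, 1.0]])
swapped = nt_xent_loss([[1.0, 0.0], [0.0, 1.0]], [[0.1, 1.0], [1.0, 0.1]])
print(aligned < swapped)  # correctly paired views give the lower loss
```

In practice the embeddings come from an encoder followed by a projection head, and the loss is computed on tensors with automatic differentiation; the loop form above is only meant to make the pairing logic explicit.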

Linear Classifier Training

cd src/
python evaluate.py +experiment=simclr_eval
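Linear evaluation, the protocol named in the results table, conventionally means freezing the pre-trained encoder and fitting only a linear classifier on its output features. The sketch below illustrates that idea in pure Python on toy feature vectors that stand in for frozen encoder outputs. It is an assumption about the general protocol, not a description of what `evaluate.py` does, and every name and hyperparameter in it is hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def class_scores(weights, biases, x):
    """Linear scores w_c . x + b_c for every class c."""
    return [sum(w * v for w, v in zip(weights[c], x)) + biases[c]
            for c in range(len(weights))]


def train_linear_probe(features, labels, num_classes, epochs=100, lr=0.1, seed=0):
    """Fit a softmax linear classifier with SGD on fixed feature vectors.

    The features are never updated, which mirrors a frozen encoder.
    """
    rng = random.Random(seed)
    dim = len(features[0])
    weights = [[0.0] * dim for _ in range(num_classes)]
    biases = [0.0] * num_classes
    order = list(range(len(features)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = features[i], labels[i]
            probs = softmax(class_scores(weights, biases, x))
            for c in range(num_classes):
                # Gradient of cross-entropy with respect to the class-c score.
                grad = probs[c] - (1.0 if c == y else 0.0)
                for d in range(dim):
                    weights[c][d] -= lr * grad * x[d]
                biases[c] -= lr * grad
    return weights, biases


def predict(weights, biases, x):
    """Return the index of the highest-scoring class."""
    scores = class_scores(weights, biases, x)
    return max(range(len(scores)), key=scores.__getitem__)


# Toy "frozen features" for three hypothetical script types.
features = [[2.0, 0.1], [1.8, -0.2], [-2.0, 0.3], [-1.7, 0.0], [0.1, 2.0], [-0.2, 1.9]]
labels = [0, 0, 1, 1, 2, 2]
weights, biases = train_linear_probe(features, labels, num_classes=3)
print([predict(weights, biases, x) for x in features])
```

Because only the linear layer is trained, the resulting accuracy reflects how linearly separable the pre-trained features are, which is why this protocol is commonly used to compare self-supervised methods.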

Evaluation

Linear Classifier Testing

Check notebook here

Results

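| Model name | Pre-training epochs | Pre-training batch size | Linear evaluation training epochs | Linear evaluation top-1 accuracy |
|---|---|---|---|---|
| SimCLR | 500 | 256 | 100 | 71.8 % |
| MAE | 500 | 256 | 100 | 36.1 % |
| BYOL | 500 | 64 | 100 | 45.2 % |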
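Top-1 accuracy is the percentage of test samples whose single highest-scoring predicted class equals the true label. A minimal sketch of the computation follows; the function name and the toy labels are hypothetical and are not taken from the repository or its notebook.

```python
def top1_accuracy(predictions, targets):
    """Return the percentage of predictions that match their target label."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        raise ValueError("at least one sample is required")
    correct = sum(p == t for p, t in zip(predictions, targets))
    return 100.0 * correct / len(targets)


# Toy example: three of four predicted class indices are correct.
print(top1_accuracy([0, 2, 1, 1], [0, 2, 1, 3]))  # 75.0
```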

Image sources: ICDAR CLaMM
