rezazad68/BCDUnet_DIBCO
This repository contains my implementation for the DIBCO document image binarization challenges. The code uses the BCDU-Net model to learn the binarization process on the DIBCO series, and the evaluation results show that BCDU-Net achieves the best performance on the DIBCO challenges. If this code helps with your research, please consider citing the following paper:

R. Azad et al., "Bi-Directional ConvLSTM U-Net with Densely Connected Convolutions", ICCV, 2019, download link.

Updates

  • December 3, 2019: First release (complete implementation for the DIBCO series; datasets from 2009 to 2017 added). Other datasets can be added easily.

Prerequisites and Run

This code has been implemented in Python using the Keras library with a TensorFlow backend. It has been tested on Ubuntu, though it should be compatible with any comparable environment. The following environment and libraries are needed to run the code (a quick environment check is sketched after the list):

  • Python 3
  • Keras with a TensorFlow backend
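
A quick way to confirm the environment, assuming standalone Keras 2.x with the TensorFlow backend:

```python
# Environment check (assumes standalone Keras 2.x on top of TensorFlow).
import tensorflow as tf
import keras
from keras import backend as K

print("TensorFlow:", tf.__version__)
print("Keras:", keras.__version__)
print("Backend:", K.backend())  # should print 'tensorflow'
```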

Run Demo

To train the deep model for each DIBCO year, follow the steps below:

DIBCO Series

1- Download the DIBCO datasets from this link and extract them. We included the DIBCO datasets from 2009 to 2017. DIBCO 2018, 2019, or other datasets can be added easily; you only need to revise the utils code.
2- Run Prepare_DIBCO.py to prepare the data and divide it into train and test sets. Please note that this code treats all samples of one particular year as the test set and the samples from the remaining years as the training set, which is the data division commonly used in the DIBCO challenge.
3- Run Train_DIBCO.py to train the BCDU-Net model using the training and validation (20% of the training samples) sets. The model is trained for 100 epochs, and the weights that perform best on the validation set are saved (see the sketch after these steps).
4- To compute performance and produce the binarization results, run Evaluate.py. It reports the performance measures and saves the related figures and results in the output folder.
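
Step 3 in code form: a minimal sketch of the training setup described above (100 epochs, 20% of the training patches held out for validation, best validation weights saved). The model-builder name, input size, and file names below are assumptions for illustration; the actual code lives in Train_DIBCO.py.

```python
# Minimal training sketch (hypothetical names; see Train_DIBCO.py for the real code).
import numpy as np
from keras.callbacks import ModelCheckpoint

from models import BCDU_net_D3            # assumed model-builder name

X_train = np.load('data_train.npy')        # assumed patch files produced by Prepare_DIBCO.py
y_train = np.load('mask_train.npy')

model = BCDU_net_D3(input_size=(128, 128, 1))

# Keep only the weights that perform best on the held-out 20% validation split.
checkpoint = ModelCheckpoint('weight_DIBCO.hdf5', monitor='val_loss',
                             save_best_only=True, verbose=1)

model.fit(X_train, y_train,
          batch_size=8,
          epochs=100,
          validation_split=0.2,
          callbacks=[checkpoint])
```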

Notice

We train the model on patches extracted from the training set. For test image binarization, we apply patch-based overlapping binarization. To train and evaluate the model for any particular year, just set the test year (parameter Test_year = 2016) when running Prepare_DIBCO.py and Evaluate.py.
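
The overlapping patch-based inference can be sketched as follows. The patch size, stride, threshold, and averaging scheme here are assumptions for illustration, not the exact values used in Evaluate.py.

```python
# Sketch of patch-based overlapping binarization (assumed patch size/stride/threshold).
import numpy as np

def binarize_image(model, image, patch=128, stride=64, threshold=0.5):
    """Predict overlapping patches, average the overlaps, then threshold.

    Assumes a grayscale image whose dimensions are at least patch x patch.
    """
    H, W = image.shape
    prob = np.zeros((H, W), dtype=np.float32)
    count = np.zeros((H, W), dtype=np.float32)

    for y in range(0, max(H - patch, 0) + 1, stride):
        for x in range(0, max(W - patch, 0) + 1, stride):
            tile = image[y:y + patch, x:x + patch]
            pred = model.predict(tile[None, ..., None])[0, ..., 0]
            prob[y:y + patch, x:x + patch] += pred
            count[y:y + patch, x:x + patch] += 1.0

    prob /= np.maximum(count, 1.0)          # average overlapping predictions
    return (prob > threshold).astype(np.uint8)
```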

Quick Overview

Structure of the bi-directional ConvLSTM used in the BCDU-Net network

Diagram of the proposed method

Results

To evaluate the performance of the BCDU-Net model on the DIBCO series, we followed the setting used in SAE. In [1], the authors provided experimental results only for DIBCO 2014 and 2016. To do so, they considered all samples of one particular year (for example 2014 or 2016) as the test set and the rest of the samples from the other years as the training set. We used the same setting for reporting our results. Below, the results of the proposed approach are reported in terms of F-measure (a minimal F-measure computation is sketched after the table).

| Methods             | DIBCO 2014 | DIBCO 2016 |
|---------------------|------------|------------|
| Otsu                | 91.56      | 73.79      |
| Niblack             | 22.26      | 16.7       |
| Sauvola             | 77.08      | 82.00      |
| Wolf et al.         | 90.47      | 81.76      |
| Gatos et al.        | 91.97      | 74.97      |
| Sauvola MS          | 87.86      | 65.04      |
| Su et al.           | 95.14      | 90.27      |
| Howe                | 90.00      | 80.64      |
| Kliger and Tal      | 95.00      | 90.48      |
| CNN                 | 81.23      | 54.58      |
| SAE                 | 89.12      | 85.27      |
| R. Azad (BCDU-Net)  | 97.88      | 98.96      |
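
For reference, the F-measure reported above is the harmonic mean of precision and recall over foreground (ink) pixels. A minimal computation, assuming binary masks where 1 marks ink:

```python
import numpy as np

def f_measure(pred, gt):
    """F-measure (in percent) on foreground pixels of two binary masks (1 = ink)."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    precision = tp / max(pred.sum(), 1)
    recall = tp / max(gt.sum(), 1)
    if precision + recall == 0:
        return 0.0
    return 100.0 * 2 * precision * recall / (precision + recall)
```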

Document Image Binarization Results on DIBCO Series

Document image binarization result 1, Document image binarization result 2, Document image binarization result 3, Document image binarization result 4

Model weights

You can download the learned weights for the DIBCO series, which were trained on DIBCO 2009-2017 samples (except 2016). Please note that the trained model can be used for other years too.

| Test year  | Learned weights |
|------------|-----------------|
| DIBCO 2016 | Model Weights   |

Query

All implementation was done by Reza Azad. For any query, please contact us for more information.

rezazad68@gmail.com