DocClean

Clean up document images that have crumpled backgrounds, stains, and folds using deep neural networks.

Here is a demo:

You can find the slide deck accompanying this project here.

Installation

Installing docclean is easy. Just run the following:

git clone https://github.com/devanshkv/insight_docclean.git
cd insight_docclean
pip install -r requirements.txt
python3 setup.py install

Documentation

Have a look at our beautiful docs here.

Training the models

The models can be trained using train.py. The usage is as follows:

usage: train.py [-h] -t {cycle_gan,autoencoder} -k KAGGLE_DATA_DIR
                     [-c CLEAN_BOOKS_DIR]
                     [-d DIRTY_BOOKS_DIR] [-e EPOCHS]
                     [-b BATCH_SIZE] [-v]

Quick reference table

| Short | Long | Default | Description |
|-------|------|---------|-------------|
| `-h` | `--help` | | show this help message and exit |
| `-t` | `--type` | `None` | Which model to train |
| `-k` | `--kaggle_data_dir` | `None` | Kaggle Data Directory |
| `-c` | `--clean_books_dir` | `None` | Directory containing clean images |
| `-d` | `--dirty_books_dir` | `None` | Directory containing dirty images |
| `-e` | `--epochs` | `100` | Number of epochs to train for |
| `-b` | `--batch_size` | `16` | Batch size |
| `-v` | `--verbose` | | Be verbose |
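
To make the flags and defaults above concrete, here is a minimal sketch of an `argparse` parser that would produce the same interface. It is reconstructed only from the usage text and the table, so it is an assumption about how `train.py` is wired up, not its actual source; the function name `build_train_parser` and the `data/kaggle` path are hypothetical.

```python
import argparse


def build_train_parser():
    """Hypothetical reconstruction of the train.py command line interface."""
    parser = argparse.ArgumentParser(prog="train.py")
    # Required options (shown without brackets in the usage string).
    parser.add_argument("-t", "--type", required=True,
                        choices=["cycle_gan", "autoencoder"],
                        help="Which model to train")
    parser.add_argument("-k", "--kaggle_data_dir", required=True,
                        help="Kaggle Data Directory")
    # Optional options with the defaults listed in the table.
    parser.add_argument("-c", "--clean_books_dir", default=None,
                        help="Directory containing clean images")
    parser.add_argument("-d", "--dirty_books_dir", default=None,
                        help="Directory containing dirty images")
    parser.add_argument("-e", "--epochs", type=int, default=100,
                        help="Number of epochs to train for")
    parser.add_argument("-b", "--batch_size", type=int, default=16,
                        help="Batch size")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="Be verbose")
    return parser


# Parse an explicit argument list; "data/kaggle" is a placeholder path.
args = build_train_parser().parse_args(["-t", "autoencoder", "-k", "data/kaggle"])
print(args.type, args.epochs, args.batch_size)  # autoencoder 100 16
```

Only `-t` and `-k` have to be supplied; everything else falls back to the defaults in the table.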

Running the inference

Once a model is trained, inference can be run using infer.py. The usage is as follows:

usage: infer.py [-h] [-v] [-g GPU_ID] -c DATA_DIR [-b BATCH_SIZE] -t
               {cycle_gan,autoencoder} -w WEIGHTS

Quick reference table

| Short | Long | Default | Description |
|-------|------|---------|-------------|
| `-h` | `--help` | | show this help message and exit |
| `-v` | `--verbose` | | Be verbose |
| `-g` | `--gpu_id` | `0` | GPU ID (use -1 for CPU) |
| `-c` | `--data_dir` | `None` | Directory with candidate pngs. |
| `-b` | `--batch_size` | `32` | Batch size for training data |
| `-t` | `--type` | `None` | Which model to train |
| `-w` | `--weights` | `None` | Model weights |
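
If you want to drive inference from another script, the flags above can be assembled programmatically. The helper below is a small sketch, not part of docclean: the name `build_infer_command` and the example paths are hypothetical, and it only builds the argument list documented in the usage text.

```python
def build_infer_command(data_dir, model_type, weights,
                        gpu_id=0, batch_size=32, verbose=False):
    """Build an argument list for infer.py from the documented flags."""
    if model_type not in ("cycle_gan", "autoencoder"):
        raise ValueError("model_type must be 'cycle_gan' or 'autoencoder'")
    cmd = [
        "python3", "infer.py",
        "-c", str(data_dir),    # directory with candidate PNGs
        "-t", model_type,       # which model the weights belong to
        "-w", str(weights),     # model weights
        "-g", str(gpu_id),      # GPU ID, -1 for CPU
        "-b", str(batch_size),  # batch size
    ]
    if verbose:
        cmd.append("-v")
    return cmd


# Placeholder paths; replace them with your own image directory and weights.
cmd = build_infer_command("my_pngs", "autoencoder", "path/to/weights", gpu_id=-1)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root.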

Running the streamlit app

Run

streamlit run app.py

and then open localhost:8501 in your browser to view the app.
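
If the page does not load, a quick way to tell whether the app is actually listening is to probe the address from Python. This is a minimal sketch using only the standard library; the function name `app_is_up` is hypothetical, and it assumes the app is served on Streamlit's default port 8501 as above.

```python
import urllib.error
import urllib.request


def app_is_up(url="http://localhost:8501", timeout=2.0):
    """Return True if an HTTP GET to `url` answers with status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False
```

Calling `app_is_up()` after starting the app should return `True` once the server has finished starting up.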

