WordLenSpotter

This is the official implementation of the paper "Word Length-aware Text Spotting: Enhancing Dense Text Detection and Recognition for Camera-captured Document Image".

Preparation

Download images

The dense text spotting dataset DSTD1500, collected in real reading scenarios, can be downloaded here.
Sample dataset images

You can also prepare your own dataset by following the example scripts.
To evaluate on DSTD1500, first download the zipped annotations.
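Before running an evaluation, it can help to confirm that the downloaded archive is intact. The sketch below is only an illustration and not part of the repository: the function name `list_gt_archive` is hypothetical, and it assumes the archive is stored at `datasets/evaluation/test_gt.zip`, as in the dataset layout given in this README. It uses only Python's standard `zipfile` module.

```python
import zipfile
from pathlib import Path


def list_gt_archive(zip_path):
    """Return the member names of the ground-truth archive (hypothetical helper).

    Raises FileNotFoundError if the file is missing and ValueError if it is
    not a valid zip archive or contains a corrupt member.
    """
    path = Path(zip_path)
    if not path.is_file():
        raise FileNotFoundError(f"annotation archive not found: {path}")
    if not zipfile.is_zipfile(path):
        raise ValueError(f"not a valid zip archive: {path}")
    with zipfile.ZipFile(path) as archive:
        # testzip() checks every member's CRC and returns the first bad name, or None
        bad = archive.testzip()
        if bad is not None:
            raise ValueError(f"corrupt member in archive: {bad}")
        return archive.namelist()


# Example (path assumed from the dataset layout in this README):
#   members = list_gt_archive("datasets/evaluation/test_gt.zip")
#   print(len(members), members[:3])
```

A truncated download usually fails the `is_zipfile` or `testzip` check, which is cheaper to catch here than halfway through an evaluation run.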

Models

WordLenSpotter-MIXTRAIN: config | model (Google Drive)

Installation

Python=3.8
PyTorch=1.8.0, torchvision=0.9.0, cudatoolkit=11.1
OpenCV for visualization

Steps

Install the repository (we recommend using Anaconda for installation).

conda create -n WordLenSpotter python=3.8 -y
conda activate WordLenSpotter
conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=11.1 -c pytorch -c conda-forge
pip install opencv-python
pip install scipy
pip install shapely
pip install rapidfuzz
pip install timm
pip install Polygon3
git clone https://github.com/unxiaohao/WordLenSpotter.git
cd WordLenSpotter
python setup.py build develop
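After installation, a quick sanity check can show whether the environment matches the versions listed above. This is a minimal sketch, not a script shipped with the repository: the helper names are hypothetical, and it assumes that the installed PyTorch packages expose standard package metadata under the names `torch` and `torchvision`. It only reads metadata and module specs, so it does not import PyTorch or need a GPU.

```python
import importlib.util
import sys
from importlib.metadata import PackageNotFoundError, version

# Versions listed under "Installation"; other combinations are not covered here.
EXPECTED_VERSIONS = {"torch": "1.8.0", "torchvision": "0.9.0"}


def check_versions(expected=EXPECTED_VERSIONS):
    """Map each package to (installed, expected, ok); installed is None if missing."""
    report = {}
    for name, want in expected.items():
        try:
            have = version(name)
        except PackageNotFoundError:
            have = None
        # Drop a local build tag such as "+cu111" before comparing
        ok = have is not None and have.split("+")[0] == want
        report[name] = (have, want, ok)
    return report


def has_module(name):
    """True if a top-level module (e.g. "detectron2", "cv2") can be found."""
    return importlib.util.find_spec(name) is not None


def python_is_38():
    """True if the running interpreter is Python 3.8."""
    return sys.version_info[:2] == (3, 8)


# Example:
#   print(python_is_38(), check_versions(), has_module("detectron2"))
```

`has_module("detectron2")` only confirms that `python setup.py build develop` made the package findable; it does not check that the compiled extensions load.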

Dataset path

datasets
|_ dstd1500
|  |_ train_images
|  |_ test_images
|  |_ dstd1500_test.json
|  |_ dstd1500_train.json
|  |_ weak_voc_new.txt
|  |_ weak_voc_pair_list.txt
|_ evaluation
|  |_ test_gt.zip
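The layout above can be checked before training with a short script. The sketch below is illustrative only: the helper name is hypothetical, the expected entries are copied from the tree above, and it assumes that `datasets/` is resolved relative to the directory the training commands are run from.

```python
from pathlib import Path

# Expected entries, copied from the dataset tree above (relative to datasets/).
EXPECTED_DIRS = ["dstd1500/train_images", "dstd1500/test_images"]
EXPECTED_FILES = [
    "dstd1500/dstd1500_train.json",
    "dstd1500/dstd1500_test.json",
    "dstd1500/weak_voc_new.txt",
    "dstd1500/weak_voc_pair_list.txt",
    "evaluation/test_gt.zip",
]


def missing_dataset_entries(root="datasets"):
    """Return the expected paths that are absent under root (empty list means OK)."""
    root = Path(root)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing


# Example:
#   problems = missing_dataset_entries("datasets")
#   print("layout OK" if not problems else f"missing: {problems}")
```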

Usage

Training

Pretrain WordLenSpotter

python projects/WordLenSpotter/train_net.py \
  --num-gpus 8 \
  --config-file projects/WordLenSpotter/configs/WordLenSpotter-pretrain.yaml

Jointly train the model on the mixed real dataset

python projects/WordLenSpotter/train_net.py \
  --num-gpus 8 \
  --config-file projects/WordLenSpotter/configs/WordLenSpotter-mixtrain.yaml

Fine-tune

Fine-tune the model on DSTD1500

python projects/WordLenSpotter/train_net.py \
  --num-gpus 8 \
  --config-file projects/WordLenSpotter/configs/WordLenSpotter-finetune-dstd1500.yaml
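The three training stages (pretrain, joint training on the mixed real dataset, fine-tune) run the same script with different configs, so their commands can be generated in order from one place. This is a sketch, not a tool from the repository: `train_command` and its `weights` argument are hypothetical, and the fine-tune file name follows the one used by the Visualize command. Passing `MODEL.WEIGHTS` as a trailing override assumes that `train_net.py` uses Detectron2's standard argument parser; whether each YAML already points at the previous stage's checkpoint is not stated in this README.

```python
import shlex

TRAIN_SCRIPT = "projects/WordLenSpotter/train_net.py"
CONFIG_DIR = "projects/WordLenSpotter/configs"

# Training stages from this README, in the order they are run.
STAGES = [
    "WordLenSpotter-pretrain.yaml",
    "WordLenSpotter-mixtrain.yaml",
    "WordLenSpotter-finetune-dstd1500.yaml",
]


def train_command(config_name, num_gpus=8, weights=None):
    """Build the argument list for one training stage (hypothetical helper).

    If weights is given, it is appended as the config override
    "MODEL.WEIGHTS <path>", e.g. to start from the previous stage's checkpoint.
    """
    cmd = [
        "python", TRAIN_SCRIPT,
        "--num-gpus", str(num_gpus),
        "--config-file", f"{CONFIG_DIR}/{config_name}",
    ]
    if weights is not None:
        cmd += ["MODEL.WEIGHTS", weights]
    return cmd


# Print the commands; run one with subprocess.run(cmd, check=True).
for stage in STAGES:
    print(shlex.join(train_command(stage)))
```

With fewer than 8 GPUs, Detectron2 configs usually also need `SOLVER.IMS_PER_BATCH` and `SOLVER.BASE_LR` scaled down to match.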

Visualize

Visualize the detection and recognition results

python demo/demo.py \
  --config-file projects/WordLenSpotter/configs/WordLenSpotter-finetune-dstd1500.yaml \
  --input input1.jpg \
  --output ./output \
  --confidence-threshold 0.4 \
  --opts MODEL.WEIGHTS ./output/FINETUNE20K/model_final.pth
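To visualize a whole folder instead of a single image, the same call can be built with every image listed after `--input`. This sketch assumes that `demo/demo.py` follows Detectron2's demo script, which accepts several space-separated paths after `--input` and treats `--output` as a directory when there are multiple inputs. The helper name and the `*.jpg` default are hypothetical; the config, threshold, and weights path are taken from the command above.

```python
from pathlib import Path

CONFIG = "projects/WordLenSpotter/configs/WordLenSpotter-finetune-dstd1500.yaml"
WEIGHTS = "./output/FINETUNE20K/model_final.pth"


def demo_command(image_dir, output_dir="./output", threshold=0.4,
                 weights=WEIGHTS, pattern="*.jpg"):
    """Build one demo/demo.py call covering every matching image in image_dir."""
    images = sorted(str(p) for p in Path(image_dir).glob(pattern))
    if not images:
        raise FileNotFoundError(f"no {pattern} files in {image_dir}")
    return [
        "python", "demo/demo.py",
        "--config-file", CONFIG,
        "--input", *images,
        "--output", output_dir,
        "--confidence-threshold", str(threshold),
        # --opts must come last: everything after it is read as config overrides
        "--opts", "MODEL.WEIGHTS", weights,
    ]


# Example:
#   import subprocess
#   subprocess.run(demo_command("my_images"), check=True)
```

Loading the model once for many images is faster than calling the demo repeatedly, since each call rebuilds the model and reloads the checkpoint.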

The visualization results are shown in the figure:

Acknowledgement

This project is based on AdelaiDet, Detectron2 and SwinTextSpotter.

Citation

If our paper helps your research, please cite it in your publications:

@article{wang2023word,
  title={Word length-aware text spotting: Enhancing detection and recognition in dense text image},
  author={Wang, Hao and Zhou, Huabing and Zhang, Yanduo and Lu, Tao and Ma, Jiayi},
  journal={arXiv preprint arXiv:2312.15690},
  year={2023}
}
