RapidAI/TableStructureRec

📊 Table Structure Recognition

Introduction

This repo is an inference library for table structure recognition in documents. It includes the table structure recognition models from PaddleOCR, the wired (ruled) and wireless (borderless) table recognition models from Alibaba Duguang, and others.

The repo improves the pre- and post-processing around table recognition and combines it with OCR, so that the table recognition step can be used directly.

The repo will continue to focus on table recognition, integrating the latest and most useful table recognition algorithms, with the goal of becoming the most valuable table recognition toolkit.

Stay tuned for updates.

What is Table Structure Recognition?

Table Structure Recognition (TSR) aims to extract the logical or physical structure of table images, thereby converting unstructured table images into machine-readable formats.

Logical structure: describes the row and column relationships between cells (for example, which cells share a row or a column) and each cell's span.

Physical structure: includes the logical structure plus each cell's bounding box, content, and other information, with an emphasis on where the cell is located in the image.

Figure from: Improving Table Structure Recognition with Visual-Alignment Sequential Coordinate Modeling
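
To make the distinction between logical and physical structure concrete, here is a minimal Python sketch. It is not this library's API: the `Cell` dataclass, its field names, and the `to_html` helper are hypothetical and exist only for illustration. The logical structure is modeled as a grid position plus spans, and the physical structure adds an assumed pixel bounding box and the recognized text.

```python
from dataclasses import dataclass
from html import escape
from typing import Dict, List, Optional, Tuple


@dataclass
class Cell:
    """Hypothetical cell record; not a type from this library."""

    # Logical structure: grid position and span of the cell.
    row: int
    col: int
    row_span: int = 1
    col_span: int = 1
    # Physical structure: assumed (x_min, y_min, x_max, y_max) box in pixels,
    # plus the text that OCR found inside the cell.
    bbox: Optional[Tuple[int, int, int, int]] = None
    text: str = ""


def to_html(cells: List[Cell]) -> str:
    """Render cells as an HTML table using only the logical structure."""
    rows: Dict[int, List[str]] = {}
    for cell in sorted(cells, key=lambda c: (c.row, c.col)):
        attrs = ""
        if cell.row_span > 1:
            attrs += f' rowspan="{cell.row_span}"'
        if cell.col_span > 1:
            attrs += f' colspan="{cell.col_span}"'
        rows.setdefault(cell.row, []).append(f"<td{attrs}>{escape(cell.text)}</td>")
    body = "".join(f"<tr>{''.join(tds)}</tr>" for _, tds in sorted(rows.items()))
    return f"<table>{body}</table>"


# A 2x2 table whose first row is a single cell spanning both columns.
cells = [
    Cell(0, 0, col_span=2, bbox=(0, 0, 200, 30), text="Header"),
    Cell(1, 0, bbox=(0, 30, 100, 60), text="A"),
    Cell(1, 1, bbox=(100, 30, 200, 60), text="B"),
]
markup = to_html(cells)
print(markup)
```

The HTML output depends only on `row`, `col`, and the spans, which is why the logical structure alone is enough to produce a machine-readable table. The `bbox` field matters when the result has to be mapped back onto the image, for example to match OCR text boxes to cells.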

Documentation

Full documentation is available on the docs site (in Chinese).

Acknowledgements

PaddleOCR Table

Cycle CenterNet

LORE

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

If you want to sponsor the project, click the Buy me a coffee image and include a note (e.g. your GitHub account name) so that you can be added to the sponsorship list below.

License

This project is released under the Apache 2.0 license.