Layout-Parser / layout-parser

A Unified Toolkit for Deep Learning Based Document Image Analysis

ocr computer-vision deep-learning object-detection document-image-processing layout-analysis document-layout-analysis detectron2 layout-parser layout-detection

Updated Mar 7, 2024
Python

UglyToad / PdfPig

Read and extract text and other content from PDFs in C# (port of PDFBox)

pdf csharp pdfbox netstandard pdf-files pdf-document pdf-generation hocr document-analysis pdf-extractor alto-xml page-xml layout-analysis pdf-document-processor

Updated May 21, 2024
C#

breezedeus / Pix2Text

An open-source Python 3 tool for recognizing layouts, tables, math formulas (LaTeX), and text in images, and converting them into Markdown. A free alternative to Mathpix for converting visual content into text-based representations. 80+ languages are supported.

python ocr latex pytorch latex-pdf math-formula layout-analysis math-ocr mathpix table-ocr math-formula-recognition image-to-markdown

Updated May 20, 2024
Jupyter Notebook

mittagessen / kraken

OCR engine for all the languages

ocr neural-networks hocr optical-character-recognition htr handwritten-text-recognition alto-xml page-xml layout-analysis

Updated May 23, 2024
Python

BobLd / DocumentLayoutAnalysis

Document layout analysis resources for development with PdfPig.

pdf csharp hocr tei hocr-documents alto-xml table-extraction page-xml alto layout-analysis document-layout-analysis xycut docstrum pdfpig xy-cut recursive-xy-cut page-segmentation

Updated Oct 1, 2023
C#

mindspore-lab / mindocr

A toolbox of OCR models, algorithms, and pipelines based on MindSpore

ocr deep-learning text-recognition text-detection layout-analysis crnn dbnet table-recognition mindspore key-information-extraction layoutxlm ocr-large-model tablemaster vary-toy

Updated May 16, 2024
Python

andreagemelli / doc2graph

Doc2Graph transforms documents into graphs and exploits a GNN to solve several tasks.

nlp deep-learning pytorch layout-analysis geometric-deep-learning table-detection gnn document-understanding key-information-extraction

Updated May 23, 2023
Jupyter Notebook

NormXU / Layout2Graph

An official implementation of the paper "Paragraph2Graph: A Language-independent GNN-based framework for layout analysis"

layout-analysis gnn-framework

Updated Oct 14, 2023
Python

jiangnanboy / layout_analysis4j

Layout detection for Chinese document images, implemented with java-yolov8

java yolo layout-analysis yolov8 cdla

Updated May 5, 2023
Java

BobLd / PdfPigMLNetBlockClassifier

Proof of concept of training a simple region classifier using PdfPig and ML.NET (LightGBM). The objective is to classify each text block in a PDF document page as one of title, text, list, table, or image.

classifier pdf machine-learning csharp lightgbm pdf-document document-layout layout-analysis pdf-document-processor document-layout-analysis ml-net pdfpig publaynet

Updated Mar 16, 2020
C#

JPLeoRX / detectron2-publaynet

Trained Detectron2 object detection models for document layout analysis based on PubLayNet dataset

python machine-learning computer-vision deep-learning neural-network python3 pytorch artificial-intelligence neural-networks faster-rcnn document-classification object-detection document-analysis document-layout instance-segmentation layout-analysis document-layout-analysis detectron2 publaynet

Updated Apr 16, 2023
Python

yoshihikoueno / pdfminer-layout-scanner

A more complete example of programming with PDFMiner, which continues where the default documentation stops

python pdf text-extraction pdfminer layout-analysis

Updated Jul 24, 2019
Python

dell-research-harvard / HJDataset

A Large Dataset of Historical Japanese Documents with Complex Layouts

python dataset layout-analysis detectron2

Updated Jul 22, 2022
Jupyter Notebook

MBAigner / PDFSegmenter

This library builds a graph representation of the content of PDFs. The graph is then clustered, and the resulting page segments are classified and returned. Tables are returned formatted as CSV.

python pdf csv table annotations cluster-analysis document-processing layout-analysis detection-model page-segmentation

Updated Sep 11, 2020
Python

VRI-UFPR / page-xml-draw

A powerful CLI tool for visualization and encoding of PAGE-XML files

visualization opencv ocr segmentation image-map page-xml layout-analysis

Updated May 19, 2021
Python

MaitySubhajit / SelfDocSeg

[ICDAR 2023] SelfDocSeg: A self-supervised vision-based approach towards Document Segmentation (Oral)

computer-vision layout-analysis self-supervised-learning document-segmentation

Updated Oct 6, 2023
Python

calfa-co / rasam-dataset

An Open Dataset for the Recognition and Analysis of Scripts in Arabic Maghrebi (ICDAR 2021)

dataset arabic htr historical-manuscripts layout-analysis

Updated Feb 18, 2024

CaseDrive / publaynet-models

Trained Detectron2 object detection models for document layout analysis based on PubLayNet dataset

python machine-learning computer-vision deep-learning neural-network python3 pytorch artificial-intelligence neural-networks faster-rcnn document-classification object-detection document-analysis document-layout instance-segmentation layout-analysis document-layout-analysis detectron2 publaynet

Updated Apr 16, 2023
Python

ppaanngggg / yolo-doclaynet

YOLO models trained on DocLayNet: power your document intelligence with layout analysis

yolo document-analysis layout-analysis ultralytics yolov8 doclaynet

Updated May 23, 2024
Python

os-climate / crrf-det

A web application for PDF content and table extraction, featuring image-based visual layout analysis, indexed document search, batch processing and extraction result annotation.

pdf annotation data-extraction table-extraction layout-analysis

Updated May 17, 2023
C++

layout-analysis

31 public repositories match this topic; the 20 described above are the ones shown on this page.
