document-analysis

Here are 72 public repositories matching this topic...

UglyToad / PdfPig

Read and extract text and other content from PDFs in C# (port of PDFBox)

pdf csharp pdfbox netstandard pdf-files pdf-document pdf-generation hocr document-analysis pdf-extractor alto-xml page-xml layout-analysis pdf-document-processor

Updated May 9, 2024
C#

tstanislawek / awesome-document-understanding

A curated list of resources on Document Understanding (DU)

Updated Jun 2, 2023

AlibabaResearch / AdvancedLiterateMachinery

A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.

ocr computer-vision artificial-intelligence text-recognition document text-detection document-analysis end-to-end-ocr multimodal scene-text-recognition multimodal-deep-learning scene-text-detection vision-language document-understanding scene-text-detection-recognition document-recognition document-intelligence documentai vision-language-transformer vision-language-model

Updated Apr 23, 2024
C++

Yuliang-Liu / Curve-Text-Detector

This repository provides training and testing code, a dataset, detection and recognition annotations, an evaluation script, an annotation tool, and a ranking.

deep-learning object-detection document-analysis scene-text

Updated Jul 20, 2020
Jupyter Notebook

wenwenyu / PICK-pytorch

Code for the paper "PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks" (ICPR 2020)

document-analysis graph-convolutional-network graph-learning graph-neural-networks document-understanding key-information-extraction

Updated May 3, 2024
Python

jpWang / LiLT

Official PyTorch implementation of LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding (ACL 2022)

nlp information-extraction document-analysis document-understanding multilingual-models document-ai multimodal-pre-trained-model

Updated Oct 31, 2022
Python

pandora-analysis / pandora

Pandora is an analysis framework that determines whether a file is suspicious and conveniently shows the results

infosec document-analysis malware-detection document-analyzing

Updated May 9, 2024
Python

CybercentreCanada / assemblyline

AssemblyLine 4: File triage and malware analysis

framework incident-response malware python3 cybersecurity cert infosec malware-analyzer malware-analysis malware-research automation-framework cyber-security file-analysis document-analysis security-automation security-tools malware-detection assemblyline security-automation-framework

Updated May 9, 2024
Python

masyagin1998 / robin

RObust document image BINarization

python opencv ocr computer-vision deep-learning keras neural-networks document-analysis u-net document-binarization

Updated Jul 27, 2022
Python

chriswolfvision / local_adaptive_binarization

Local adaptive image binarization

computer-vision document-analysis document-binarization

Updated Mar 5, 2023
C++

anisha2102 / docvqa

Document Visual Question Answering

computer-vision deep-learning document-analysis visual-question-answering

Updated Jul 30, 2020
Python

aws-samples / amazon-textract-transformer-pipeline

Post-process Amazon Textract results with Hugging Face transformer models for document understanding

ocr document-analysis amazon-textract huggingface-transformers

Updated May 7, 2024
Python

monniert / docExtractor

(ICFHR 2020 oral) Code for the paper "docExtractor: An off-the-shelf historical document element extraction"

pytorch segmentation historical-data document-analysis

Updated May 25, 2023
Python

mirabdullahyaser / Retrieval-Augmented-Generation-Engine-with-LangChain-and-Streamlit

Powerful web application that combines Streamlit, LangChain, and Pinecone to simplify document analysis. Powered by OpenAI's GPT-3, RAG enables dynamic, interactive document conversations, making it ideal for efficient document retrieval and summarization.

natural-language-processing artificial-intelligence question-answering chat-application document-analysis streamlit gpt-3 large-language-models generative-ai langchain openai-chatgpt retrieval-augmented-generation

Updated Feb 22, 2024
Python

Xyntopia / pydoxtools

Effortlessly extract information from unstructured data with this library, utilizing advanced AI techniques. Compose AI in customizable pipelines and diverse sources for your projects.

python nlp pdf information-retrieval extraction document-analysis document-extraction llm chatgpt

Updated Feb 15, 2024
Python

ZeningLin / ViBERTgrid-PyTorch

An unofficial PyTorch implementation of "Lin et al. ViBERTgrid: A Jointly Trained Multi-Modal 2D Document Representation for Key Information Extraction from Documents. ICDAR, 2021"

information-extraction document-analysis key-information-extraction document-ai visual-information-extraction