# document-processing

Here are 37 public repositories matching this topic...

dhlab-epfl / dhSegment

Generic framework for historical document processing

tensorflow python3 segmentation historical-data document-processing

Updated Jul 9, 2021
Python

formkiq / formkiq-core

A full-featured Document Layer for your application, providing the functionality of a flexible document management system, including storage, discovery, processing, and retrieval. Deploys directly into your Amazon Web Services Cloud. 🌟 Star to support our work!

aws ocr serverless headless cloud-storage document-database amazon-web-services dms document-management optical-character-recognition document-processing document-management-system document-api document-apis intelligent-document-processing document-layer

Updated Jun 10, 2024
Java

awslabs / project-lakechain

⚡ Cloud-native, AI-powered, document processing pipelines on AWS.

aws machine-learning natural-language-processing computer-vision serverless hacktoberfest document-processing aws-cdk generative-ai retrieval-augmented-generation

Updated Jun 10, 2024
TypeScript

steindani / pandoc-include

An include filter for Pandoc

markdown pandoc pandoc-filter document-processing

Updated Dec 6, 2020
Haskell

cburschka / lyx

Unofficial mirror of git://git.lyx.org/lyx.git (updates daily; not affiliated with lyx.org).

latex mirror lyx document-processing

Updated Mar 21, 2023
C++

afrozas / proceedings

Semantic extraction from conference proceedings.

semantic conferences spacy document-processing

Updated Jul 26, 2020
Python

kili-technology / awesome-datasets

A comprehensive list of annotated training datasets classified by use case.

Updated Jul 8, 2022

awslabs / rhubarb

A Python framework for multi-modal document understanding with Amazon Bedrock

multi-modal document-processing generative-ai intelligent-document-processing amazon-bedrock

Updated Jun 6, 2024
Python

parsee-ai / parsee-core

Retrieval of fully structured data made easy. Use LLMs or custom models. Specialized in PDFs and HTML files. Extensive support for tabular data extraction and multimodal queries.

structured-data document-processing multimodal llm

Updated May 21, 2024
Python

aws-solutions / enhanced-document-understanding-on-aws

Enhanced Document Understanding on AWS delivers an easy-to-use web application that ingests and analyzes documents, extracts content, identifies and redacts sensitive customer information, and creates search indexes from the analyzed data.

document-analysis document-processing

Updated May 30, 2024
JavaScript

MBAigner / PDFSegmenter

This library builds a graph representation of the content of PDFs. The graph is then clustered, and the resulting page segments are classified and returned. Tables are returned formatted as CSV.

python pdf csv table annotations cluster-analysis document-processing layout-analysis detection-model page-segmentation

Updated Sep 11, 2020
Python

greed2411 / tokyo

tokyo is a REST API that, given any type of document 📄, identifies its MIME type 🧐, suggests an extension 🦔, and also extracts text 💪.

clojure extension filetype text-extraction ring mime-types text-parser extract-text apache-tika document-processing text-parsing

Updated Jun 13, 2020
Clojure

eklem / stopword-trainer

A module for creating stopword lists for any language, based on a set of documents.

nlp information-retrieval stopwords document-processing stopwords-removal

Updated Nov 13, 2023
JavaScript

jmanhype / DSPy-Multi-Document-Agents

An advanced distributed knowledge fabric for intelligent document processing, featuring multi-document agents, optimized query handling, and semantic understanding.

nlp distributed-systems ai query-optimization knowledge-management document-processing vector-search

Updated Apr 23, 2024
Python

jeanbaptisteb / doccleaner

A Python command-line utility for automating some copyediting tasks in documents. It edits zipped, XML-based files (e.g. docx, odt, or epub) through XSLT stylesheets, and can be extended fairly easily with your own custom XSL stylesheets.

docx text-processing odt document-processing xsl-transformation xsl-stylesheet xsl-sheet

Updated Jul 17, 2018
XSLT

m4nd0mb3 / document-templater

Document Templater is a powerful tool for automated document generation. Streamline the process of creating standard documents, such as contracts, reports, and forms, using predefined templates. This repository contains the source code for Document Templater, allowing you to easily integrate this functionality into your projects and automate docs.

api automation integration backend forms templates swagger expressjs reports contracts pdf-generation word-documents swagger-api expressjs-api document-generation expressjs-server document-processing swagger3

Updated Sep 19, 2023
JavaScript

RPetitpierre / Generic_Semantic_Segmentation_of_Historical_Maps

computer-vision historical-maps document-processing

Updated Jan 17, 2022
Jupyter Notebook

abdur75648 / urdu-text-detection

Text line detection for Urdu OCR (UTRNet)

ocr text-detection document-processing urdu-text-detection urdu-ocr utrnet contournet

Updated Jan 31, 2024
Python

CentralFloridaAttorney / zmongo_retriever

Use data from MongoDB in LangChain, Llama and OpenAI

python mongo machine-learning database mongodb openai data-retrieval document-processing langchain llamacpp data-chunking

Updated Mar 31, 2024
Python

SvenEichelsheimer / filegazer

FileGazer - deep file analysing and categorisation

ocr tika tesseract content-extraction document-processing file-analysing document-categorisation

Updated Nov 20, 2022
