text-extraction

This repository contains code for a simple application that detects text in images using Python, Optical Character Recognition (OCR), and Streamlit for a user-friendly web interface. Users can upload images or capture them via camera input, and the application extracts the text present in them.

computer-vision text-extraction tesseract-ocr streamlit-webapp text-extraction-from-image

Updated Jun 3, 2024
Python

mhadeli / Python-Text-Extraction

Extract text from multiple PDFs via a Python script.

pdf text-extraction

Updated May 31, 2024
Python

zanachka / extruct

Extract embedded metadata from HTML markup

text-extraction html-extraction

Updated May 30, 2024
Python

unidoc / unipdf

Golang PDF library for creating and processing PDF files (pure Go)

golang pdf signing text-extraction pdf-generator pdf-generation pdf-reader pdf-manipulation pdf-library pdf-document-processor pdf-compression pdf-sign pdf-reports

Updated May 30, 2024
Go

vaites / php-apache-tika

Apache Tika bindings for PHP: extract text and metadata from documents, images and other formats

ocr php-library tika apache text-extraction text-recognition

Updated May 28, 2024
PHP

dotfurther / OpenDiscoverPlatformCaseStudy

Case study using dotfurther's Open Discover Platform with the RavenDB document store to rapidly create a full-text search/eDiscovery/information governance capable demonstration application.

metadata text-extraction full-text full-text-search ravendb ediscovery indexing-engine file-format-detection data-breach file-deduplication pii information-governance-catalog personally-identifiable-information archive-extractor pii-detection file-identification full-text-extraction document-ingestion information-governance

Updated May 28, 2024

nguyen-tho / ID-card-extract-module

deep-learning text-extraction id-card transformer-ocr

Updated May 25, 2024
Python

abhinaba-ghosh / any-text

Get text content from any file

text text-extraction reader file-reader text-extractor

Updated May 18, 2024
JavaScript

MRGRD56 / textractor-translator

Translate visual novels in real time

electron javascript games translator typescript translation anime text-extraction visual-novel textractor textractor-extension

Updated May 17, 2024
TypeScript

miso-belica / sumy

Module for automatic summarization of text documents and HTML pages.

python nlp pagerank-algorithm text-extraction reduction summarization html-page summary lsa sumy textteaser summarizer html-extraction html-extractor

Updated May 16, 2024
Python

yasminsarkhosh / machine-learning-bsc-thesis-2024

This GitHub repository hosts the notebooks and tools developed as part of this thesis to automate the extraction, processing, and analysis of data from the MICCAI 2023 conference, aiding in the systematic review and providing a structured foundation for further research in this crucial area.

data-science machine-learning data-visualization text-extraction artificial-intelligence healthcare medical-imaging data-analysis datasets annotation-framework data-quality demographic-analysis medical-image-processing miccai pdf-data-extraction medical-ai healthcare-ai miccai2023 medical-ai-project

Updated May 15, 2024
Jupyter Notebook

TYPO3-Solr / ext-tika

A TYPO3 CMS extension that provides Apache Tika functionality

search php metadata cms cms-extension tika language-detection typo3 typo3-cms-extension file-indexing text-extraction

Updated May 16, 2024
PHP

edhou20 / Medical-Texts-NLP-Clustering-

nlp clustering text-extraction dimensionality-reduction vectorization unsupervised-learning

Updated May 13, 2024
Python

Here are 212 public repositories matching this topic...

adbar / trafilatura

ICIJ / datashare

nainiayoub / pdf-text-data-extractor

flairNLP / fundus

ssciwr / AMMICO

tomMEM / RAG_with_LM-studio

dwatteau / scummtr

kanchan2803 / ImgToText
