fer-aguirre/pdf-2-ner

PDF 2 NER

Web application that converts scanned PDF files into text-based data and applies Named Entity Recognition (NER) to extract entities from Spanish-language text.
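
The README does not say which OCR engine or NER model the app uses. The sketch below only illustrates the two-stage shape it describes (scanned PDF to text, then NER). It assumes Tesseract (through `pdf2image` and `pytesseract`, with the `spa` language data installed) and spaCy's Spanish model `es_core_news_sm`. The names `Entity`, `ocr_pdf`, `spacy_ner` and `pdf_to_entities` are hypothetical and do not come from this repository.

```python
from dataclasses import dataclass
from functools import lru_cache
from typing import Callable, List


@dataclass
class Entity:
    text: str   # surface form found in the document
    label: str  # entity type, e.g. PER, LOC, ORG, MISC in spaCy's Spanish models


def ocr_pdf(pdf_path: str) -> List[str]:
    """OCR each page of a scanned PDF (assumes pdf2image + Poppler and
    pytesseract + Tesseract with Spanish 'spa' data are installed)."""
    from pdf2image import convert_from_path  # lazy import: optional dependency
    import pytesseract

    pages = convert_from_path(pdf_path, dpi=300)  # one PIL image per page
    return [pytesseract.image_to_string(page, lang="spa") for page in pages]


@lru_cache(maxsize=1)
def _load_spanish_model():
    """Load the spaCy model once and reuse it (assumes es_core_news_sm is downloaded)."""
    import spacy

    return spacy.load("es_core_news_sm")


def spacy_ner(text: str) -> List[Entity]:
    """Extract named entities from Spanish text with spaCy."""
    doc = _load_spanish_model()(text)
    return [Entity(ent.text, ent.label_) for ent in doc.ents]


def pdf_to_entities(
    pdf_path: str,
    ocr: Callable[[str], List[str]] = ocr_pdf,
    ner: Callable[[str], List[Entity]] = spacy_ner,
) -> List[Entity]:
    """Convert a scanned PDF to text page by page, then run NER on each page."""
    entities: List[Entity] = []
    for page_text in ocr(pdf_path):
        entities.extend(ner(page_text))
    return entities
```

The OCR and NER stages are passed in as plain functions. This lets the pipeline be tested with stubs when Tesseract or the spaCy model is not installed, and lets either stage be swapped for a different engine.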

Created by: Fer Aguirre

Directory Structure

├── app.py
├── assets
│   └── pdfs
├── config.ini
├── config.ini.secret
├── data
│   ├── processed
│   └── raw
├── docs
│   ├── data-dictionary.md
│   ├── explore-data.md
│   ├── references
│   └── reports
├── LICENSE
├── notebooks
│   ├── 0.0-testing-nlp-models.ipynb
│   ├── 1.0-scraping-data.ipynb
│   └── 2.0-analyzing-data.ipynb
├── outputs
│   ├── figures
│   └── tables
├── pdf_2_ner
│   ├── data
│   ├── __init__.py
│   └── utils
├── Pipfile
├── Pipfile.lock
├── README.md
└── setup.py
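
The tree lists `config.ini` alongside `config.ini.secret`. This README does not document how they are used. A common pattern, assumed here, is to keep shared settings in the first file and credentials in the second. The sketch below reads both with the standard-library `configparser`: files later in the list override earlier ones, and missing files are skipped. `load_config` and the section/key names in the usage are hypothetical.

```python
import configparser
from pathlib import Path
from typing import Union

PathLike = Union[str, Path]


def load_config(
    public: PathLike = "config.ini",
    secret: PathLike = "config.ini.secret",
) -> configparser.ConfigParser:
    """Read the public config, then the secret one.

    ConfigParser.read() silently skips files that do not exist and returns
    the list of files it actually parsed; values read later override earlier ones.
    """
    config = configparser.ConfigParser()
    loaded = config.read([public, secret], encoding="utf-8")
    if not loaded:
        raise FileNotFoundError(f"No config file found among {public!r}, {secret!r}")
    return config
```

For example, `load_config()["api"]["key"]` would return the secret file's value when both files define a hypothetical `[api]` section with a `key` entry.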

License

This project is released under the MIT License.

About

Web application for information extraction and named entity recognition for PDF files (work-in-progress).