Named Entity Extractor

Predict the named entities present in a file using spaCy, a powerful, user-friendly, open-source Natural Language Processing library for Python.

The text to be processed is extracted from documents using textract. The results (named entities and some surrounding context) are then saved in an Excel file.
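The pipeline described above can be sketched roughly as follows. This is a minimal illustration and not the code in this repository: the function names (`context_window`, `extract_entities`, `save_to_excel`), the column names, and the 40-character context width are all assumptions made for the example.

```python
def context_window(text, start, end, width=40):
    """Return the entity plus up to `width` characters on either side."""
    left = max(0, start - width)
    right = min(len(text), end + width)
    return text[left:right].replace("\n", " ").strip()


def extract_entities(path, model="en_core_web_md"):
    """Extract text with textract, run spaCy NER, return one row per entity."""
    # Third-party imports are deferred so context_window stays usable alone.
    import spacy
    import textract

    # textract.process returns bytes, so decode before passing to spaCy.
    text = textract.process(str(path)).decode("utf-8")
    doc = spacy.load(model)(text)
    return [
        {
            "entity": ent.text,
            "label": ent.label_,
            "context": context_window(text, ent.start_char, ent.end_char),
        }
        for ent in doc.ents
    ]


def save_to_excel(rows, out_path):
    """Write the rows to a spreadsheet (hypothetical column layout)."""
    import pandas as pd

    # pandas relies on openpyxl to write .xlsx files.
    pd.DataFrame(rows).to_excel(out_path, index=False)
```

For example, `save_to_excel(extract_entities("report.pdf"), "entities.xlsx")` would process one document, where both file names are placeholders.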

screencast

Getting Started

  1. Download the files, and set up a virtual environment:

    git clone https://github.com/Tim-Abwao/named-entity-extractor.git
    cd named-entity-extractor
    python3 -m venv venv
    source venv/bin/activate

  2. Install the required packages:

    pip install -U pip
    pip install openpyxl pandas spacy textract
    python -m spacy download en_core_web_md

  3. Start the app:

    python -m entity_extractor

A tkinter GUI (demonstrated above) should open, letting you browse to and select a document to process.

NOTE: For help with tkinter-related issues, please see TkDocs.
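If the app fails to start, it may help to confirm that the packages from step 2, the spaCy model, and tkinter can be found in the active virtual environment. The snippet below is a small sketch using only the standard library; `missing_packages` and `REQUIRED` are names invented for this example. Note that `find_spec` only checks whether a module can be located, not whether it imports without errors.

```python
from importlib.util import find_spec

# Packages from step 2, plus the spaCy model and tkinter (needed for the GUI).
REQUIRED = ["openpyxl", "pandas", "spacy", "textract", "en_core_web_md", "tkinter"]


def missing_packages(names):
    """Return the names that Python cannot locate in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Not found:", ", ".join(missing))
    else:
        print("All required packages were found.")
```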