
d-one/NLPeasy-workshop


NLPeasy Workshop

Build NLP pipelines the easy way

Repository for the NLPeasy workshop.

There are two ways to participate in the workshop.

Mybinder.org

This is a purely online option, so you need nothing on your laptop other than a browser. Connecting from behind a company firewall may not work, though.

Downsides:

  • your session is closed after 15-60 minutes of inactivity, e.g. when you lose your internet connection
  • the work in your Jupyter notebooks is then lost.

Launch on Binder: https://mybinder.org/v2/gh/d-one/NLPeasy-workshop/master

Note also that only 1-2 GB of RAM is available, which is tight for our example, and that if you lose your connection or close your laptop for 10 minutes, your session is lost. During the workshop we will provide you with bigger VMs in our cloud.

If you have Docker installed, you can run the same environment locally with:

JUPYTER_PORT=8888
docker run --rm -itp $JUPYTER_PORT:$JUPYTER_PORT doneai/nlpeasy-workshop jupyter lab --ip=\* --port=$JUPYTER_PORT --NotebookApp.token='' --NotebookApp.password=''

Then open http://localhost:8888 in your browser.
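If port 8888 is already taken on your machine (for example by another Jupyter server), Docker's -p HOST:CONTAINER mapping lets JupyterLab keep listening on 8888 inside the container while you browse to a different port on your laptop. The sketch below wraps the command above in a POSIX shell function; the function name run_workshop_container, the default host port 8899 and the DRY_RUN switch are hypothetical additions for illustration, not part of the workshop repository.

```shell
#!/bin/sh
# Hypothetical helper: start the workshop image, exposing container port 8888
# on a host port of your choice (default 8899, an arbitrary example value).
# With DRY_RUN=1 it only prints the docker command instead of running it.
run_workshop_container() {
  host_port="${1:-8899}"   # port you open in your browser
  container_port=8888      # port JupyterLab listens on inside the container
  # Build the argument list in "$@" so each argument stays correctly quoted.
  set -- docker run --rm -it -p "${host_port}:${container_port}" \
    doneai/nlpeasy-workshop \
    jupyter lab --ip='*' --port="${container_port}" \
    --NotebookApp.token='' --NotebookApp.password=''
  echo "JupyterLab will be available at http://localhost:${host_port}"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$@"   # dry run: show the command only
  else
    "$@"        # run the container in the foreground
  fi
}
```

For example, DRY_RUN=1 run_workshop_container 9000 only prints the command, while run_workshop_container 9000 starts the container, after which the notebook would be reachable at http://localhost:9000.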

Own Laptop

You can work on your own laptop or server. For this you need:

Setup

Get this repository:

git clone https://github.com/d-one/nlpeasy-workshop
cd nlpeasy-workshop

Then set up a virtual environment, install the requirements and download a spaCy model by running the following in a terminal:

python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python -m spacy download en_core_web_md
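Before moving on, you can sanity-check the setup. The sketch below relies on two facts: the venv activate script exports VIRTUAL_ENV, and spacy.load raises an error when a model package is not installed. The function names venv_active and spacy_model_ok are hypothetical, chosen only for this example.

```shell
#!/bin/sh
# Hypothetical check: is a virtual environment active?
# "source venv/bin/activate" exports VIRTUAL_ENV; it is unset or empty otherwise.
venv_active() {
  [ -n "${VIRTUAL_ENV:-}" ]
}

# Hypothetical check: can the current python import spaCy and load the model?
# spacy.load raises an exception (non-zero exit status) if the model is missing.
spacy_model_ok() {
  python -c 'import sys, spacy; spacy.load(sys.argv[1])' "$1" >/dev/null 2>&1
}

if venv_active; then
  echo "virtual environment active: $VIRTUAL_ENV"
else
  echo "no virtual environment active; run: source venv/bin/activate"
fi

if spacy_model_ok en_core_web_md; then
  echo "spaCy model en_core_web_md loads"
else
  echo "en_core_web_md not loadable; run: python -m spacy download en_core_web_md"
fi
```

Both checks only report; neither installs anything, so the script is safe to rerun at any point.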

You might also want to download some larger files:

curl -LO https://github.com/rudeboybert/JSE_OkCupid/raw/master/profiles.csv.zip
curl -LO https://github.com/d-one/NLPeasy-workshop/releases/download/v0.2/okc_enriched_demo.pickle.tar.gz
curl -LO https://github.com/d-one/NLPeasy-workshop/releases/download/v0.2/elastic-data.tar.gz

This will download:

  • profiles.csv.zip: our data for today
  • okc_enriched_demo.pickle.tar.gz: the result NLPeasy should produce (saving you about an hour of computation)
  • elastic-data.tar.gz: if Elasticsearch is pointed to elastic-data/elastic-data as its data folder, you can immediately see the indexed data and the generated dashboard.
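The downloaded files are archives and still need to be unpacked. The following sketch extracts whichever of the three files are present in the current directory, using standard unzip and tar; the function name extract_workshop_files is hypothetical, and the archive layouts (for example, that elastic-data.tar.gz unpacks to elastic-data/elastic-data) are inferred from the description above rather than verified.

```shell
#!/bin/sh
# Hypothetical helper: unpack the optional downloads in the current directory.
# Files that were not downloaded are skipped with a message.
extract_workshop_files() {
  if [ -f profiles.csv.zip ]; then
    unzip -o profiles.csv.zip   # -o: overwrite existing files without asking
  else
    echo "skipping profiles.csv.zip (not found)"
  fi
  for archive in okc_enriched_demo.pickle.tar.gz elastic-data.tar.gz; do
    if [ -f "$archive" ]; then
      tar -xzf "$archive"       # -x extract, -z gunzip, -f read from this file
    else
      echo "skipping $archive (not found)"
    fi
  done
}
```

Call extract_workshop_files from the repository directory after the curl commands. Elasticsearch accepts settings on the command line via -E, so from an Elasticsearch installation directory something like ./bin/elasticsearch -Epath.data=/path/to/nlpeasy-workshop/elastic-data/elastic-data should serve the unpacked index; this assumes an Elasticsearch version compatible with the one that wrote the data, which this README does not state.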

Start Jupyter Lab

With the virtual environment venv still activated, you can now start JupyterLab:

jupyter lab
