
oroszgy/hungarian-text-mining-workshop


Text mining workshop

Preparation for the workshop

Please come prepared with:

  • basic knowledge of Python
  • experience in using Jupyter notebooks

During the course we will use a little bit of Pandas (a 10-minute intro is enough) and scikit-learn to build simple machine learning models.
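To show how the two libraries fit together, here is a minimal sketch that is not taken from the workshop notebooks. It keeps a toy dataset in a pandas DataFrame and trains a scikit-learn bag-of-words Naive Bayes classifier. The sentences and labels are made up for illustration, and the notebooks may use different models and data.

```python
# Minimal sketch: pandas holds the data, scikit-learn turns text into
# word counts and fits a multinomial Naive Bayes classifier.
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy data, invented for this example only.
df = pd.DataFrame({
    "text": ["great fun film", "really great acting",
             "awful boring plot", "boring and awful"],
    "label": ["pos", "pos", "neg", "neg"],
})

# Bag-of-words features followed by Naive Bayes, chained in one pipeline.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(df["text"], df["label"])

# Classify two unseen snippets.
predictions = model.predict(["great acting", "boring plot"])
print(predictions.tolist())
```

The pipeline bundles vectorization and classification so that `fit` and `predict` both accept raw strings.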

Install dependencies and run the notebooks

The easy way: using Docker

Get the docker image: docker pull oroszgy/hungarian-text-mining-workshop

Start Jupyter Notebook: make start

The hard way: installing the packages manually

  1. Make sure you have Python 3.5+ installed (preferably a conda distribution)
  2. Clone this repository: git clone http://github.com/oroszgy/hungarian-text-mining-workshop && cd hungarian-text-mining-workshop
  3. Install the necessary packages: pip install -r requirements.txt
  4. Download the English and the Hungarian NLP models for spaCy:
    • python -m spacy download en
    • pip install https://github.com/oroszgy/spacy-hungarian-models/releases/download/hu_tagger_web_md-0.1.0/hu_tagger_web_md-0.1.0.tar.gz
  5. Install HuNlpy
    • pip install https://github.com/oroszgy/hunlp/releases/download/0.2/hunlp-0.2.0.tar.gz
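After the manual steps above, a quick sanity check can save time on the day. The script below is a sketch written for this README. It checks the interpreter version and whether the required packages can be found, without importing them. The module names in `REQUIRED_MODULES` are assumptions: `sklearn` is scikit-learn's import name, `hu_tagger_web_md` is assumed to be the import name of the Hungarian model package, and `hunlp` is assumed for HuNlpy. `requirements.txt` remains the authoritative list.

```python
import importlib.util
import sys

# ASSUMED import names for the packages installed in the steps above;
# adjust them if your installation uses different names.
REQUIRED_MODULES = ["pandas", "sklearn", "spacy", "hu_tagger_web_md", "hunlp"]


def python_ok(minimum=(3, 5)):
    """Return True if the running interpreter is at least `minimum`."""
    return tuple(sys.version_info[:len(minimum)]) >= tuple(minimum)


def missing_modules(names):
    """Return the top-level module names that cannot be found on sys.path."""
    # find_spec locates a module without executing it.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    version = sys.version.split()[0]
    print("Python", version, "OK" if python_ok() else "is too old (need 3.5+)")
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All workshop packages were found.")
```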

Start Jupyter Notebook: jupyter notebook

Table of Contents

  1. Practical NLP in Python: spaCy and textacy, Describing documents with words
  2. Document categorization, Sentiment analysis
  3. Extracting named entities and concepts
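The theme of the first notebook, describing documents with words, can be illustrated without any third-party library. The sketch below is a plain-Python TF-IDF keyword ranker written for this README. It is not the spaCy/textacy code used in the workshop, and its tokenizer is a deliberately simple regular expression rather than a real English or Hungarian tokenizer.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep runs of Unicode letters (accented Hungarian letters included)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def tfidf_keywords(docs, top_n=3):
    """Return the top_n words with the highest TF-IDF score for each document."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(docs)
    # Document frequency: the number of documents each word occurs in.
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    keywords = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        # Relative term frequency times inverse document frequency.
        scores = {
            word: (count / len(tokens)) * math.log(n_docs / doc_freq[word])
            for word, count in term_freq.items()
        }
        # Highest score first; ties are broken alphabetically.
        ranked = sorted(scores, key=lambda word: (-scores[word], word))
        keywords.append(ranked[:top_n])
    return keywords


if __name__ == "__main__":
    docs = ["the cat sat on the mat",
            "the dog sat on the log",
            "the cat chased the dog"]
    for doc, words in zip(docs, tfidf_keywords(docs, top_n=2)):
        print(doc, "->", words)
```

Words that occur in every document (such as "the") get an IDF of zero. This is why TF-IDF surfaces distinctive words rather than merely frequent ones.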

Software used


(c) Gyorgy Orosz, 2017