This is a Python 2 project on Natural Language Processing, specifically on extracting geolocation features from a corpus. It uses the well-known NLTK library and some other Python modules. The input is an annotated corpus with the same structure as the corpus included in this project.
This script extracts general features for Natural Language Processing, for example the number of characters per document or the number of symbols per document. For more details, see the script itself.
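As a rough illustration of the kind of per-document counts mentioned above, here is a minimal sketch. It is not the project's actual code: the function name `general_features`, the sample `corpus` list, and the choice to treat ASCII punctuation as "symbols" are all assumptions made for this example.

```python
from __future__ import print_function

import string

# Assumption: "symbols" means ASCII punctuation characters.
# The project's script may use a different definition.
SYMBOLS = set(string.punctuation)


def general_features(document):
    """Return simple counts for a single document given as a string."""
    return {
        # Total number of characters, including whitespace.
        "chars": len(document),
        # Number of characters that belong to the SYMBOLS set.
        "symbols": sum(1 for ch in document if ch in SYMBOLS),
    }


# Hypothetical two-document corpus, used only to show the call.
corpus = ["Hello, world!", "No symbols here"]
features = [general_features(doc) for doc in corpus]
print(features)
```

Each document yields one dictionary, so the result can be written out one row per document alongside the corpus annotations.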
This script extracts geolocation features from English documents. The author's nationality is required.
This script extracts both general and geolocation features.
- NLTK
- A dataset with correct annotations
Run as root:
pip install -r requirements.txt
Simakis Panagiotis (Initial Work)
This project is licensed under the GNU General Public License version 3. See the LICENSE file for details.