EEA Corpus (alpha stage)

This Docker image uses spaCy, textacy, pyLDAvis and other libraries to analyse the EEA Corpus (the collection of all published EEA documents) or any other CSV file with a column of text.

It provides a number of machine learning and natural language processing algorithms that can be run on the EEA Corpus or a subset of it.

~~The idea is to provide these methods over a REST API when possible.~~

Current features

Compose a text transformation pipeline to prepare a corpus

First upload a CSV file, then use the "Create a corpus" button to enter the pipeline composition page.
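To illustrate what composing a text transformation pipeline means, here is a minimal sketch in plain Python. It is not the application's own code: the step names (lowercase, strip_punctuation, collapse_whitespace) and the run_pipeline helper are hypothetical, chosen only to show how independent steps can be chained so that each one receives the previous step's output.

```python
import re
from functools import reduce
from typing import Callable, Iterable, List

# A step takes a stream of documents and returns a transformed stream.
# (Hypothetical type alias, used only in this sketch.)
Step = Callable[[Iterable[str]], Iterable[str]]


def lowercase(docs: Iterable[str]) -> Iterable[str]:
    """Lowercase every document."""
    return (doc.lower() for doc in docs)


def strip_punctuation(docs: Iterable[str]) -> Iterable[str]:
    """Replace every character that is not a word character or whitespace with a space."""
    return (re.sub(r"[^\w\s]", " ", doc) for doc in docs)


def collapse_whitespace(docs: Iterable[str]) -> Iterable[str]:
    """Collapse runs of whitespace into single spaces and trim both ends."""
    return (" ".join(doc.split()) for doc in docs)


def run_pipeline(docs: Iterable[str], steps: List[Step]) -> List[str]:
    """Apply each step in order, feeding its output to the next step."""
    return list(reduce(lambda stream, step: step(stream), steps, docs))


corpus = run_pipeline(
    ["Air quality in Europe: 2017 report.", "  Climate   change, impacts & vulnerability  "],
    [lowercase, strip_punctuation, collapse_whitespace],
)
print(corpus)
```

Because every step has the same shape, steps can be reordered, removed or added without touching the others, which is the property a pipeline composition page relies on.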

Create and visualise topic models via pyLDAvis.

The topics are found via a text-mining technique called Topic Modeling.

In machine learning and natural language processing, a topic model is a type of statistical model for discovering the abstract "topics" that occur in a collection of documents.

Video demonstration

How to run:

docker-compose build
docker-compose up -d

After some time, this starts the EEA Corpus application server on localhost:8181.
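Since the server takes a while to come up, a script that depends on it may want to wait until the port answers. The following is a small sketch using only the Python standard library; wait_for_server is a hypothetical helper, and it assumes the application answers plain HTTP on the port given above.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url: str, timeout: float = 120.0, interval: float = 2.0) -> bool:
    """Poll url until it answers over HTTP or until timeout seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            # The server replied, even though this path returned an error status.
            return True
        except (urllib.error.URLError, OSError):
            pass  # not reachable yet, try again
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling wait_for_server("http://localhost:8181/") after docker-compose up -d returns True once the server responds and False if it has not responded within the timeout.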

EEA Corpus Data

The latest EEA Corpus dataset can be produced by visiting global catalogue > See all results > download csv.

Once the CSV file is downloaded, you can pass it to this application for analysis. Make sure the first column holds the document text to be analysed; all other columns are treated as metadata.
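The expected layout can be checked before uploading. The sketch below reads a CSV with Python's standard csv module and separates the first column (the document text) from the remaining columns (metadata). The function name split_text_and_metadata, the sample column names and the presence of a header row are assumptions made for illustration, not details taken from the application.

```python
import csv
import io
from typing import Dict, List, Tuple


def split_text_and_metadata(csv_file) -> Tuple[List[str], List[Dict[str, str]]]:
    """Return (texts, metadata); texts come from the first column of each row."""
    reader = csv.reader(csv_file)
    header = next(reader)  # assumption: the file starts with a header row
    texts: List[str] = []
    metadata: List[Dict[str, str]] = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        texts.append(row[0])
        # Pair each remaining header name with the matching cell.
        metadata.append(dict(zip(header[1:], row[1:])))
    return texts, metadata


# Hypothetical sample data with the text in the first column.
sample = io.StringIO(
    "text,title,year\n"
    '"Air pollution harms health, ecosystems and crops.",Air quality,2017\n'
    '"Floods are becoming more frequent.",Climate,2016\n'
)
texts, metadata = split_text_and_metadata(sample)
print(texts[0])
print(metadata[0])
```

For a real file, pass an open file object instead of the in-memory sample, for example open("data.csv", newline="", encoding="utf-8").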

You can download a large, pre-generated EEA Corpus dataset for testing like this:

curl -L -o data.csv "https://www.dropbox.com/s/sihmoc4wwpl0kr2/data_all.csv?dl=1"
