This repository has been archived by the owner on Jan 19, 2020. It is now read-only.

A collection of tools and scripts for exploring the ListenBrainz data using Apache Spark.

metabrainz/listenbrainz-labs

Note: This repository is archived and merged into listenbrainz-server. Please open all pull requests in the listenbrainz-server codebase.

Setup required for the scripts to run correctly:

Set the environment variable that tells PySpark which Python interpreter to use:

export PYSPARK_PYTHON=$(which python3)

Install required modules:

pip3 install -r requirements.txt

Install Java and Scala:

apt-get install default-jdk scala

Install Spark (download the 2.3.0 tgz built for Hadoop and extract it into /usr/local/spark).
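
The setup steps above can be checked before submitting a job. The following is a minimal sketch, not part of this repository: the function name `missing_prerequisites` is hypothetical, and it assumes that `/usr/local/spark/bin` has been added to `PATH` so that `spark-submit` can be found there.

```python
import os
import shutil


def missing_prerequisites(environ=None, which=shutil.which):
    """Return a list of problems found; the list is empty if nothing is missing.

    `environ` and `which` are injectable so the check can be exercised
    without touching the real environment (hypothetical helper, not part
    of listenbrainz-labs).
    """
    environ = os.environ if environ is None else environ
    problems = []

    # PySpark reads PYSPARK_PYTHON to decide which interpreter to launch.
    if not environ.get("PYSPARK_PYTHON"):
        problems.append("PYSPARK_PYTHON is not set")

    # Assumption: the Spark bin directory is on PATH alongside java and scala.
    for tool in ("java", "scala", "spark-submit"):
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")

    return problems
```

Calling `missing_prerequisites()` with no arguments inspects the current environment; printing the returned list shows which step, if any, still needs to be done.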

To run the scripts:

spark-submit --master spark://195.201.112.36:7077 --executor-memory=29g $(pwd)/<script>

For example, to run train_models.py with the arguments df and models:

spark-submit --master spark://195.201.112.36:7077 --executor-memory=29g $(pwd)/train_models.py df models
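
When the same invocation is needed from Python, for instance in a wrapper script, the command line can be assembled as an argument list. This is a rough sketch under stated assumptions: `build_spark_submit_command` and its parameters are hypothetical names, and the defaults simply mirror the master URL and executor memory shown in the commands above.

```python
import os


def build_spark_submit_command(
    script,
    script_args=(),
    master="spark://195.201.112.36:7077",
    executor_memory="29g",
    base_dir=None,
):
    """Build the spark-submit argument list for one script.

    Hypothetical helper, not part of listenbrainz-labs. `base_dir`
    defaults to the current working directory, matching the use of
    the working directory in the shell commands.
    """
    base_dir = os.getcwd() if base_dir is None else base_dir
    script_path = os.path.join(base_dir, script)
    return [
        "spark-submit",
        "--master", master,
        f"--executor-memory={executor_memory}",
        script_path,
        *script_args,
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`; building a list rather than a single string avoids shell quoting problems if a path or argument contains spaces.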