How to run this project (run all commands from the root of the repository):
- Check the default configuration paths in
src/main/resources/application.conf
and change them if necessary:
input-path = "./input/*.arff" // input path for arff files
output-path = "./output" // output path for parquet files
- Prepare a jar with dependencies (you'll need sbt for this):
sbt assembly
- Run the ingestion job:
java -cp target/scala-2.12/spark-arff-source-assembly-0.1.jar handson.Ingestion
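
The steps above can be wrapped in a small helper that also overrides the paths without editing application.conf. This is only a sketch under two assumptions that the instructions do not confirm: that the job loads its settings through Typesafe Config's ConfigFactory.load() (in which case JVM system properties take precedence over application.conf), and that input-path and output-path are top-level keys. The function name run_ingestion is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: build the jar if needed, then run the ingestion with the
# input/output paths passed as JVM system properties.
# ASSUMPTION: the job reads its config via ConfigFactory.load(), so that
# -Dinput-path / -Doutput-path override the values in application.conf.

JAR="target/scala-2.12/spark-arff-source-assembly-0.1.jar"

# run_ingestion [INPUT_GLOB] [OUTPUT_DIR]  (hypothetical helper)
run_ingestion() {
  # Fall back to the defaults from application.conf when no argument is given.
  local input_path="${1:-./input/*.arff}"
  local output_path="${2:-./output}"

  # Build the jar with dependencies only if it is not there yet.
  [ -f "$JAR" ] || sbt assembly || return 1

  # The variables are quoted so the shell does not expand the glob itself.
  java -Dinput-path="$input_path" -Doutput-path="$output_path" \
       -cp "$JAR" handson.Ingestion
}
```

After sourcing the script, a call such as run_ingestion "/data/arff/*.arff" /data/parquet would run the job against those paths. If the overrides have no effect, the job likely reads its configuration differently and the values have to be changed in application.conf instead.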