Scala/Spark Library for Server Logs Analytics

Build the artifact

sbt package

Choose/modify/create a config file

The config file specifies where the log files are kept and where the Parquet folder (the structured output) should be written.

Example for STRING. This example uses /scratch/local; use /scratch/cluster instead to run on the cluster.

name: STRING

#Directory where to find the log files
logDirectory: /scratch/local/weekly/dteixeir/string-logs/*

#Directory where to output or read the parquet file
parquetFile: /scratch/local/weekly/dteixeir/string-parquet/
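
The config above is a flat list of `key: value` lines, so it can be read without a full YAML parser. The following is a minimal sketch of that idea, not the library's actual loader: `LogConfig` and `ConfigParser` are hypothetical names, and the sketch assumes the file never uses nested YAML structures.

```scala
// Hypothetical types for illustration; the library's real config loader may differ.
final case class LogConfig(name: String, logDirectory: String, parquetFile: String)

object ConfigParser {
  /** Parses flat `key: value` lines, skipping blank lines and `#` comments.
    * Returns None if any of the three expected keys is missing or empty. */
  def parse(lines: Seq[String]): Option[LogConfig] = {
    val entries: Map[String, String] = lines
      .map(_.trim)
      .filter(l => l.nonEmpty && !l.startsWith("#"))
      .flatMap { l =>
        // Split on the first colon only, so values may themselves contain colons.
        l.split(":", 2) match {
          case Array(k, v) => Some(k.trim -> v.trim)
          case _           => None
        }
      }
      .toMap

    def required(key: String): Option[String] = entries.get(key).filter(_.nonEmpty)

    for {
      name <- required("name")
      logs <- required("logDirectory")
      out  <- required("parquetFile")
    } yield LogConfig(name, logs, out)
  }
}
```

Reading a file would then be `ConfigParser.parse(scala.io.Source.fromFile(path).getLines().toSeq)`, where `path` is the config file given on the command line.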

Run the application

./ configs/oma-config.yaml

Choose the appropriate option (options 2 and 3 require the Parquet output of option 1)

Using config file configs/oma-config.yaml
1) Convert Parquet
2) Insights Report
3) Distinct IPs
4) Quit

Parquet (Required)

Option 1, Convert Parquet, is required before any other option can run. This conversion turns the raw log files into a structured, indexed format for fast analysis.
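
To illustrate what "structured" means here, the sketch below parses one raw log line into typed fields, which is the step that has to happen before rows can be written to Parquet. The log format is not documented in this README, so the sketch assumes Apache Common Log Format; `LogEntry` and `LogLineParser` are hypothetical names and not part of the library.

```scala
// Hypothetical record type; the library's actual schema may differ.
final case class LogEntry(ip: String, timestamp: String, request: String, status: Int, bytes: Long)

object LogLineParser {
  // Assumed Common Log Format: host ident authuser [date] "request" status bytes
  // Trailing fields (e.g. referrer and user agent) are accepted and ignored.
  private val Clf = """^(\S+) \S+ \S+ \[([^\]]+)\] "([^"]*)" (\d{3}) (\d+|-).*$""".r

  /** Returns the parsed entry, or None when the line does not match the assumed format. */
  def parse(line: String): Option[LogEntry] = line match {
    case Clf(ip, ts, req, status, bytes) =>
      // A "-" in the size field means no body was sent; treat it as 0 bytes.
      Some(LogEntry(ip, ts, req, status.toInt, if (bytes == "-") 0L else bytes.toLong))
    case _ => None
  }
}
```

In a Spark job, records parsed this way could be collected into a Dataset and written to the configured `parquetFile` directory with `write.parquet`; malformed lines come back as `None` and can be dropped or counted.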

Insights report

This option generates a report to be included in Insights.

Distinct IPs

This option produces a file listing all distinct IPs.
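
The idea behind this option can be sketched on plain Scala collections. This is an illustration only, not the library's implementation: `DistinctIps` is a hypothetical name, and the sketch assumes the client IP is the first whitespace-delimited token on each log line.

```scala
object DistinctIps {
  /** Returns each client IP once, sorted as strings.
    * Assumes the IP is the first whitespace-delimited token of every non-blank line. */
  def fromLines(lines: Iterator[String]): Vector[String] =
    lines
      .map(_.trim)
      .filter(_.nonEmpty)
      .map(_.split("\\s+")(0))
      .toVector
      .distinct
      .sorted
}
```

On the Parquet output, the equivalent in Spark would be selecting the IP column and calling `distinct()` on the resulting Dataset, which avoids collecting every line on a single machine.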

Script insights - generate

spark-shell $SPARK_SCRIPT_MEMORY -i scripts/analyse.scala

DRAFT - Optional: Run the analysis on a cluster

(The documentation below is not ready)

$SPARK_HOME/bin/spark-submit --class org.elixir.insights.server.logs.ServerLogAnalyser --master local[4] target/scala-2.11/server-log-analytics_2.11-1.0.jar

32 cores, 100 GB of memory

./ -c 32 -m 100G spark://

Took 2.5 hours

spark-shell --executor-memory 100G --master spark:// -i analyse.scala


