Apache Spark is an open-source, general-purpose distributed cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
Created by Matei Zaharia
Released May 26, 2014
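The programming model is easiest to see in code. Below is a minimal word-count sketch in Scala: the transformations are distributed across the cluster automatically, and lineage information lets Spark recompute lost partitions after a failure. The input path and master URL are placeholders for illustration.

```scala
import org.apache.spark.sql.SparkSession

object WordCount {
  def main(args: Array[String]): Unit = {
    // Local mode for illustration; in production the master URL
    // would normally come from spark-submit.
    val spark = SparkSession.builder()
      .appName("WordCount")
      .master("local[*]")
      .getOrCreate()

    // Each transformation runs in parallel across partitions;
    // Spark tracks lineage so lost partitions can be recomputed.
    val counts = spark.sparkContext
      .textFile("input.txt") // hypothetical input path
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    counts.take(10).foreach(println)
    spark.stop()
  }
}
```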
A subproject of Predictiveworks that provides common access to Cassandra, Elasticsearch, HBase, MongoDB, Parquet, JDBC database and other data sources from Apache Spark.
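For a sense of what such common access looks like on the Spark side, here is a hedged sketch using Spark's DataFrame reader. The format strings and options come from the respective connector libraries (the built-in Parquet and JDBC sources, and the spark-cassandra-connector), not from this project; hosts, keyspaces, and table names are placeholders.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("DataSources").getOrCreate()

// Parquet support is built into Spark SQL.
val parquetDF = spark.read.parquet("hdfs:///data/events.parquet") // hypothetical path

// Cassandra is reached through the spark-cassandra-connector package,
// which must be on the application classpath.
val cassandraDF = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "ks", "table" -> "events"))
  .load()

// Relational databases go through the built-in JDBC source.
val jdbcDF = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://host:5432/db") // hypothetical URL
  .option("dbtable", "events")
  .load()
```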
PySpark notebook with Docker
Code snippets for writing Apache Spark applications in Java
Apache Spark programs for data analysis on the MovieLens dataset
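A typical analysis on that dataset might look like the following sketch, which computes per-movie rating statistics with the DataFrame API. The CSV path and column names follow the standard MovieLens layout but are assumptions here, not taken from the repository.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{avg, col, count}

val spark = SparkSession.builder().appName("MovieLens").getOrCreate()

// Hypothetical path; ratings.csv has columns userId,movieId,rating,timestamp.
val ratings = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("data/ml-latest-small/ratings.csv")

// Average rating and rating count per movie, most-rated first.
val stats = ratings
  .groupBy("movieId")
  .agg(avg("rating").as("avgRating"), count("*").as("numRatings"))
  .orderBy(col("numRatings").desc)

stats.show(10)
```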
A simple demonstration of how to build a complex real-time machine learning visualization tool.
Large Scale Data Engineering assignment - VU MSc AI
Apache Spark Awesome List
geneSpark is a bioinformatics program written in Python on Apache Spark for big-data epigenetic histone-modification ChIP-seq analysis.
Simple document classifier using Apache Spark
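A minimal version of such a classifier can be sketched with Spark's ML pipeline API (Tokenizer, HashingTF, and logistic regression). The training rows below are made up for illustration, and the actual repository may use a different feature pipeline.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("DocClassifier").getOrCreate()

// Hypothetical labelled training data: (label, document text).
val training = spark.createDataFrame(Seq(
  (1.0, "spark is a fast cluster computing framework"),
  (0.0, "the cat sat on the mat")
)).toDF("label", "text")

// Tokenize, hash terms into sparse feature vectors, then fit a classifier.
val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
val hashingTF = new HashingTF().setInputCol("words").setOutputCol("features")
val lr = new LogisticRegression().setMaxIter(10)

val model = new Pipeline().setStages(Array(tokenizer, hashingTF, lr)).fit(training)
// model.transform(testDF) would then score unseen documents.
```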
A standalone Apache Spark application using the Spark API in Scala, built with the Simple Build Tool (SBT).
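For context, a minimal build.sbt for a standalone Spark application might look like the sketch below; the project name and version numbers are illustrative, not taken from the repository.

```scala
// build.sbt -- minimal SBT definition for a standalone Spark application.
// Versions here are illustrative assumptions.
name := "spark-standalone-app"

version := "0.1.0"

scalaVersion := "2.12.18"

// Marked "provided" because spark-submit supplies Spark on the cluster classpath.
libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.5.0" % "provided"
```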
Investigating the trade-off between low-latency responses and result quality when applying machine learning algorithms on a lambda architecture.
Using Spark to extract insights from Dota 2 match data
An example of real-time stream processing using Spark Streaming, Kafka, and Elasticsearch.
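A condensed sketch of that kind of pipeline, assuming the spark-streaming-kafka-0-10 and elasticsearch-hadoop connector packages: Kafka records are consumed as a direct stream and each micro-batch is indexed into Elasticsearch. The broker address, topic, Elasticsearch endpoint, and index name are placeholders.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._
import org.elasticsearch.spark.rdd.EsSpark

val conf = new SparkConf()
  .setAppName("KafkaToEs")
  .set("es.nodes", "localhost:9200") // hypothetical Elasticsearch endpoint
val ssc = new StreamingContext(conf, Seconds(5))

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "localhost:9092", // hypothetical broker
  "key.deserializer"  -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "demo"
)

// Consume a direct stream from Kafka.
val stream = KafkaUtils.createDirectStream[String, String](
  ssc,
  LocationStrategies.PreferConsistent,
  ConsumerStrategies.Subscribe[String, String](Seq("events"), kafkaParams)
)

// Index each micro-batch into Elasticsearch ("index/type" is the
// elasticsearch-hadoop resource notation).
stream.map(record => Map("message" -> record.value()))
  .foreachRDD(rdd => EsSpark.saveToEs(rdd, "events/doc"))

ssc.start()
ssc.awaitTermination()
```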
Apache Spark Basics - Java Examples
Connect to SQL Server using Apache Spark
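One common way to do this is Spark's built-in JDBC data source together with Microsoft's SQL Server JDBC driver; the sketch below reads a table into a DataFrame. Host, database, table, and credentials are placeholders, and the driver jar must be on the classpath (e.g. via --jars with spark-submit).

```scala
import java.util.Properties
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("SqlServerRead").getOrCreate()

// Hypothetical connection details.
val url = "jdbc:sqlserver://myhost:1433;databaseName=sales"
val props = new Properties()
props.setProperty("user", "spark_reader")
props.setProperty("password", "...")
props.setProperty("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")

// Read a table as a DataFrame over JDBC.
val ordersDF = spark.read.jdbc(url, "dbo.orders", props)
ordersDF.show(5)
```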