Repository containing Docker images for creating a Spark cluster on Hadoop YARN.
Updated Sep 10, 2023 - Dockerfile
K-Means, CURE and Canopy clustering algorithms demonstrated using PySpark.
An end-to-end data pipeline for building a data lake and supporting reporting using Apache Spark.
A Spark cluster based on docker-compose.
Script that finds similarities between movies in a MovieLens dataset using Python and Spark clustering.
This is my contribution to the Diastema project.
I'll walk you through launching a cluster manually using Spark standalone deploy mode, connecting an app to the cluster, launching the app, and where to find the monitoring and logging.
A Spark cluster containing multiple Spark masters, based on docker-compose.
Steps to deploy a local Spark cluster with Docker. Bonus: a ready-to-use notebook for model prediction in PySpark using a spark.ml Pipeline() on a well-known dataset.
A distributed application that identifies the top 50 taxi pickup locations in New York by analyzing over 1 billion records using Apache Spark, the Hadoop file system and Scala.
Terraform module to create Azure HDInsight, a managed, full-spectrum, open-source analytics service. This module creates Apache Hadoop, Apache Spark, Apache HBase, Interactive Query (Apache Hive LLAP) and Apache Kafka clusters.
Start clusters in VirtualBox VMs.
Spark standalone architecture, local architecture, and reading Hadoop file formats, i.e. Avro, Parquet and ORC.
Spark submit extension from bde2020/spark-submit for Scala with SBT
In this project, we used both Hadoop MapReduce and Spark for distributed computing. The first task was to perform a series of operations using Mapper and Reducer Java files run on a Hadoop server. The second task was to perform similar operations, but on Spark instead.
In this study, we propose using a distributed storage and computation system to track money transfers instantly. In particular, we keep our transaction history in a distributed file system as a graph data structure. We try to detect illegal activities by using Graph Neural Networks (GNN) in a distributed manner.
Docker Spark standalone