Trying out a best-case Apache Spark working environment for robust data pipelines
Apache Spark is an open-source, general-purpose distributed cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
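A minimal sketch of what that implicit parallelism looks like from PySpark; the dataset and partition count below are illustrative, not from any of the projects listed here:

```python
# A toy demonstration of implicit data parallelism: the same code runs
# unchanged on one machine or a cluster, depending on the master URL.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("parallelism-demo").getOrCreate()

# parallelize() partitions the data across executors automatically;
# failed tasks are re-run elsewhere, which is the fault-tolerance part.
rdd = spark.sparkContext.parallelize(range(1_000_000), numSlices=8)
print(rdd.map(lambda x: x * x).sum())

spark.stop()
```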
Dummy code snippets for reading, manipulating, and building a simple ML model with PySpark.
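Roughly what such a read/manipulate/model flow looks like; the input file flights.csv and the columns delay, distance, and hour are hypothetical placeholders, not taken from the project:

```python
# Read -> manipulate -> model with PySpark. The input file and column
# names (delay, distance, hour) are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("pyspark-ml-demo").getOrCreate()

# Read: load a CSV with a header row and inferred column types.
df = spark.read.csv("flights.csv", header=True, inferSchema=True)

# Manipulate: derive a binary label and drop rows with missing values.
df = df.withColumn("late", (F.col("delay") > 15).cast("int")).dropna()

# Model: pack feature columns into one vector, then fit a classifier.
assembler = VectorAssembler(inputCols=["distance", "hour"], outputCol="features")
model = LogisticRegression(labelCol="late").fit(assembler.transform(df))
print(model.summary.areaUnderROC)
```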
🛸 This project showcases an Extract, Load, Transform (ELT) pipeline built with Python, Apache Spark, Delta Lake, and Docker. The objective of the project is to scrape UFO sighting data from NUFORC and process it through the Medallion architecture to create a star schema in the Gold layer that is ready for analysis.
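A minimal sketch of one Medallion hop (bronze to silver) with Delta Lake; the lake paths, column names, and session configuration are assumptions, and the NUFORC scraper and Gold-layer star-schema build are not reproduced here:

```python
# One Medallion hop (bronze -> silver) on Delta Lake. Paths and column
# names are assumptions; requires the delta-spark package on the classpath.
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder.appName("ufo-elt")
    # Standard Delta Lake session configuration.
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# Bronze: raw scraped sightings, stored as they arrived.
bronze = spark.read.format("delta").load("/lake/bronze/ufo_sightings")

# Silver: deduplicated, typed records ready for dimensional modeling.
silver = (
    bronze.dropDuplicates(["sighting_id"])
    .withColumn("sighted_at", F.to_timestamp("sighted_at"))
)
silver.write.format("delta").mode("overwrite").save("/lake/silver/ufo_sightings")
```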
Distributed processing challenge
Practice with Spark on Azure Databricks
Real-time analysis pipeline
Parser for XML files using PySpark
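A minimal sketch, assuming the spark-xml connector (com.databricks:spark-xml) and an input file whose rows are delimited by a hypothetical &lt;record&gt; tag:

```python
# Reading XML with the spark-xml connector; each <record> element becomes
# one row. File path, rowTag, and package version are assumptions.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("xml-parser")
    # Fetch the connector at startup; match the _2.12 suffix to your Scala build.
    .config("spark.jars.packages", "com.databricks:spark-xml_2.12:0.17.0")
    .getOrCreate()
)

# Nested child elements are parsed into struct columns automatically.
df = (
    spark.read.format("xml")
    .option("rowTag", "record")
    .load("data/records.xml")
)
df.printSchema()
```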
A forecasting project based on Apache Spark and implemented with the Naive Bayes classifier.
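A minimal Naive Bayes sketch with Spark MLlib, trained on an inline toy dataset rather than the project's actual forecasting data:

```python
# Multinomial Naive Bayes with Spark MLlib on an inline toy dataset.
from pyspark.sql import SparkSession
from pyspark.ml.classification import NaiveBayes
from pyspark.ml.linalg import Vectors

spark = SparkSession.builder.appName("nb-demo").getOrCreate()

# Toy rows of (label, feature vector); multinomial NB needs
# non-negative feature values, e.g. counts or frequencies.
train = spark.createDataFrame(
    [
        (0.0, Vectors.dense([4.0, 0.0])),
        (0.0, Vectors.dense([3.0, 1.0])),
        (1.0, Vectors.dense([0.0, 2.0])),
        (1.0, Vectors.dense([1.0, 3.0])),
    ],
    ["label", "features"],
)

model = NaiveBayes(smoothing=1.0, modelType="multinomial").fit(train)
model.transform(train).select("label", "prediction").show()
```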
Data analysis using Spark 2.2.0 Datasets
Gallery of Apache Zeppelin notebooks using Enth-Spark-AI.
Big data concept for the Lambda Architecture design pattern: Apache Kafka for streaming/data-pipeline integration, Apache Spark for distributed data processing, and Akka HTTP for RESTful services
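A sketch of the Kafka-to-Spark speed-layer hop in such an architecture; the broker address and topic name are assumptions, and the Akka HTTP serving layer is outside this snippet:

```python
# Speed layer: Spark Structured Streaming consuming a Kafka topic and
# keeping per-minute counts. Broker and topic names are assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("lambda-speed-layer").getOrCreate()

# The Kafka source needs the spark-sql-kafka connector on the classpath.
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "events")
    .load()
)

# Kafka values arrive as bytes; cast to string and aggregate by the
# record timestamp that the Kafka source attaches to every row.
counts = (
    events.select(F.col("value").cast("string").alias("event"), "timestamp")
    .groupBy(F.window("timestamp", "1 minute"))
    .count()
)

query = counts.writeStream.outputMode("complete").format("console").start()
query.awaitTermination()
```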
An image for running Scala Jupyter notebooks and Apache Spark in the cloud on OpenShift
Created by Matei Zaharia
Released May 26, 2014