Apache Spark
Apache Spark is an open-source, general-purpose distributed cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
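As a rough illustration of what "implicit data parallelism" means, here is a minimal sketch in plain Python. It uses only the standard library and is not Spark's API: the helper names `partition` and `parallel_map_reduce` are hypothetical, threads stand in for cluster nodes, and fault tolerance is not modeled. The caller supplies only a map function and an associative reduce function; the splitting and scheduling happen behind the scenes.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def partition(data, num_partitions):
    """Split a non-empty list into contiguous chunks of roughly equal size."""
    size = max(1, -(-len(data) // num_partitions))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def parallel_map_reduce(data, map_fn, reduce_fn, num_partitions=4):
    """Apply map_fn to every element and fold the results with reduce_fn.

    Hypothetical helper, not part of Spark. `data` must be non-empty and
    `reduce_fn` must be associative, because each chunk is folded on its
    own before the partial results are combined.
    """
    def process(chunk):
        # Each worker maps and folds its own chunk independently.
        return reduce(reduce_fn, (map_fn(x) for x in chunk))

    chunks = partition(data, num_partitions)
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(process, chunks))
    # Combine the per-chunk results into the final answer.
    return reduce(reduce_fn, partials)


# The caller never mentions chunks or threads.
total = parallel_map_reduce(list(range(1, 101)), lambda x: x * x, lambda a, b: a + b)
print(total)
```

In a real cluster framework the chunks would live on different machines and a lost chunk could be recomputed from its inputs; this sketch only shows the programming model, under the assumptions stated above.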
Here are 8,253 public repositories matching this topic...
Collection of Docker images
Updated May 12, 2024 - Shell
Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
Updated May 12, 2024 - Java
Learning summary and examples about data systems.
Updated May 12, 2024 - Java
The Internals of Spark SQL
Updated May 12, 2024
Extracting observatory temperature data from CSV files and generating tile images using Mercator projection for visualization
Updated May 12, 2024 - Java
Spark with Python, including Spark Streaming, Machine Learning, Spark DataFrames and more.
Updated May 12, 2024 - Jupyter Notebook
A Python package to submit and manage Apache Spark applications on Kubernetes.
Updated May 12, 2024 - Python
SQL stream processing, analytics, and management. We decouple storage and compute to offer speedy bootstrapping, dynamic scaling, time-travel queries, and efficient joins.
Updated May 12, 2024 - Rust
Big Data Docker Data Science Spark Spark3 Hadoop HDFS Scala Python Artificial Intelligence Machine Learning Jupyter Lab Notebook
Updated May 12, 2024 - Python
Platform for Big Data & AI
Updated May 12, 2024 - Shell
DoC Spark on minikube from Mac with Docker Desktop
Updated May 12, 2024 - Shell
YTsaurus is a scalable and fault-tolerant open-source big data platform.
Updated May 12, 2024 - C++
Created by Matei Zaharia
Released May 26, 2014
- Followers: 414
- Repository: apache/spark
- Website: spark.apache.org
- Wikipedia