Apache Spark

Apache Spark is an open-source, distributed, general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.

Here are 8,253 public repositories matching this topic...

tobymao / sqlglot

Python SQL Parser and Transpiler

Updated May 12, 2024
Python

longpt233 / my-images

Collection of Docker images

docker elasticsearch airflow kafka spark mongodb hadoop docker-compose bigdata aerospike

Updated May 12, 2024
Shell

apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.

big-data spark flink real-time-analytics data-ingestion table-store paimon streaming-datalake

Updated May 12, 2024
Java

apache / spark

Apache Spark - A unified analytics engine for large-scale data processing

python java r scala sql big-data spark jdbc

Updated May 12, 2024
Scala

projectnessie / nessie

Nessie: Transactional Catalog for Data Lakes with Git-like semantics

git java data spark aws-lambda iceberg

Updated May 12, 2024
Java

LB-Yu / data-systems-learning

Learning summary and examples about data systems.

distributed-systems big-data spark hbase flink

Updated May 12, 2024
Java

japila-books / spark-sql-internals

The Internals of Spark SQL

spark apache-spark book internals spark-sql mkdocs-material

Updated May 12, 2024

apache / doris

Apache Doris is an easy-to-use, high performance and unified analytics database.

bigquery real-time sql database spark hive hadoop etl snowflake olap query-engine redshift dbt elt iceberg hudi delta-lake lakehouse

Updated May 12, 2024
Java

hogimn / observatory

Extracting observatory temperature data from CSV files and generating tile images using Mercator projection for visualization

java spark mercator-projection

Updated May 12, 2024
Java

Tiago-B-C-Reis / Apache_Spark

Spark with Python, including Spark Streaming, Machine Learning, Spark DataFrames and more.

machine-learning spark apache-spark pyspark

Updated May 12, 2024
Jupyter Notebook

hussein-awala / spark-on-k8s

A Python package to submit and manage Apache Spark applications on Kubernetes.

python kubernetes airflow spark

Updated May 12, 2024
Python

risingwavelabs / risingwave

SQL stream processing, analytics, and management. We decouple storage and compute to offer speedy bootstrapping, dynamic scaling, time-travel queries, and efficient joins.

Updated May 12, 2024
Rust

alvertogit / bigdata_docker

Big Data Docker Data Science Spark Spark3 Hadoop HDFS Scala Python Artificial Intelligence Machine Learning Jupyter Lab Notebook

python docker data-science machine-learning scala big-data spark jupyter-notebook jupyter-lab spark3

Updated May 12, 2024
Python

mauropelucchi / unibg_mobile_and_cloud_2024

University of Bergamo - Mobile & Cloud (Computer Engineering) 2023/2024

python aws mobile spark flutter

Updated May 12, 2024
C++

xuwenyihust / DataPulse

Platform for Big Data & AI

kubernetes spark jupyter-notebook gcp mlflow delta-lake

Updated May 12, 2024
Shell

iimeta / fastapi-admin

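智元 Fast API is a one-stop API management system that brings the APIs of various large language models under a unified format, unified specification, and unified management, aiming for the best possible functionality, performance, and user experience.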

api fast spark openai glm gpt fastapi gpt-4 chatgpt ernie-bot qwen

Updated May 12, 2024
Go

iimeta / fastapi-web

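智元 Fast API is a one-stop API management system that brings the APIs of various large language models under a unified format, unified specification, and unified management, aiming for the best possible functionality, performance, and user experience.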

api fast spark openai glm gpt fastapi gpt-4 chatgpt ernie-bot qwen

Updated May 12, 2024
Vue

masalinas / doc-spark-minikube

DoC Spark on minikube from Mac with Docker Desktop

kubernetes spark python3 minio spark-sql spark-operator

Updated May 12, 2024
Shell

ytsaurus / ytsaurus

YTsaurus is a scalable and fault-tolerant open-source big data platform.

sql big-data spark clickhouse distributed-database lakehouse olap-database ytsaurus

Updated May 12, 2024
C++

starlake-ai / starlake

Declarative text based tool for data analysts and engineers to extract, load, transform and orchestrate their data pipelines.

bigquery scala spark etl snowflake hdfs redshift synapse

Updated May 12, 2024
Scala

Created by Matei Zaharia

Released May 26, 2014

Followers: 414
Repository: apache/spark
Website: spark.apache.org
Wikipedia: Wikipedia
