Skip to content
#

Apache Spark

spark logo

Apache Spark is an open source distributed general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.

Here are 8,304 public repositories matching this topic...

Generate relevant synthetic data quickly for your projects. The Databricks Labs synthetic data generator (aka `dbldatagen`) may be used to generate large simulated / synthetic data sets for test, POCs, and other uses in Databricks environments including in Delta Live Tables pipelines

  • Updated Jun 1, 2024
  • Python

This construct builds some elements for you to quickly launch an EMR Serverless application. After submitting the Emr Serverless job, you could also launch an EMR notebook via cluster template to check the outcome from the EMR Serverless application.

  • Updated Jun 1, 2024
  • TypeScript

Created by Matei Zaharia

Released May 26, 2014

Followers
417 followers
Repository
apache/spark
Website
spark.apache.org
Wikipedia
Wikipedia

Related Topics

hadoop scala