pyspark
Here are 3,371 public repositories matching this topic...
An open source, standard data file format for graph data storage and retrieval.
Updated May 22, 2024 - C++
The portable Python dataframe library
Updated May 21, 2024 - Python
Simple and Distributed Machine Learning
Updated May 21, 2024 - Scala
💜🌈📊 A data engineering project that implements an ETL data pipeline using Dagster, Apache Spark, Streamlit, MinIO, Apache Superset, and dbt 🌺
Updated May 21, 2024 - Jupyter Notebook
Example of a local PySpark setup, including Delta Lake, for unit testing
Updated May 21, 2024 - Python
Hopsworks - Data-Intensive AI platform with a Feature Store
Updated May 21, 2024 - Java
PySpark script to aggregate small Parquet files in a prefix into larger files. Designed to be run on AWS Glue.
Updated May 21, 2024 - Python
🌈📊📈 The Zillow Home Value Prediction project employs linear regression models on Kaggle datasets to forecast house prices. 📉💰 Using Apache Spark (PySpark) within a Docker setup enables efficient data preprocessing, exploration, analysis, visualization, and model building, with distributed computing for parallel computation.
Updated May 21, 2024 - Jupyter Notebook
Apache Linkis builds a computation middleware layer to facilitate connection, governance and orchestration between the upper applications and the underlying data engines.
Updated May 21, 2024 - Java
Analysis of a large dataset of IPL data up to 2017 using PySpark.
Updated May 21, 2024 - Jupyter Notebook
Generate relevant synthetic data quickly for your projects. The Databricks Labs synthetic data generator (aka `dbldatagen`) may be used to generate large simulated / synthetic data sets for tests, POCs, and other uses in Databricks environments, including in Delta Live Tables pipelines.
Updated May 21, 2024 - Python
Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University
Updated May 21, 2024 - HTML
Apache Spark Connector for Azure Cosmos DB
Updated May 20, 2024 - Scala
This repo contains a big data project: "Real Time Twitter Sentiment Analysis via Kafka, Spark Streaming, MongoDB and Django Dashboard".
Updated May 20, 2024 - Jupyter Notebook
State of the Art Natural Language Processing
Updated May 22, 2024 - Scala