🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Updated May 6, 2024 - Python
Optimus is an easy-to-use, reliable, and performant workflow orchestrator for data transformation, data modeling, pipelines, and data quality management.
Logical Replication extension for PostgreSQL 15, 14, 13, 12, 11, 10, 9.6, 9.5, 9.4 (Postgres), providing much faster replication than Slony, Bucardo or Londiste, as well as cross-version upgrades.
Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
Microsoft Program Synthesis using Examples SDK is a framework of technologies for the automatic generation of programs from input-output examples. This repo includes samples and sample data for the Microsoft Program Synthesis using Examples SDK.
A block-based API for NSValueTransformer, with a growing collection of useful examples.
O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian
☄️ Python's nested data operator (and CLI), for all your declarative restructuring needs. Got data? Glom it! ☄️
Wrangler Transform: A DMD system for transforming Big Data
🤖 An automated machine learning framework for audio, text, image, video, or .CSV files (50+ featurizers and 15+ model trainers). Python 3.6 required.
Reference Architectures for Datalakes on AWS
A simple Spark-powered ETL framework that just works 🍺
Advanced and Fast Data Transformation in R
💄 Durable and asynchronous data imports for consuming data at scale and publishing testable SDKs.
Low-code Python library to safely use notebooks in production: schedule workflows, generate assets, trigger webhooks, send notifications, build pipelines, manage secrets (Cloud-only)
Data transformation and utility functions for R
Serialize PHP variables, including objects, in any format. Unserializing is supported too.
Examples for working with DataWeave scripts from Apex.
A visual data pipeline builder with various backends