Skip to content

PastorGL/datacooker-dist

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

42 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Apache Spark based Dist utility to supplement Data Cooker ETL tool and replace 'dist-cp'/'s3-dist-cp' to copy data from external storage to Spark cluster and back.

Doesn't require strict schema nor data catalog, but can select 'columns' from underlying delimited text. Also can copy Parquet and plain text files, and can be extended to adapt other storages (as an example, there are bare-bone JDBC adapters included).

Historically it was used in a GIS project, so is well tested and proved by years in production.

Your contributions are welcome. Please see the source code in the original GitHub repo at https://github.com/PastorGL/datacooker-dist