# spark-dataflow

Spark-dataflow allows users to execute Dataflow pipelines with Spark. Executing a pipeline on a Spark cluster is easy: depend on spark-dataflow in your project and execute your pipeline in a program by calling `SparkPipelineRunner.run`.

The Maven coordinates of the current version of this project are `com.cloudera.dataflow.spark:dataflow-spark:0.0.1`.
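Assuming a standard Maven build, those coordinates correspond to the following `pom.xml` dependency:

```xml
<dependency>
  <groupId>com.cloudera.dataflow.spark</groupId>
  <artifactId>dataflow-spark</artifactId>
  <version>0.0.1</version>
</dependency>
```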

An example of running a pipeline against a Spark cluster in local mode with two threads:

```java
Pipeline p = Pipeline.create(PipelineOptionsFactory.create());
// logic for building your pipeline
EvaluationResult result = new SparkPipelineRunner("local[2]").run(p);
```
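For context, a complete program might look like the sketch below, which counts occurrences of each line in a text file. This is a sketch only: `input.txt` and `counts` are placeholder paths, and the transform and I/O class names (`TextIO`, `Count`, `ParDo`, `DoFn`) are assumed from the Google Cloud Dataflow Java SDK that this project targets; check them against the SDK version you build with.

```java
import com.google.cloud.dataflow.sdk.Pipeline;
import com.google.cloud.dataflow.sdk.io.TextIO;
import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
import com.google.cloud.dataflow.sdk.transforms.Count;
import com.google.cloud.dataflow.sdk.transforms.DoFn;
import com.google.cloud.dataflow.sdk.transforms.ParDo;
import com.google.cloud.dataflow.sdk.values.KV;

import com.cloudera.dataflow.spark.EvaluationResult;
import com.cloudera.dataflow.spark.SparkPipelineRunner;

public class LineCount {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.create());

    p.apply(TextIO.Read.from("input.txt"))   // placeholder input path
     .apply(Count.<String>perElement())      // count occurrences of each line
     .apply(ParDo.of(new DoFn<KV<String, Long>, String>() {
       @Override
       public void processElement(ProcessContext c) {
         // Format each (line, count) pair as a line of text.
         c.output(c.element().getKey() + ": " + c.element().getValue());
       }
     }))
     .apply(TextIO.Write.to("counts"));      // placeholder output prefix

    // Execute on Spark in local mode with two threads.
    EvaluationResult result = new SparkPipelineRunner("local[2]").run(p);
  }
}
```

To run against a real cluster rather than local mode, pass a Spark master URL such as `spark://host:7077` to `SparkPipelineRunner` instead of `local[2]`.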
