Examples of and differences between various Spark APIs

anoopdixith/RDD-DF-DS-SSQL

RDD-DF-DS-SSQL

TL;DR: Examples of and differences between various Spark APIs

The complete, runnable code, with its output, is available here: http://goo.gl/EdrCUo

(https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/6374798392727515/2002375612871426/4076179716382534/latest.html)

Details:

I realized that most people who join our company and are new to the Spark ecosystem are overwhelmed by the different sets of APIs it offers! Most of their questions, whether they needed a human to answer them or sat waiting on StackOverflow, were about porting a call from one API to another, the differences between the APIs, picking the most optimized approach, and how to use them in the first place.

I made this sample project to explain most of it.

In the fictional town of Irvin, there are all kinds of people: couples, singles, folks in long-distance relationships, gay couples, open relationships, and poly-marriages. At its biggest employer, Notox, there’s widespread nepotism and gender imbalance!

Here’s an audit using:

  1. RDD APIs
  2. DataFrame APIs
  3. Dataset APIs
  4. Spark SQL
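
To give a feel for how the four APIs compare, here is a minimal sketch that answers one question (how many employees of each gender?) four ways. It is not the code from the linked notebook: the `Employee` case class, its fields, the sample rows, and the `employees` view name are all made up for illustration, and it assumes a local Spark 2.x or later session.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical record type; the project's real schema may differ.
case class Employee(name: String, gender: String)

object ApiComparison {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("api-comparison")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample rows, for illustration only.
    val people = Seq(
      Employee("Asha", "F"),
      Employee("Ben", "M"),
      Employee("Carl", "M")
    )

    // 1. RDD API: plain functional transformations on objects.
    val rdd = spark.sparkContext.parallelize(people)
    val rddCounts = rdd.map(e => (e.gender, 1L)).reduceByKey(_ + _).collect().toMap

    // 2. DataFrame API: untyped rows, columns referenced by name.
    val df = people.toDF()
    val dfCounts = df.groupBy("gender").count()

    // 3. Dataset API: typed objects, fields checked at compile time.
    val ds = people.toDS()
    val dsCounts = ds.groupByKey(_.gender).count()

    // 4. Spark SQL: the same aggregation as a SQL string over a temp view.
    df.createOrReplaceTempView("employees")
    val sqlCounts = spark.sql(
      "SELECT gender, COUNT(*) AS count FROM employees GROUP BY gender")

    println(rddCounts)
    dfCounts.show()
    dsCounts.show()
    sqlCounts.show()

    spark.stop()
  }
}
```

All four variants compute the same per-gender counts. The DataFrame, Dataset, and SQL versions go through Spark's query optimizer, while the RDD version runs exactly the transformations as written.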
