Lesson 1: Introduction to the Spark Environment

In this lesson, you learn how to get set up with Spark and see the basics of how to program with Spark.

I start with a little bit of history of the project and provide motivation for the framework: why Spark, why now?

From there, I walk through the process of getting Spark set up locally on your laptop so you can start developing your own Spark applications!

And all along the way, you learn the common paradigms and abstractions Spark leverages, mainly functional programming and resilient distributed datasets (RDDs).
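
To give a flavor of these abstractions up front, here is a minimal sketch of what they look like in the PySpark shell, assuming `sc` is the SparkContext the shell creates for you:

```python
# In the PySpark shell, `sc` (the SparkContext) is created for you automatically.
numbers = sc.parallelize(range(10))  # build an RDD from a local collection

# Transformations are lazy: this line describes work but does not run it.
evens_squared = numbers.filter(lambda x: x % 2 == 0).map(lambda x: x * x)

# Actions trigger computation; collect() returns the results to the driver.
print(evens_squared.collect())  # [0, 4, 16, 36, 64]
```

Lesson 1.9 covers these transformation and action semantics in detail.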

Objectives

  • Understand the history and motivation behind Spark
  • Set up a local Spark environment
  • Program your first Spark job with the PySpark shell
  • Understand the common paradigms for programming with Spark: RDDs and functional programming
  • Work with key-value pairs to perform MapReduce operations (see the sketch after this list)
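
As a preview of that last objective, here is a hedged sketch of the canonical word count expressed with key-value pairs in PySpark; the file name `input.txt` is a hypothetical placeholder, and `sc` is again the shell's SparkContext:

```python
# Word count: the classic MapReduce example, written with key-value pairs.
lines = sc.textFile("input.txt")  # hypothetical local text file

counts = (lines.flatMap(lambda line: line.split())  # map phase: split lines into words
               .map(lambda word: (word, 1))         # emit (word, 1) key-value pairs
               .reduceByKey(lambda a, b: a + b))    # reduce phase: sum counts per word

counts.take(10)  # action: materialize a sample of (word, count) pairs
```

Lesson 1.10 walks through this pattern step by step.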

Examples

References

1.1: Getting the Materials

1.2: A Brief Historical Diversion

1.3: Origins of the Framework

1.4: Why Spark?

1.5: Getting Set Up: Spark and Java

1.6: Getting Set Up: Scientific Python

1.7: Getting Set Up: R Kernel for Jupyter

1.8: Your First PySpark Job

1.9: Introduction to RDDs: Functions, Transformations, and Actions

1.10: MapReduce with Spark: Programming with Key-Value Pairs