
PySpark Tutorial

  • PySpark is the Python API for Spark.

  • The purpose of this PySpark tutorial is to present basic distributed algorithms using PySpark.

  • PySpark supports two types of data abstractions (see the sketch after this list):

    • RDDs
    • DataFrames
  • PySpark Interactive Mode: the interactive shell ($SPARK_HOME/bin/pyspark) is suitable for basic testing and debugging; it is not intended for production use.

  • PySpark Batch Mode: use the $SPARK_HOME/bin/spark-submit command to run PySpark programs (suitable for both testing and production environments); a minimal batch-mode script is sketched below.
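
The following is a minimal sketch of both data abstractions, written for the interactive shell, where `spark` (a SparkSession) and `sc` (a SparkContext) are predefined; the sample data and column names are illustrative assumptions, not taken from this repository.

```python
# A minimal sketch of the two abstractions, assuming it is typed into the
# interactive shell ($SPARK_HOME/bin/pyspark), where `spark` (a SparkSession)
# and `sc` (a SparkContext) are already defined.

# RDD: a low-level distributed collection of elements
rdd = sc.parallelize(["fox jumped", "fox ran"])
word_counts = (rdd.flatMap(lambda line: line.split())
                  .map(lambda word: (word, 1))
                  .reduceByKey(lambda a, b: a + b))
print(word_counts.collect())
# e.g. [('fox', 2), ('jumped', 1), ('ran', 1)]  (order may vary)

# DataFrame: a distributed table with named columns and Spark SQL support
df = spark.createDataFrame([("alice", 30), ("bob", 41)], ["name", "age"])
df.filter(df.age > 35).show()
```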

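For batch mode, a standalone script submitted via spark-submit might look like the following sketch; the file name word_count.py and the input-path argument are hypothetical, used only to illustrate the workflow.

```python
# word_count.py -- a hypothetical standalone script (not part of this repository),
# shown only to illustrate batch mode. Run it with spark-submit, passing an
# input text file path as the first argument:
#
#   $SPARK_HOME/bin/spark-submit word_count.py /path/to/input.txt
#
import sys
from pyspark.sql import SparkSession

if __name__ == "__main__":
    # in batch mode the program creates its own SparkSession
    spark = SparkSession.builder.appName("word-count").getOrCreate()
    sc = spark.sparkContext

    # read the input file as an RDD of lines
    lines = sc.textFile(sys.argv[1])

    # classic word count: tokenize, pair each word with 1, then reduce by key
    counts = (lines.flatMap(lambda line: line.split())
                   .map(lambda word: (word, 1))
                   .reduceByKey(lambda a, b: a + b))

    for word, count in counts.collect():
        print(word, count)

    spark.stop()
```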



PySpark Examples and Tutorials


Books


Miscellaneous


PySpark Tutorial and References...


Questions/Comments

Thank you!

best regards,
Mahmoud Parsian

[Book covers: Data Algorithms with Spark, PySpark Algorithms, Data Algorithms]