thenameisajay/Apache-Spark-Programs

Apache Spark Programs

DESCRIPTION

This repository contains Apache Spark programs implemented in Python. These programs are part of my learning process for Apache Spark and are intended to serve as examples for anyone who is also learning or working with Apache Spark.


Installation

Before running these programs, you need to install Apache Spark and PySpark on your system. You can follow the instructions on the official Apache Spark website to download and install the latest version of Apache Spark: https://spark.apache.org/downloads.html

Once Apache Spark is installed, install PySpark with pip (the pip package bundles its own copy of Spark for local runs, and Spark needs a Java runtime either way):

pip install pyspark


Usage

To run any of the programs in this repository, navigate to the program's directory and run the following command:

spark-submit program-name.py

Make sure to replace program-name with the name of the program you want to run.
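
For orientation, here is a minimal sketch of the kind of script that spark-submit runs: a word count built on flatMap. It is not one of the repository's programs. The names tokenize, main, and WordCountSketch, and the single input-file argument, are assumptions made for illustration.

```python
import re
import sys


def tokenize(line):
    """Split a line into lowercase words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", line.lower())


def main(path):
    # Imported here so tokenize() can be tried without a Spark installation.
    from pyspark import SparkConf, SparkContext

    # No master is set here; spark-submit supplies it.
    conf = SparkConf().setAppName("WordCountSketch")  # hypothetical app name
    sc = SparkContext(conf=conf)
    try:
        lines = sc.textFile(path)
        # flatMap emits zero or more words per input line.
        words = lines.flatMap(tokenize)
        # countByValue returns a word -> count mapping on the driver.
        counts = words.countByValue()
        top = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:20]
        for word, count in top:
            print(word, count)
    finally:
        sc.stop()


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: spark-submit word_count_sketch.py <input-file>",
              file=sys.stderr)
    else:
        main(sys.argv[1])
```

Saved under a hypothetical name such as word_count_sketch.py, it could be run as spark-submit word_count_sketch.py book.txt, where book.txt stands for any plain-text file.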


PROGRAMS

Here is a list of all the programs in this repository:

  1. Total Spent By Customer (sorted and SparkSQL version)
  2. Calculate Average Friends By Age
  3. Filtering RDDs and finding Minimum Temperature
  4. Movie Ratings Counter
  5. Word Count using FlatMap
  6. Calculating Min and Max Temperature using DataFrames
  7. Social Graph Analysis using Marvel Superheroes
  8. Calculating Average Friends By Age using SparkSQL
  9. Calculating Total Spent By Customer using DataFrames
  10. Word Count using SparkSQL
  11. Calculating Average Friends By Age using DataFrames
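
As an illustration of the RDD style used by several entries above, the following is a rough sketch of an "average friends by age" computation. It is not the repository's code. It assumes a hypothetical CSV input with columns id, name, age, number of friends, and the function names are made up for the example.

```python
import sys


def parse_line(line):
    """Return (age, num_friends) from an assumed 'id,name,age,friends' row."""
    fields = line.split(",")
    return int(fields[2]), int(fields[3])


def add_pairs(a, b):
    """Combine two (friend_total, row_count) pairs element-wise."""
    return a[0] + b[0], a[1] + b[1]


def average(total_and_count):
    """Turn a (friend_total, row_count) pair into a mean."""
    total, count = total_and_count
    return total / count


def main(path):
    # Imported here so the helpers above can be tried without Spark.
    from pyspark import SparkConf, SparkContext

    sc = SparkContext(conf=SparkConf().setAppName("FriendsByAgeSketch"))
    try:
        pairs = sc.textFile(path).map(parse_line)  # (age, friends)
        # Attach a count of 1 to each value, then sum totals and counts per age.
        totals = pairs.mapValues(lambda n: (n, 1)).reduceByKey(add_pairs)
        averages = totals.mapValues(average)
        for age, avg in sorted(averages.collect()):
            print(age, round(avg, 2))
    finally:
        sc.stop()


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: spark-submit friends_by_age_sketch.py <csv-file>",
              file=sys.stderr)
    else:
        main(sys.argv[1])
```

Carrying a (total, count) pair through reduceByKey keeps the reduction associative, which a running average would not be. The DataFrame and SparkSQL entries in the list would express the same aggregation as a group-by on age.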

CONTRIBUTIONS

If you have any suggestions or ideas for new Apache Spark programs, feel free to open an issue or submit a pull request.


LICENSE

This repository is licensed under the MIT License. See the LICENSE file for more information.
