
Spark

Introduction

Thank you for stopping by my Spark project. From the research I have done so far, Apache Spark is a well-suited computing engine and library suite for parallel data processing on computer clusters. In this repo, I coded some Spark basics using Python. The repo contains code for Spark DataFrames, working with operators in Spark, and handling missing values (a short sketch of these topics follows below). It is not an exhaustive list; this was my starting point for getting hands-on with the tool.
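
As a taste of what the notebooks cover, here is a minimal, self-contained sketch of those three topics. The column names and sample rows are made up for illustration and are not taken from the notebooks in this repo.

```python
# Minimal PySpark sketch: creating a DataFrame, filtering with operators,
# and handling missing values. Sample data is illustrative only.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.master("local[*]").appName("spark-basics").getOrCreate()

# Create a small DataFrame that includes missing (None) values
data = [("Alice", 34, 3000.0), ("Bob", None, 4500.0), ("Carol", 29, None)]
df = spark.createDataFrame(data, ["name", "age", "salary"])

# Working with operators: filter rows where salary is greater than 3500
df.filter(F.col("salary") > 3500).show()

# Working with missing values: drop rows containing nulls, or fill them with defaults
df.na.drop().show()
df.na.fill({"age": 0, "salary": 0.0}).show()

spark.stop()
```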

Description

To work with Spark on a local machine, you must install a few packages and set some environment variables that enable Spark to run locally. To run the notebooks in this repo, you need to download the items listed below and create the corresponding environment variables. A small verification sketch follows the list.

Requirements for Spark setup on a Windows machine:

  1. JDK
  2. Python
  3. Hadoop winutils
  4. Spark binaries
  5. Environment variables
  6. Python IDE (VS Code or Jupyter Notebook)
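
Once the items above are installed, a quick way to check the setup is to start a local SparkSession from Python. The paths in this sketch (Spark folder, Hadoop/winutils folder, JDK location) are assumptions; replace them with the locations you actually used.

```python
# A minimal sketch for verifying a local Windows Spark setup from Python.
# The paths below (Spark binaries, Hadoop winutils folder, JDK) are assumptions --
# point them at wherever you installed these items.
import os

os.environ.setdefault("SPARK_HOME", r"C:\spark\spark-3.5.0-bin-hadoop3")
os.environ.setdefault("HADOOP_HOME", r"C:\hadoop")  # folder containing bin\winutils.exe
os.environ.setdefault("JAVA_HOME", r"C:\Program Files\Java\jdk-17")

from pyspark.sql import SparkSession

# If a local SparkSession starts and reports its version, the setup is working.
spark = SparkSession.builder.master("local[*]").appName("setup-check").getOrCreate()
print("Spark version:", spark.version)
spark.stop()
```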

Contributors

This repo was created for learning purposes. If you are interested in becoming a contributor or have ideas on how to make things better, please let me know.

