# Apache-Spark-Application

# Abstract

This assignment focuses on Apache Spark, a distributed computing framework for large-scale, high-performance data processing. It covers core Spark concepts, including Resilient Distributed Datasets (RDDs), data partitioning, DataFrames, and Spark SQL, and discusses best practices for improving Spark performance, such as optimizing code, increasing the number of worker nodes, and allocating sufficient memory. The assignment also includes practical applications of Spark using PySpark, enabling students to write code and perform distributed computing.
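The assignment's actual code is not reproduced here; as a rough illustration of the concepts listed above (RDDs, partitioning, DataFrames, and Spark SQL), a minimal PySpark sketch might look like the following. The application name, sample data, partition counts, and the `local[*]` master are illustrative assumptions, not taken from the repository.

```python
from pyspark.sql import SparkSession

# Start a local SparkSession (app name and local[*] master are placeholders).
spark = (
    SparkSession.builder
    .appName("spark-application-demo")
    .master("local[*]")
    .getOrCreate()
)

# RDD: distribute a small collection across 4 partitions and transform it.
rdd = spark.sparkContext.parallelize(range(10), numSlices=4)
squares = rdd.map(lambda x: x * x).collect()
print(squares)

# DataFrame: build one from an in-memory list of tuples with a simple schema.
df = spark.createDataFrame(
    [("alice", 34), ("bob", 45), ("carol", 29)],
    schema=["name", "age"],
)

# Partitioning: repartition to control how rows are spread across workers.
df = df.repartition(4)

# Spark SQL: register the DataFrame as a temporary view and query it with SQL.
df.createOrReplaceTempView("people")
spark.sql("SELECT name FROM people WHERE age > 30").show()

spark.stop()
```

Run locally (for example with `spark-submit` or a plain `python` interpreter that has `pyspark` installed), this prints the squared values computed on the RDD and the names of people older than 30 from the SQL query.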
