mahmoudparsian/data-algorithms-with-spark

Data Algorithms with Spark by Mahmoud Parsian

"... This book will be a great resource for
both readers looking to implement existing
algorithms in a scalable fashion and readers
who are developing new, custom algorithms
using Spark. ..."

Dr. Matei Zaharia
Original Creator of Apache Spark

FOREWORD by Dr. Matei Zaharia


Software:

All programs are tested with the following software:

Spark               Python         Scala       Java
Apache Spark 3.4.0  Python 3.10.5  Scala 2.13  Java 11

Table of Contents

Chapter     Title
Glossary    Glossary of Big Data, MapReduce, Spark
Chapter 1   Introduction to Data Algorithms
Chapter 2   Transformations in Action
Chapter 3   Mapper Transformations
Chapter 4   Reductions in Spark
Chapter 5   Partitioning Data
Chapter 6   Graph Algorithms
Chapter 7   Interacting with External Data Sources
Chapter 8   Ranking Algorithms
Chapter 9   Fundamental Data Design Patterns
Chapter 10  Common Data Design Patterns
Chapter 11  Join Design Patterns
Chapter 12  Feature Engineering in PySpark

Bonus Chapters

Bonus Chapter               Title / Description
Glossary                    Glossary of Big Data, MapReduce, Spark
Word Count                  Solutions for Word Count using RDDs and DataFrames
Anagrams                    Find words that are anagrams
Lambda Expressions          Using Lambda Expressions in PySpark programs
TF-IDF                      Term Frequency - Inverse Document Frequency
K-mers                      K-mers for DNA Sequences
Correlation                 All vs. All Correlation
Mapping Partitions          mapPartitions() Complete Example
UDF                         User-Defined Function Examples
DataFrames Transformations  Examples on Creation and Transformation of DataFrames
DataFrames Tutorials        DataFrames Tutorials: from collections and CSV text files
Join Operations             Examples on join of RDDs and DataFrames
PySpark Tutorial 101        Examples on using PySpark RDDs and DataFrames
Physical Data Partitioning  Tutorial of Physical Data Partitioning
Monoids and Combiners       Monoid as a Design Principle
