Word Count

"... This book will be a great resource for
both readers looking to implement existing
algorithms in a scalable fashion and readers
who are developing new, custom algorithms
using Spark. ..."

Dr. Matei Zaharia
Original Creator of Apache Spark

FOREWORD by Dr. Matei Zaharia

Introduction to Word Count

Word Count is a simple, easy-to-understand algorithm that is straightforward to implement as a MapReduce or Spark application. Given a set of text documents (files), the program counts the number of occurrences of each word.
The goal is to build a dictionary of (key, value) pairs, where the key is a word (a String) and the value is an Integer giving the frequency of that word across the documents.
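The (key, value) idea above can be sketched in plain Python, without a cluster. This is a minimal illustration, not the repository's solution: the function names, the sample documents, and the choice to lowercase and split on whitespace are all assumptions made for the example. In Spark, the same stages would typically be expressed with RDD operations such as flatMap(), map(), and reduceByKey().

```python
from collections import defaultdict


def map_phase(records):
    """Map: tokenize each record and emit a (word, 1) pair per word.

    Lowercasing and whitespace splitting are assumptions of this sketch.
    """
    for record in records:
        for word in record.lower().split():
            yield (word, 1)


def reduce_phase(pairs):
    """Reduce: sum the values of all pairs that share the same key."""
    counts = defaultdict(int)
    for word, one in pairs:
        counts[word] += one
    return dict(counts)


# Hypothetical input: each string stands for one line of a document.
documents = ["fox jumped", "the fox ran", "the dog barked"]

frequencies = reduce_phase(map_phase(documents))
print(frequencies)
```

The result is an ordinary dictionary keyed by word, which mirrors the (word, frequency) pairs described above.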
A complete set of solutions is given for the Word Count problem using
BEFORE reduction filter: You may add a filter() to remove undesired words. This can be done right after tokenizing the records.
AFTER reduction filter: To keep only the desired (word, frequency) pairs in the final word count, you may add a filter() to remove elements where frequency < N, where N (an integer) is your threshold. This can be done after the reduction.
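The two filters can be illustrated with a small, self-contained Python sketch. It uses Python's built-in filter() and collections.Counter rather than Spark, and the stop-word set, the threshold value, and the sample documents are hypothetical choices made only for this example.

```python
from collections import Counter

# Hypothetical values, chosen only for illustration.
STOP_WORDS = {"the", "a", "of"}   # undesired words to drop before reduction
MIN_FREQUENCY = 2                 # threshold N for the after-reduction filter


def filtered_word_count(records, stop_words, min_frequency):
    """Count words, filtering both before and after the reduction."""
    # Tokenize: lowercase and split on whitespace (an assumption of this sketch).
    words = (word for record in records for word in record.lower().split())

    # BEFORE reduction filter: drop undesired words right after tokenizing.
    kept = filter(lambda word: word not in stop_words, words)

    # Reduction: sum the occurrences of each remaining word.
    counts = Counter(kept)

    # AFTER reduction filter: remove pairs where frequency < N.
    frequent = filter(lambda pair: pair[1] >= min_frequency, counts.items())
    return dict(frequent)


documents = ["the fox jumped", "the fox ran", "a dog barked at the fox"]
result = filtered_word_count(documents, STOP_WORDS, MIN_FREQUENCY)
print(result)
```

Filtering before the reduction shrinks the data that has to be grouped by key, while filtering after the reduction only trims the final output.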
