shiv4nsh/spark-LDA-example

spark-LDA-example

A simple Spark LDA example. This project contains a basic document clustering example that also performs data cleaning.

Document clustering is performed in the following steps:

  1. Spark RegexTokenizer: for tokenization

  2. Stanford NLP Morphology: for stemming and lemmatization

  3. Spark StopWordsRemover: for removing stop words and punctuation

  4. Spark TF-IDF: for computing term frequencies or TF-IDF weights

  5. Spark LDA: for clustering the documents
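
The steps above can be sketched roughly as follows. This is a minimal illustration and not the repository's actual code. It assumes Spark ML and Stanford CoreNLP are on the classpath and that the input DataFrame has a string column named `text`. The object name `LdaSketch`, the function `clusterDocuments`, the intermediate column names, and the toy corpus are all hypothetical. The sketch also assumes `CountVectorizer` for the term-frequency step; the project may use a different transformer such as `HashingTF`.

```scala
import edu.stanford.nlp.process.Morphology
import org.apache.spark.ml.clustering.{LDA, LDAModel}
import org.apache.spark.ml.feature.{CountVectorizer, IDF, RegexTokenizer, StopWordsRemover}
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, udf}

object LdaSketch {

  /** Runs the five steps on `docs` (needs a string column "text") with `k` topics. */
  def clusterDocuments(docs: DataFrame, k: Int): (LDAModel, DataFrame) = {
    // 1. Tokenize: lowercase and split on non-word characters,
    //    which also discards punctuation.
    val tokenized = new RegexTokenizer()
      .setInputCol("text").setOutputCol("tokens")
      .setPattern("\\W+").setMinTokenLength(2)
      .transform(docs)

    // 2. Stem each token with Stanford's Morphology.
    //    A new instance is created per row to avoid serializing it;
    //    this is simple but not tuned for performance.
    val stem = udf { (tokens: Seq[String]) =>
      val morphology = new Morphology()
      tokens.map(t => morphology.stem(t))
    }
    val stemmed = tokenized.withColumn("stems", stem(col("tokens")))

    // 3. Remove stop words (Spark's default English list).
    val filtered = new StopWordsRemover()
      .setInputCol("stems").setOutputCol("filtered")
      .transform(stemmed)

    // 4. Term frequencies, then TF-IDF weights.
    val tf = new CountVectorizer()
      .setInputCol("filtered").setOutputCol("tf")
      .fit(filtered).transform(filtered)
    val tfidf = new IDF()
      .setInputCol("tf").setOutputCol("features")
      .fit(tf).transform(tf)

    // 5. LDA reads the "features" column by default and adds
    //    a "topicDistribution" column on transform.
    val model = new LDA().setK(k).setMaxIter(20).setSeed(42L).fit(tfidf)
    (model, model.transform(tfidf))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("lda-sketch").master("local[*]").getOrCreate()
    import spark.implicits._

    // Hypothetical toy corpus, only for illustration.
    val docs = Seq(
      (0L, "Spark runs distributed jobs on large clusters"),
      (1L, "Clusters of machines process the jobs in parallel"),
      (2L, "The cat sat on the mat while the other cats purred"),
      (3L, "Dogs and cats are popular pets")
    ).toDF("id", "text")

    val (model, clustered) = clusterDocuments(docs, k = 2)
    model.describeTopics(3).show(false)
    clustered.select("id", "topicDistribution").show(false)
    spark.stop()
  }
}
```

`CountVectorizer` is used here because its fitted model exposes a `vocabulary` array, which makes it possible to map the term indices returned by `describeTopics` back to words. Each document's `topicDistribution` vector can then be used to assign it to its most probable topic.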

About

A simple Spark LDA example that demonstrates a full-fledged clustering pipeline, with data cleaning using processes such as lemmatization and stemming.
