A Plotly Dash NLP project. It measures document similarity using Latent Dirichlet Allocation and principal component analysis, followed by K-Means clustering, and presents the results through interactive visualizations.

kennedyCzar/NLP-PROJECT-BOOK-INSIGHTS-WITH-PLOTLY

NLP-PROJECT-BOOK-INSIGHTS-WITH-PLOTLY

The project is hosted live on Heroku.

The project implements a machine learning pipeline for Natural Language Processing (NLP), with visualization in Plotly Dash. Hovering over a data point shows the book's properties (metadata), its similarity score, a horizontal bar chart, and the book imprint. The books are processed to extract tokenized and lemmatized features, principal component analysis reduces the dimensionality, and K-Means clustering shows the relationships among books.

PROJECT WORKFLOW

  • Import and preprocess all 148 French books
  • Stemming and lemmatization of extracted tokens
  • Visualize the most frequent words on hover, returned as an ordered bar plot
  • TF-IDF model
  • Document similarity using cosine distance of book content
    • Principal component analysis
    • K-Means clustering
  • Topic models
    • Latent Dirichlet Allocation
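
The workflow above can be sketched with scikit-learn roughly as follows. This is a minimal illustration, not the project's actual code: the four-sentence corpus is a hypothetical stand-in for the 148 books, the number of clusters and topics is arbitrary, and running K-Means on the PCA coordinates (rather than on the full TF-IDF matrix) is an assumption.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA, LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy corpus standing in for the preprocessed book texts.
docs = [
    "le chat dort sur le canapé",
    "le chien dort dans le jardin",
    "la bourse monte et les actions grimpent",
    "les actions baissent quand la bourse chute",
]

# TF-IDF features: one row per document, one column per term.
X = TfidfVectorizer().fit_transform(docs)

# Pairwise cosine similarity between documents (1.0 on the diagonal).
similarity = cosine_similarity(X)

# Project onto two principal components for plotting.
# PCA expects a dense array, so the sparse matrix is converted first.
coords = PCA(n_components=2, random_state=0).fit_transform(X.toarray())

# Assumption: cluster the 2-D coordinates; n_clusters=2 is arbitrary here.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(coords)

# Topic model: LDA is fitted on raw term counts rather than TF-IDF weights.
counts = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)  # per-document topic proportions

print(np.round(similarity, 2))
print(labels)
```

Each row of `doc_topics` sums to 1 and gives the estimated share of each topic in that document; `coords` and `labels` are what a scatter plot of the clustered books would be drawn from.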

HOW TO USE

git clone https://github.com/kennedyCzar/NLP-PROJECT-BOOK-INSIGHTS-WITH-PLOTLY

Open the script folder in your terminal and run the following command:

python mplot_script.py
Then navigate to http://127.0.0.1:8050/ in your browser.
