NLP on 3M reviews & Image classification on 200k images

naveenrc/YelpChallenge

Yelp Dataset challenge

This project applies natural language processing to Yelp reviews and classifies images from the Yelp dataset, which can be downloaded from https://www.yelp.com/dataset.

Requirements

NLP

  • python 3.5
  • spacy
  • gensim
  • pyLDAvis
  • word2vec
  • Bokeh
  • t-SNE

Image classification

  • python 3.5
  • tensorflow
  • opencv
  • 12 GB RAM

Setup

NLP

Modern_NLP.ipynb walks through the following topics (best viewed on nbviewer):

  1. A tour of the dataset
  2. Introduction to text processing with spaCy
  3. Automatic phrase modeling
  4. Topic modeling with LDA
  5. Visualizing topic models with pyLDAvis
  6. Word vector models with word2vec
  7. Visualizing word2vec with t-SNE
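
As a rough illustration of topic 3 (automatic phrase modeling), the sketch below scores adjacent word pairs by how often they co-occur relative to their individual frequencies, then joins high-scoring pairs into single tokens. It uses only the standard library and is not taken from the notebook. The function names `find_phrases` and `merge_phrases`, the `min_count` and `threshold` values, and the sample sentences are all assumptions made for illustration. The notebook itself presumably relies on gensim, which is listed in the requirements.

```python
from collections import Counter


def find_phrases(sentences, min_count=2, threshold=0.1):
    """Return {(word_a, word_b): score} for bigrams that look like phrases.

    Hypothetical helper: a simplified count-based bigram score,
    (count(a, b) - min_count) / (count(a) * count(b)) * vocabulary size.
    """
    unigrams = Counter()
    bigrams = Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))

    vocab_size = len(unigrams)
    phrases = {}
    for (a, b), n_ab in bigrams.items():
        if n_ab < min_count:
            continue  # too rare to trust
        score = (n_ab - min_count) / (unigrams[a] * unigrams[b]) * vocab_size
        if score > threshold:
            phrases[(a, b)] = score
    return phrases


def merge_phrases(tokens, phrases):
    """Join detected bigrams into single tokens such as 'ice_cream'."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in phrases:
            merged.append(tokens[i] + "_" + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


# Made-up, pre-tokenized sample sentences (not from the Yelp dataset).
sentences = [
    ["ice", "cream", "was", "great"],
    ["love", "ice", "cream"],
    ["great", "ice", "cream", "here"],
    ["the", "cream", "sauce"],
]
phrases = find_phrases(sentences)
print(merge_phrases(["love", "ice", "cream"], phrases))
```

Merged tokens like these can then be fed to later steps such as LDA or word2vec, so that multi-word expressions are treated as a single vocabulary item.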

Setup for image classification

  1. Install project requirements
  2. Create a folder named yelpData and move the extracted Yelp data into it. Move the photos into the 'yelpData/yelpPhotos' directory.
  3. Run photo_process.py and enter the target size, e.g. 64 for 64 x 64 or 32 for 32 x 32.
  4. Run photo_info.py to get information about the photos.
  5. Run classifier.py to train the model (this may take 6 to 10 hours without a GPU).
  6. Run predict.py to predict an image label.

The corresponding commands:

pip install -r requirements.txt
python ./photoAnalysis/photo_process.py
python ./photoAnalysis/photo_info.py
python ./classifier/classifier.py
python ./classifier/predict.py
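
Before running the scripts, it can help to confirm that the folder layout from step 2 exists and that the size entered in step 3 is sensible. The following is a minimal sketch using only the standard library. The helper names `missing_paths` and `parse_size` are hypothetical and do not appear in the repository, and the way photo_process.py actually reads and validates the size is not shown here, so treat the parsing as an assumption.

```python
from pathlib import Path


def missing_paths(root="."):
    """Return the expected data directories that do not exist under `root`.

    The directory names follow step 2 of the setup: yelpData/yelpPhotos.
    """
    root = Path(root)
    expected = [root / "yelpData", root / "yelpData" / "yelpPhotos"]
    return [p for p in expected if not p.is_dir()]


def parse_size(text):
    """Turn a typed size such as '64' into a square (width, height) tuple.

    Assumption: a single positive integer describes both dimensions.
    """
    size = int(text.strip())
    if size <= 0:
        raise ValueError("size must be a positive integer")
    return (size, size)


if __name__ == "__main__":
    for path in missing_paths():
        print(f"missing directory: {path}")
    print(parse_size("64"))
```

Checking the layout up front avoids discovering a missing photo folder only after the long-running training step has started.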

Report for image classification can be found here
