
Embedding Contexts into Recipe Ingredients

Read my article for a more detailed explanation: https://towardsdatascience.com/embedding-contexts-into-recipe-ingredients-709a95841914

Tree-based methods and artificial neural networks (ANNs) have been applied successfully to predict the type of cuisine from a list of ingredients. Converting the ingredient list to a simple bag-of-words matrix, which is essentially a one-hot-encoded matrix, yields a prediction accuracy of 78% on the Yummly recipes dataset.
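The exact baseline pipeline is in the repository notebooks; as a rough sketch, assuming the Kaggle "What's Cooking" JSON format (a `train.json` file with `ingredients` and `cuisine` fields) and a tree-based classifier, the one-hot baseline could look like this:

```python
# Minimal sketch of the bag-of-words baseline (assumed pipeline, not the
# repo's exact code). Each recipe is a list of ingredient strings; the
# one-hot matrix has one column per distinct ingredient.
import json

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MultiLabelBinarizer

# "train.json" is the Yummly / "What's Cooking" format: a list of records
# with "ingredients" (list of strings) and "cuisine" (label).
with open("train.json") as f:
    recipes = json.load(f)

ingredients = [r["ingredients"] for r in recipes]
cuisines = [r["cuisine"] for r in recipes]

# One-hot encode each recipe's ingredient list into a sparse 0/1 matrix.
mlb = MultiLabelBinarizer(sparse_output=True)
X = mlb.fit_transform(ingredients)

X_train, X_test, y_train, y_test = train_test_split(
    X, cuisines, test_size=0.2, random_state=42, stratify=cuisines
)

clf = RandomForestClassifier(n_estimators=200, n_jobs=-1, random_state=42)
clf.fit(X_train, y_train)
print("bag-of-words accuracy:", clf.score(X_test, y_test))
```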

Can we use Word Embeddings to improve those results?

In this work, we use Gensim's Word2Vec implementation to convert a given list of ingredients into a fixed-length vector representation. Let's see how those vectors look (screenshot in the repository).
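The notebooks hold the actual code; a common way to get one fixed-length vector per recipe is to treat each recipe as a "sentence" whose tokens are ingredients, train Word2Vec on those sentences, and average the ingredient vectors. The toy data and parameter values below are illustrative assumptions, not the repo's settings:

```python
# Sketch: average per-ingredient Word2Vec vectors to get one fixed-length
# vector per recipe (assumed approach).
import numpy as np
from gensim.models import Word2Vec

# Toy corpus; in practice this would be the full Yummly ingredient lists.
recipes = [
    ["romaine lettuce", "black olives", "feta cheese", "garlic"],
    ["soy sauce", "ginger", "garlic", "sesame oil", "scallions"],
    ["tortillas", "black beans", "cilantro", "lime", "jalapeno"],
]

model = Word2Vec(
    sentences=recipes,
    vector_size=100,   # embedding dimensionality (assumed)
    window=10,         # recipes are unordered sets, so a wide window helps
    min_count=1,       # keep rare ingredients in this tiny example
    workers=4,
    seed=42,
)

def recipe_vector(recipe, model):
    """Average the vectors of the recipe's in-vocabulary ingredients."""
    vecs = [model.wv[ing] for ing in recipe if ing in model.wv]
    if not vecs:
        return np.zeros(model.vector_size)
    return np.mean(vecs, axis=0)

X_emb = np.vstack([recipe_vector(r, model) for r in recipes])
print(X_emb.shape)  # (number of recipes, 100)
```

Averaging discards any ordering of the ingredients, which is a reasonable fit here since recipe ingredient lists are essentially unordered sets.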

Using the cleaned recipe list and this context-vector representation gives a classification accuracy of only 65%, which is lower than the 78% baseline. Let's look in detail at some cuisines and their top ingredients (screenshot in the repository).
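The screenshot itself isn't reproduced here; one simple way to tabulate the top ingredients per cuisine, assuming the same `train.json` file as above, is:

```python
# Count the most common ingredients per cuisine (assumed helper; the repo's
# own tables/plots are in the notebooks and screenshots).
import json
from collections import Counter, defaultdict

with open("train.json") as f:   # Yummly / "What's Cooking" training file (assumed path)
    recipes = json.load(f)

per_cuisine = defaultdict(Counter)
for r in recipes:
    per_cuisine[r["cuisine"]].update(r["ingredients"])

# Print the five most frequent ingredients for a few example cuisines.
for cuisine in ("italian", "mexican", "chinese"):
    print(cuisine, per_cuisine[cuisine].most_common(5))
```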

So, since the prediction accuracy fell from 78% to 65%, are word embeddings bad?

Probably. Probably not. There are caveats. For starters, the dataset has imbalanced classes. Possibly, a more thorough (or maybe less thorough?) cleaning of the data is needed. Maybe the vectors built by Gensim need more tuning.

Anyway, getting poor results is also good research, right?
