Skip to content

An NLP task to predict Cuisine from recipe data, visualize cuisine distribution, train MNB, XGBoost, CNN, and Random Forest models for cuisine prediction, and evaluate performance with accuracy, confusion matrix, and classification report.

Notifications You must be signed in to change notification settings

suraj5424/Cuisine-Prediction

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 

Repository files navigation

Description of Data Management and Model Evaluation Program

The provided code imports and processes a dataset containing information about recipes and their corresponding cuisines. It then proceeds to train and evaluate several machine learning models to predict the cuisine of a recipe based on its ingredients.

Program Workflow

  1. Data Import and Preprocessing: The program starts by importing the necessary libraries and datasets from Kaggle. It then reads the training and test data into Pandas DataFrames for further processing.

  2. Data Exploration and Visualization: The program performs exploratory data analysis (EDA) on the training data, visualizing the distribution of cuisines, the number of ingredients per recipe, and the relationship between the number of ingredients and cuisines.

  3. Model Training and Evaluation:

    • Multinomial Naive Bayes (MNB): The program uses a Multinomial Naive Bayes classifier to predict the cuisine based on the ingredients. It vectorizes the ingredients using CountVectorizer, trains the model, and evaluates its performance using accuracy score, confusion matrix, and classification report.

    • XGBoost Model: Next, the program employs an XGBoost classifier to predict the cuisine. It encodes the cuisine labels, vectorizes the ingredients using CountVectorizer, trains the XGBoost model, and evaluates its performance on the validation set using accuracy score, confusion matrix, and classification report.

    • Convolutional Neural Network (CNN): The program utilizes a CNN model to predict the cuisine based on the ingredients. It tokenizes the ingredients, pads sequences to a fixed length, builds and compiles the CNN model, and trains it on the training set with early stopping. It then evaluates the model's performance on the test set using accuracy score, confusion matrix, and classification report.

    • Random Forest Model: Finally, the program employs a Random Forest classifier to predict the cuisine. It vectorizes the ingredients using TfidfVectorizer, trains the Random Forest model, and evaluates its performance on the validation set using accuracy score, confusion matrix, and classification report.

Features and Outputs

  • Data Visualization: The program visualizes the distribution of cuisines, the number of ingredients per recipe, and the relationship between the number of ingredients and cuisines through various plots and charts.

  • Model Evaluation: For each model, the program calculates and displays evaluation metrics such as accuracy score, confusion matrix, and classification report on the validation or test set.

Conclusion

Overall, the program efficiently manages the data, explores its characteristics, trains multiple machine learning models, and evaluates their performance, providing valuable insights into predicting the cuisine of recipes based on their ingredients.

About

An NLP task to predict Cuisine from recipe data, visualize cuisine distribution, train MNB, XGBoost, CNN, and Random Forest models for cuisine prediction, and evaluate performance with accuracy, confusion matrix, and classification report.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published