
sdman135/Anime_Score_Predictor---Project-3


Flatiron Project-3 --- Anime Score Predictor

Project3-Anime Database

For my third Flatiron project I wanted to work with a dataset I was genuinely interested in, so I chose anime. The data was collected from MyAnimeList.net and describes each show (e.g. genre, duration, release date). I had to drop more than half of the dataset because it was mostly hentai, and I'm not interested in that for this project. Maybe later... but seriously, filtering took the data from 14,478 entries down to 5,561 unique anime, and a few further entries were not usable.
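The filtering described above can be sketched in pandas. This is a minimal illustration, not the notebook's actual code: the column names `title`, `genre` and `score`, the comma-separated genre format, the helper names `drop_hentai` and `clean_anime`, and treating rows without a score as "not usable" are all hypothetical assumptions.

```python
import pandas as pd


def drop_hentai(df: pd.DataFrame, genre_col: str = "genre") -> pd.DataFrame:
    """Return a copy of df without rows whose genre list contains 'Hentai'.

    Assumes (hypothetically) that genre_col holds comma-separated genre strings,
    e.g. "Action, Comedy". Rows with a missing genre are kept.
    """
    genres = df[genre_col].fillna("")
    # Split "A, B, C" into parts and look for an exact 'Hentai' entry,
    # so a genre that merely contains the substring is not dropped.
    is_hentai = genres.str.split(",").apply(
        lambda parts: any(p.strip().lower() == "hentai" for p in parts)
    )
    return df.loc[~is_hentai].copy()


def clean_anime(df: pd.DataFrame) -> pd.DataFrame:
    """Drop hentai, duplicate titles, and rows with no score (assumed unusable)."""
    out = drop_hentai(df)
    out = out.drop_duplicates(subset="title")
    out = out.dropna(subset=["score"])
    return out.reset_index(drop=True)


# Tiny stand-in for the real file; in practice this would be pd.read_csv(...).
raw = pd.DataFrame({
    "title": ["A", "B", "B", "C", "D"],
    "genre": ["Action, Comedy", "Drama", "Drama", "Hentai, Romance", None],
    "score": [7.5, 8.1, 8.1, 6.0, None],
})
clean = clean_anime(raw)
print(len(raw), "->", len(clean))
```

Here "C" is dropped for its genre, the second "B" as a duplicate, and "D" for its missing score.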

What Did I Do?

  • Imported the .csv file.

  • Cleaned the .csv file and removed all hentai entries.

  • Encoded the non-numerical columns (Type, Source, Rating) as integer labels, using LabelEncoder wrapped in a custom function, so the models could work with these categorical data.

  • Plotted every feature against every other feature to look for strong correlations.

  • Computed the correlations between each pair of features.

  • Built several regression models to predict the scores (Multiple Linear, Ridge, Lasso, Elastic Net, Gradient Boosting, and a Pipeline combining Polynomial and Gradient Boosting Regression), using GridSearch to find optimal hyperparameters.
  • My best-performing model, the Polynomial and Gradient Boosting Regression pipeline, has a test score of ~93.26%.
  • Predicting the score of an anime not in the dataset, using manually entered anime info:

Anime with release date 2018 (release year supported in the dataset)

Anime with release date 2019 (last release year supported in the dataset)

Anime with release date 2020 (release year not supported in the dataset)
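The encoding, correlation, and modelling steps above can be sketched end to end with scikit-learn. This is a minimal sketch on synthetic data, not the notebook's code: the numeric columns `Episodes` and `Year`, the `Score` target, the helper name `label_encode`, the generated values, and the small hyperparameter grid are all hypothetical. The fitted encoders are kept so a manually entered anime can be encoded the same way before prediction.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder, PolynomialFeatures

CATEGORICAL = ["Type", "Source", "Rating"]


def label_encode(df, columns, encoders=None):
    """Encode each categorical column as integers with its own LabelEncoder.

    With encoders=None, new encoders are fitted; pass the returned dict back in
    to transform new rows identically. Unseen categories raise ValueError.
    """
    df = df.copy()
    fitted = {} if encoders is None else encoders
    for col in columns:
        if encoders is None:
            fitted[col] = LabelEncoder().fit(df[col])
        df[col] = fitted[col].transform(df[col])
    return df, fitted


# Synthetic stand-in data: hypothetical columns and values, NOT the real dataset.
rng = np.random.default_rng(0)
n = 300
data = pd.DataFrame({
    "Type": rng.choice(["TV", "Movie", "OVA"], n),
    "Source": rng.choice(["Manga", "Original", "Light novel"], n),
    "Rating": rng.choice(["PG-13", "R - 17+", "G"], n),
    "Episodes": rng.integers(1, 50, n),
    "Year": rng.integers(1990, 2020, n),
})
# Made-up target: newer shows and TV series score higher, plus a little noise.
data["Score"] = (6.0 + 0.03 * (data["Year"] - 1990)
                 + 0.5 * (data["Type"] == "TV") + rng.normal(0, 0.1, n))

encoded, encoders = label_encode(data, CATEGORICAL)
# Correlation of every feature with the target (the numbers behind a heatmap).
print(encoded.corr()["Score"].sort_values())

X, y = encoded.drop(columns="Score"), encoded["Score"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Polynomial feature expansion followed by gradient boosting, tuned by grid search.
pipe = Pipeline([
    ("poly", PolynomialFeatures(include_bias=False)),
    ("gbr", GradientBoostingRegressor(random_state=0)),
])
param_grid = {
    "poly__degree": [1, 2],
    "gbr__n_estimators": [100, 200],
    "gbr__max_depth": [2, 3],
}
search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X_train, y_train)
print("best params:", search.best_params_)
# For a regressor, GridSearchCV.score defaults to R^2 on the given data.
print("test R^2:", round(search.score(X_test, y_test), 4))

# Score a new, manually entered anime using the same fitted encoders.
new_anime = pd.DataFrame([{"Type": "TV", "Source": "Manga", "Rating": "PG-13",
                           "Episodes": 12, "Year": 2018}])
new_encoded, _ = label_encode(new_anime, CATEGORICAL, encoders)
print("predicted score:", search.predict(new_encoded)[0])
```

One design caveat: scikit-learn documents LabelEncoder for target labels, and it imposes an arbitrary integer order on categories. One-hot dummy variables (`pd.get_dummies` or `OneHotEncoder`) are the more common choice for input features, especially for the linear models; tree-based models such as gradient boosting tolerate label codes better.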

Built With

  • Python 3.8
  • Jupyter Notebook 6.0.0
  • A few imports: pandas, numpy, matplotlib.pyplot, seaborn, statsmodels and sklearn

Authors

License

This project is licensed under the MIT License - see the LICENSE.md file for details

Acknowledgments