
Pokémon Data Analysis

Data

Taken from the Pokémon Challenge on Kaggle

Pre-Analysis

I began by sprucing up the data, renaming columns and mutating some new variables, then ran some visualizations before creating the neural net.
     Takeaway: of the categoricals, Legendary appears to be closely linked with strength, whereas the relationship between strength and type or generation appears less significant.
     (Plots: Power vs. Type; Overall Value across Generations by Type; Overall Value vs. Generation)
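
The cleanup and plotting might look roughly like the sketch below; the column names (`Type 1`, `Generation`, the base stats) and the derived `Overall` variable are assumptions about the Kaggle file, not the exact code in this repo.

```r
library(tidyverse)

# Illustrative only: column names are assumed from the Kaggle dataset.
pokemon <- read_csv("pokemon.csv") %>%
  rename(Type1 = `Type 1`, Type2 = `Type 2`) %>%      # rename awkward column names
  mutate(Overall = HP + Attack + Defense + Speed)     # mutate a new overall-strength variable

# One of the quick visualizations: overall value across generations
ggplot(pokemon, aes(x = factor(Generation), y = Overall)) +
  geom_boxplot() +
  labs(x = "Generation", y = "Overall value")
```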

Tools

  • R — easy to perform complex statistical operations concisely
    • RStudio — IDE, for ease of coding
    • library(tidyverse) — conglomeration of packages for visualizing and cleaning data
    • library(neuralnet) — for building the neural nets
    • library(dplyr) — for data manipulation, e.g. pulling apart and concisely scaling data (included in Tidyverse)
    • library(caTools) — for partitioning data
    • library(parallel) — for parallelizing the process of generating neural nets with different parameterizations
    • library(ggplot2) — for beautiful and easy visualization of data (included within Tidyverse)

PokeNet

(Figure: visualized result of the trained neural net)

Data Preparation

After generally cleaning up the data so it was properly formatted for R (e.g. turning "True" strings into TRUE booleans), I normalized all continuous data to Gaussian distributions. Categoricals as a whole presented a more difficult challenge. Binary variables were easy enough to deal with: I encoded independent binary variables as 1 and -1, and the dependent variable (the victor) as 0 and 1, using different encodings for the independent and dependent variables because a resource indicated this was preferable. I attempted to use 1-of-(C-1) effects encoding for the non-binary categorical variables, but after encoding them I found that they decreased the accuracy of the net's predictions, so I removed them. This result is not surprising given the quick visualizations done in the pre-analysis.

While I did initially remove the Type variable (as it did not, by itself, provide much meaningful information), I came back to this project a few months later and added a variable expressing which Pokémon was favored in the type matchup (Type1 and Type2 are still absent from the training data). I saw a 1-2% increase in performance after adding the type encoding and matchup variables, but this is offset by a large increase in computation time.
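
A minimal sketch of that encoding scheme, assuming illustrative column names (`HP`, `Attack`, `Legendary`, `Winner`) rather than the repo's actual ones:

```r
library(dplyr)

# Hypothetical sketch: scale continuous stats and encode binary categoricals.
# Column names (HP, Attack, Legendary, Winner) are assumed for illustration.
prepare <- function(df) {
  df %>%
    mutate(
      across(c(HP, Attack), ~ as.numeric(scale(.x))),   # normalize continuous data
      Legendary = if_else(Legendary == "True", 1, -1),  # independent binary -> 1 / -1
      Winner    = if_else(Winner == "True", 1, 0)       # dependent binary   -> 0 / 1
    )
}
```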

Neural Net Generation

I used the neuralnet package in R, which uses resilient backpropagation (Riedmiller, 1994). I arbitrarily chose to use 3 hidden layers and put 20 nodes in each, as that was a round number slightly larger than my number of input variables. As I experimented with training set size, I found that diminishing returns warranted limiting the training set to ~5000 battles. Because this is only 1/10 of the data I had at my disposal, it meant I could extensively test my neural nets.
      Because the ratio of training to testing was 1:9 and still yielded ~92% accuracy, I am confident that the degree of overfitting is not significant.
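
The training setup could be sketched roughly as below; the data frame name, formula construction, and split call are assumptions based on the description above (caTools for partitioning, three hidden layers of 20 nodes, a 1:9 train/test ratio), not the repository's exact code.

```r
library(caTools)
library(neuralnet)

# Assumed: `battles` is the prepared data frame with a binary Winner column
# and already-scaled numeric predictors.
set.seed(42)
split <- sample.split(battles$Winner, SplitRatio = 0.1)   # keep ~1/10 for training
train <- subset(battles, split)
test  <- subset(battles, !split)

# Older versions of neuralnet do not accept `Winner ~ .`, so build the formula explicitly.
predictors <- setdiff(names(train), "Winner")
f <- as.formula(paste("Winner ~", paste(predictors, collapse = " + ")))

# Three hidden layers of 20 nodes, trained with resilient backpropagation.
net <- neuralnet(
  f,
  data          = train,
  hidden        = c(20, 20, 20),
  algorithm     = "rprop+",    # resilient backpropagation (the package default)
  linear.output = FALSE        # classification output
)

# Evaluate on the held-out 90%.
pred <- compute(net, test[, predictors])$net.result
mean(round(pred) == test$Winner)   # rough accuracy
```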

Results

| Training size (battles) | Time (s) | Accuracy |
|------------------------:|---------:|---------:|
| 1000 | 1.204 | 89.52% |
| 2000 | 7.866 | 91.26% |
| 3000 | 45.632 | 92.02% |
| 4000 | 48.838 | 91.86% |
| 5000 | 201.124 | 91.74% |
| 6000 | 857.779 | 92.7% |
| 7000 | 432.797 | 93.26% |
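
Since the Tools section mentions library(parallel) for generating nets with different parameterizations, the sweep over training sizes above could be parallelized along these lines; `train_net()` and `evaluate()` are hypothetical wrappers around the steps shown earlier, not functions from this repo.

```r
library(parallel)

# Hypothetical sketch of running the training-size sweep in parallel.
# train_net() and evaluate() stand in for wrappers around the neuralnet code above.
# Note: mclapply() forks, so on Windows use parLapply() with a cluster instead.
sizes <- seq(1000, 7000, by = 1000)

results <- mclapply(sizes, function(n) {
  elapsed <- system.time(net <- train_net(battles, train_size = n))["elapsed"]
  data.frame(size = n, seconds = unname(elapsed), accuracy = evaluate(net))
}, mc.cores = max(1, detectCores() - 1))

do.call(rbind, results)
```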