election-predictions

Index

Summary

This project has two goals:

Predicting the 2016 US presidential election results by county with supervised machine learning in R.
Mining association rules that relate demographics to voting preference in R.

Three supervised machine learning models are used to predict county-level election results from demographics: k-nearest neighbors, decision trees, and artificial neural networks. The models are compared on accuracy and precision.
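As a rough illustration of the two comparison metrics, the sketch below computes accuracy and precision from a vector of true labels and a vector of predicted labels. The helper name `classification_metrics`, the `positive` argument, and the toy labels are assumptions made for this example; they are not taken from classification.Rmd.

```r
# Hypothetical helper: accuracy and precision for a two-class prediction.
# `positive` names the class treated as the positive outcome.
classification_metrics <- function(actual, predicted, positive) {
  stopifnot(length(actual) == length(predicted))
  tp <- sum(predicted == positive & actual == positive)  # true positives
  fp <- sum(predicted == positive & actual != positive)  # false positives
  accuracy <- mean(predicted == actual)
  # Precision is undefined when nothing was predicted as positive
  precision <- if (tp + fp == 0) NA_real_ else tp / (tp + fp)
  c(accuracy = accuracy, precision = precision)
}

# Toy labels, invented for illustration only (not project data)
actual    <- c("Republican", "Republican", "Democrat", "Democrat",   "Republican")
predicted <- c("Republican", "Democrat",   "Democrat", "Republican", "Republican")

metrics <- classification_metrics(actual, predicted, positive = "Republican")
print(metrics)
```

Applying the same function to each model's predictions on a shared test set gives directly comparable numbers.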

File Directory

data - contains the three data sets used in the analysis (taken from Kaggle; see Credits):
         a. county_facts.csv - Demographic breakdown of each county.
         b. county_facts_dictionary.csv - Dictionary to decode variable names in county_facts.csv.
         c. pres16results.csv - Results of the 2016 election by county.
images - contains visualizations:
         a. decision_tree.png - Decision tree created from modelling process.
         b. model_comparison.png - Comparison of 3 classification models used.
         c. population_trends.png - Population size by voting preference.
         d. voting_trends.png - Voting trends by top 5 normalized demographics.
         e. democrat_arules.png - Scatterplot of Democratic association rules by support and confidence.
         f. republican_arules.png - Scatterplot of Republican association rules by support and confidence.
         g. democrats_grid.png - Color grid of Democratic association rules.
         h. republican_grid.png - Color grid of Republican association rules.
classification - contains the classification files that predict election outcomes based on demographics:
         a. classification.Rmd - R Markdown detailing the classification process, from data cleaning to model creation.
         b. classification.pdf - PDF showing the R code and its output, for easy viewing.
association_rules - contains the association rules files:
         a. association_rules.Rmd - R Markdown that mines rules relating demographics to voting preference.
         b. association_rules.pdf - PDF showing the R code and its output, for easy viewing.
results.pdf - A full write-up comparing classification and association rules mining in R vs SAS.
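
For orientation, here is a minimal sketch of how the three CSV files in the data folder might be read into R. The helper name `load_election_data` and its `data_dir` argument are hypothetical, and the sketch assumes the files sit directly under that folder with the names listed above. The actual loading code lives in the Rmd files and may differ.

```r
# Hypothetical helper: read the three CSV files from a data directory.
# Assumes the file names listed in the File Directory section.
load_election_data <- function(data_dir = "data") {
  files <- c(
    county_facts = "county_facts.csv",
    dictionary   = "county_facts_dictionary.csv",
    results      = "pres16results.csv"
  )
  paths <- file.path(data_dir, files)

  # Fail early with a clear message if any file is absent
  missing <- files[!file.exists(paths)]
  if (length(missing) > 0) {
    stop("Missing data files: ", paste(missing, collapse = ", "))
  }

  # Return a named list of data frames, one per file
  setNames(lapply(paths, read.csv, stringsAsFactors = FALSE), names(files))
}
```

Calling `load_election_data("data")` from the repository root would then return a list whose `county_facts`, `dictionary`, and `results` elements hold the three tables.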

Language and Packages Used

All models are built in R. The R results are also compared against SAS in results.pdf.

The following packages are used:

# List of packages used
packages <- c("dplyr", "tidyr", "ggplot2", "class", "rpart", "rpart.plot", "neuralnet", "arules",
              "plyr", "mltools", "arulesViz", "plotly", "RCurl")

# Load each package, installing it first if it is not already available
for (p in packages) {
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
    library(p, character.only = TRUE)
  }
}

Credits

Thanks to Ben Hamner for the county_facts.csv and county_facts_dictionary.csv datasets, which were taken from Kaggle.
Thanks to Steve Palley for the pres16results.csv dataset, which was taken from Kaggle.

ianjeffries/election-predictions
