An unpolished R implementation of Glickman and Hennessy's "A Stochastic Rank Ordered Logit Model for Rating Multi-Competitor Games and Sports."
For a side project, I was trying to develop a way to measure the strength of NCAA cross-country runners, a sport where times are difficult to compare because of varying race courses and conditions. I found Glickman and Hennessy's paper and implemented the algorithm they describe on my own dataset: a collection of over 6,500 NCAA cross-country race results comprising over one million individual performances. The size of the data required me to optimize and parallelize the algorithm and run the code on a high-compute Google Cloud server.
At first, the algorithm looked promising on a small test dataset of cross-country data. Unfortunately, when I scaled it to the complete dataset, the results were nearly meaningless. I suspect this is because of the extreme variability of cross-country racing: athletes vary widely in strength, and courses vary in type and length. Glickman and Hennessy developed the algorithm for Olympic-level downhill skiing, where results are much more consistent.
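At the heart of the paper is a rank-ordered logit (Plackett-Luce) likelihood: given latent strengths theta (one per runner), the probability of an observed finishing order is a product of successive softmax terms. A minimal sketch of that likelihood, with illustrative names not taken from this codebase:

```r
# Log-likelihood of one observed finishing order under a rank-ordered
# (Plackett-Luce) logit model. theta: latent strengths, one per competitor.
# finish_order: competitor indices from first place to last.
plackett_luce_loglik <- function(theta, finish_order) {
  strengths <- theta[finish_order]
  ll <- 0
  for (i in seq_along(strengths)) {
    # The competitor in position i "wins" against everyone still remaining
    remaining <- strengths[i:length(strengths)]
    ll <- ll + strengths[i] - log(sum(exp(remaining)))
  }
  ll
}

# Toy example: three runners with strengths 1.0, 0.5, -0.5 finishing 1, 2, 3
plackett_luce_loglik(c(1.0, 0.5, -0.5), c(1, 2, 3))
```

Exponentiating and summing this log-likelihood over all possible finishing orders gives 1, which is a quick sanity check that the successive-softmax terms are right.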
In the future, I will clean up the codebase and create an R package. If you are trying to implement the algorithm yourself, please reach out.
For reading, exploring, manipulating, and transforming the data I used the packages readr, ggplot2, dplyr, and Matrix. For testing and optimizing the code I used RUnit and lineprof. For parallelization I used parallel, doMC, and foreach.
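The Matrix package matters here because race results are naturally sparse: each of the million-plus performances touches only one runner column. A minimal sketch (the data frame and column names are illustrative, not from the project) of building that kind of indicator matrix:

```r
library(Matrix)

# Hypothetical tidy performance table: one row per (race, runner) result
performances <- data.frame(
  race   = c(1, 1, 1, 2, 2),
  runner = c(1, 2, 3, 1, 3)
)

# Sparse 0/1 matrix: one row per performance, one column per runner.
# Stored in compressed form, so a million rows stays cheap in memory.
X <- sparseMatrix(
  i = seq_len(nrow(performances)),
  j = performances$runner,
  x = 1,
  dims = c(nrow(performances), max(performances$runner))
)
```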
Here is an overview of the files above in the order that I developed them:
Step One - Understanding the algorithm and getting something working
- small-test.R : Prototype of the algorithm on a small dataset in order to understand the paper and test its viability
Step Two - Attempting to scale the algorithm to larger dataset
- large-model.R : The controller of the process. Feed in data, preprocess, run algorithm, save results.
- preprocess-data.R : First pass at a program that transforms the raw data into the matrix format detailed in the paper
- newton-raphson.R : First pass at the algorithm detailed in Section A of the paper, which finds the posterior mode of theta
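The Newton-Raphson step in the paper iterates theta toward the posterior mode using the gradient and Hessian of the log posterior. A generic sketch of that iteration; `grad_fn` and `hess_fn` are placeholders, not the project's actual derivatives:

```r
# Generic Newton-Raphson mode finding: repeatedly take the Newton step
# H^{-1} g until the step size falls below tol.
newton_raphson <- function(theta0, grad_fn, hess_fn, tol = 1e-8, max_iter = 100) {
  theta <- theta0
  for (iter in seq_len(max_iter)) {
    g <- grad_fn(theta)
    H <- hess_fn(theta)
    step <- solve(H, g)   # solve H * step = g instead of inverting H
    theta <- theta - step
    if (max(abs(step)) < tol) break
  }
  theta
}

# Toy usage: the mode of a spherical Gaussian log-density centered at (1, 2)
mu <- c(1, 2)
grad_fn <- function(theta) -(theta - mu)
hess_fn <- function(theta) -diag(length(theta))
theta_hat <- newton_raphson(c(0, 0), grad_fn, hess_fn)
```

For a quadratic log-density like this, the iteration lands on the mode in a single step; for the rank-ordered logit posterior it takes several.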
Step Three - Optimizing and parallelizing the algorithm
- large-model-optimize.R : Optimized/parallelized version of the controller
- preprocess-data-optimize.R : Optimized/parallelized version of preprocess
- newton-raphson-optimize.R : Optimized/parallelized version of Newton Raphson
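The parallelization pattern in the -optimize.R files boils down to splitting independent per-race work across cores. A dependency-free sketch using base R's parallel package (the real code also uses doMC and foreach); `process_race` is a stand-in for the actual per-race computation:

```r
library(parallel)

# Hypothetical per-race work; the real version would compute per-race
# likelihood contributions or matrix blocks.
process_race <- function(race_id) race_id^2

race_ids <- 1:8

# mclapply forks workers on Unix-alikes; fall back to one core on Windows
cores <- if (.Platform$OS.type == "windows") 1 else 2
results <- mclapply(race_ids, process_race, mc.cores = cores)
```

Because each race's contribution is independent, results can be combined afterward without locking, which is what makes this workload embarrassingly parallel.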
Step Four - Unit testing the algorithm and understanding the results
- unit-test.R : Test results compared to hand-derived results
- unit-test-2.R : Test algorithm on dummy data
- double-check-preprocess.R : I don't remember what this is :s
- parallel-test.R : Used to test/understand parallelization
- pop-explor.R : Used to understand results
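The hand-derived checks in unit-test.R amount to comparing model output against quantities you can compute by hand. The real tests use RUnit's checkEquals; this sketch uses plain stopifnot to stay dependency-free, and the numbers are illustrative:

```r
# For a two-runner race under a rank-ordered logit model,
#   P(A beats B) = exp(theta_A) / (exp(theta_A) + exp(theta_B)),
# which depends only on the difference theta_A - theta_B. With a
# difference of 1 this should equal the logistic function at 1.
p <- exp(0.5) / (exp(0.5) + exp(-0.5))
stopifnot(abs(p - plogis(1)) < 1e-12)
```

Small closed-form cases like this are about the only way to catch sign or indexing bugs before scaling to a million performances.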
For questions or help, reach out to Julian on Twitter @jdegrootlutzner. The project is released under the MIT license.