rank-ordered-logit-model

An unpolished R implementation of Glickman and Hennessy's "A stochastic rank ordered logit model for rating multi-competitor games and sports" (2015).

Context

For a side project, I was trying to develop a way to measure the strength of NCAA cross-country runners, a sport where times are difficult to compare because of varying race courses and conditions. I found Glickman and Hennessy's paper and implemented the algorithm they described on my own dataset: a collection of over 6,500 NCAA cross-country race results and over one million individual performances. The size of the data required me to optimize and parallelize the algorithm and run the code on a high-compute Google Cloud server.

At first, the algorithm looked promising on a small test dataset of cross-country data. Unfortunately, when I scaled the algorithm to the complete dataset, the results were nearly meaningless. I suspect this is because of the extreme variability of cross-country races. Glickman and Hennessy created the algorithm for Olympic-level downhill skiing, where results are much more consistent than in cross-country running, which varies widely in athlete strength, course type, and race length.

In the future, I will clean up the codebase and create an R package. If you are trying to implement the algorithm yourself, please reach out.

Overview of Codebase

For reading, exploring, manipulating, and transforming the data I used the packages readr, ggplot2, dplyr, and Matrix. For testing and profiling the code I used the packages RUnit and lineprof. For parallelizing the code I used parallel, doMC, and foreach.
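As a rough illustration of why Matrix is in that list, here is a minimal, hypothetical sketch of building a sparse runner-by-race indicator matrix from a long results table. The column names (runner_id, race_id) are illustrative assumptions, not the repository's actual schema.

    # Hypothetical sketch: a sparse runner-by-race indicator matrix built from
    # a long results table. Stored sparsely so a dataset with over one million
    # performances stays memory friendly.
    library(Matrix)

    results <- data.frame(
      runner_id = c("a", "b", "c", "a", "c"),
      race_id   = c(1, 1, 1, 2, 2)
    )

    runners <- sort(unique(results$runner_id))
    races   <- sort(unique(results$race_id))

    # 1 where a runner appears in a race, 0 elsewhere
    X <- sparseMatrix(
      i = match(results$runner_id, runners),
      j = match(results$race_id, races),
      x = 1,
      dims = c(length(runners), length(races)),
      dimnames = list(runners, races)
    )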

Here is an overview of the files in this repository, in the order that I developed them:

Step One - Understanding the algorithm and getting something working

  • small-test.R : Prototype of the algorithm on a small dataset in order to understand the paper and test its viability
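For orientation, here is a hedged sketch (not the repository's code) of the rank-ordered logit log-likelihood that small-test.R prototypes: the probability of an observed finishing order is the product of sequential choices of who finishes first among the remaining field.

    # Hedged sketch of the rank-ordered (Plackett-Luce) log-likelihood.
    # theta is a vector of latent strengths ordered by observed finish
    # (winner first). The last finisher contributes zero, so the sum stops
    # at m - 1.
    rank_ordered_loglik <- function(theta) {
      m <- length(theta)
      sum(vapply(seq_len(m - 1), function(i) {
        theta[i] - log(sum(exp(theta[i:m])))
      }, numeric(1)))
    }

    # Example: three runners whose latent strengths match the observed order.
    rank_ordered_loglik(c(1.2, 0.5, -0.3))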

Step Two - Attempting to scale the algorithm to a larger dataset

  • large-model.R : The controller of the process: feed in data, preprocess, run the algorithm, save the results.
  • preprocess-data.R : First pass at a program that transforms the raw data into the matrix format detailed in the paper
  • newton-raphson.R : First pass at the algorithm detailed in Section A of the paper, which is used to find the posterior mode of theta
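As a rough sketch of the idea behind newton-raphson.R (assuming generic gradient and Hessian functions for the log-posterior, not the repository's actual code), a Newton-Raphson iteration toward a posterior mode looks like this:

    # Generic Newton-Raphson sketch for locating a posterior mode. grad_fn and
    # hess_fn are assumed to return the gradient vector and Hessian matrix of
    # the log-posterior at theta.
    newton_raphson <- function(theta0, grad_fn, hess_fn,
                               tol = 1e-8, max_iter = 100) {
      theta <- theta0
      for (iter in seq_len(max_iter)) {
        step  <- drop(solve(hess_fn(theta), grad_fn(theta)))  # H^{-1} g
        theta <- theta - step
        if (sqrt(sum(step^2)) < tol) break                    # converged
      }
      theta
    }

    # Toy usage: the mode of a standard normal log-density is 0.
    newton_raphson(
      theta0  = 2,
      grad_fn = function(t) -t,          # d/dt of -t^2 / 2
      hess_fn = function(t) matrix(-1)   # constant second derivative
    )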

Step Three - Optimizing and parallelizing the algorithm

  • large-model-optimize.R : Optimized/parallelized version of the controller
  • preprocess-data-optimize.R : Optimized/parallelized version of the preprocessing
  • newton-raphson-optimize.R : Optimized/parallelized version of the Newton-Raphson step
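The parallelization pattern is roughly the following: per-race quantities can be computed independently and then combined. This is a hypothetical sketch using foreach and doMC; race_loglik() is a toy stand-in, not a function from this repository.

    # Hypothetical parallel pattern with foreach + doMC (fork-based, so it
    # assumes a Unix-like machine such as the Google Cloud server above).
    library(parallel)
    library(doMC)
    library(foreach)

    registerDoMC(cores = detectCores())

    # Toy stand-in for a per-race log-likelihood contribution; the real code
    # would use the preprocessed matrices for race r.
    race_loglik <- function(r) -abs(rnorm(1))

    race_ids <- 1:100  # placeholder for real race identifiers

    loglik_by_race <- foreach(r = race_ids, .combine = c) %dopar% {
      race_loglik(r)
    }
    total_loglik <- sum(loglik_by_race)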

Step Four - Unit testing the algorithm and understanding the results

  • unit-test.R : Tests results against hand-derived results (see the sketch after this list)
  • unit-test-2.R : Tests the algorithm on dummy data
  • double-check-preprocess.R : I don't remember what this is :s
  • parallel-test.R : Used to test/understand parallelization
  • pop-explor.R : Used to understand results
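A minimal RUnit-style check in the spirit of unit-test.R, reusing the rank_ordered_loglik() sketch from Step One above: for a two-runner race the hand-derived probability that the first finisher wins is exp(theta1) / (exp(theta1) + exp(theta2)), so its log should match the likelihood function.

    # Hedged example of a hand-derived check with RUnit.
    library(RUnit)

    test_two_runner_race <- function() {
      theta    <- c(0.4, -0.1)
      expected <- log(exp(theta[1]) / sum(exp(theta)))
      checkEqualsNumeric(expected, rank_ordered_loglik(theta), tolerance = 1e-12)
    }

    test_two_runner_race()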

For questions and help, reach out to Julian on Twitter (@jdegrootlutzner). The project is released under the MIT License.
