Rank-Lips: Learning-to-Rank on Trec_Eval

High-performance, multi-threaded, list-wise learning-to-rank implementation with support for mini-batched learning. The implementation focuses on coordinate ascent to directly optimize mean average precision (MAP), one of the strongest approaches for combining ranking models in information retrieval.

Rank-lips is designed to work with trec_eval file formats for defining runs (run format) and relevance data (qrel format). Features are taken from the score and/or reciprocal rank of each input file. The filename of an input run (in the feature directory) is used as the feature name. If you want to train a model and predict on a different test set, make sure that the input runs for the test features use exactly the same filenames. We recommend creating separate directories for the training and test sets.
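For reference, each line of a run file follows the trec_eval run format "queryId Q0 documentId rank score runName", and each line of a qrels file follows "queryId 0 documentId relevance". A small sketch with made-up query and document identifiers, stored as bm25.run and train.qrels:

bm25.run:
Q1 Q0 doc-17 1 12.73 bm25
Q1 Q0 doc-42 2 11.08 bm25

train.qrels:
Q1 0 doc-17 1
Q1 0 doc-42 0

Because the filename serves as the feature name, the feature derived from this run file would be named after bm25.run.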

Rank-lips is implemented in GHC Haskell, but we also provide Linux binaries. Rank-lips is released AS-IS under the BSD-3-Clause open source license.

Authors: Laura Dietz and Ben Gamari.

Official Rank-Lips website with more examples and usage instructions: http://www.cs.unh.edu/~dietz/rank-lips/

Installation

  • Build from source:

    • Install GHC 8.8.3 and cabal

    • cabal new-build rank-lips

    • Calling ./mk-cabal-bin.sh will create soft links to the binary in ./bin (see the build sketch after this list)

  • Download the statically linked binary from the rank-lips website
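Assuming GHC is installed via ghcup (any other route to GHC 8.8.3 works just as well), a from-source build could look like this:

ghcup install ghc 8.8.3     # one way to obtain GHC 8.8.3
ghcup install cabal         # and a recent cabal
cabal new-build rank-lips
./mk-cabal-bin.sh           # creates soft links to the built binary in ./bin
ls ./bin                    # the rank-lips executable should now be listed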

Quick Usage

Training with 5-fold cross-validation

The features are given as a set of run files (one run = one feature). The filename of the run file in $TRAIN_FEATURE_DIR is used as the feature name. When the trained model is used to predict on a different dataset, the directory of test features needs to use exactly the same file names. (Hint: you can use a directory of soft links to define the names.) The ground truth is given in qrels format.
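A sketch of the soft-link hint, with purely hypothetical paths and feature names:

mkdir -p train-features test-features
ln -s /data/runs/bm25.train.run train-features/bm25.run   # feature "bm25.run" for training
ln -s /data/runs/bm25.test.run  test-features/bm25.run    # identical filename, so the feature lines up at prediction time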

rank-lips train -d "$TRAIN_FEATURE_DIR" -q "$QREL" -e "$FRIENDLY_NAME" -O "$OUT_DIR" -o "$OUT_PREFIX" -z-score --feature-variant FeatScore --mini-batch-size 1000 --convergence-threshold -0.001 --train-cv
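The shell variables in this command are placeholders; a minimal, purely hypothetical setup could be:

TRAIN_FEATURE_DIR=train-features   # directory with one run file per feature
QREL=train.qrels                   # relevance judgments in qrels format
FRIENDLY_NAME=my-experiment        # a friendly name for this experiment
OUT_DIR=out                        # output directory
OUT_PREFIX=train                   # prefix for the output files
mkdir -p "$OUT_DIR"

With these set, run the rank-lips train command above.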

The $OUT_DIR will contain the following files:

Cross-validation:

  • train-run-test.run: the cross-validated prediction (no leakage of test data into the training process; use this run in your research paper, e.g. scored with trec_eval as sketched below)
  • train-model-fold-$k-best.json: the model trained for fold $k (best model of 5 restarts)
  • train-run-fold-$k-best.run: the run predicted by the fold's model on the fold's training data (used to compute the training MAP)

Trained on whole data set:

  • train-model-train.json: the model trained on all data (to be used with rank-lips predict)
  • train-run-train.run: the run predicted with the overall train model on the train data (used to produce the training MAP score)
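To report the cross-validated MAP, score the cross-validated run against the qrels with trec_eval. A sketch, assuming the output file is named exactly as listed above (depending on your settings the filename may additionally carry the $OUT_PREFIX, so check $OUT_DIR first):

trec_eval -m map "$QREL" "$OUT_DIR"/train-run-test.run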

See Also

See the directory ./rank-lips-example for an educational example of how to use rank-lips.