Skip to content

Contextual Bandits with Large Action Spaces (LAS)

olgavrou edited this page Mar 2, 2023 · 9 revisions

What: Contextual Bandits (CB) with many actions to explore over (> 50). Combine with epsilon-greedy or squarecb exploration.

Publication: https://proceedings.mlr.press/v162/zhu22b.html

Large action spaces is an algorithm that lets exploration happen efficiently when there are multiple actions in a contextual bandit dataset. The main idea behind it is to eliminate actions that are similar and explore only over the most diverse actions in the dataset.

Once the redundant actions are eliminated, the remaining actions are processed by either epsilon-greedy or squarecb and the final result will be a probability mass function over the chosen actions. The redundant actions will have zero probabilities.

The way actions are eliminated is based on two main methods:

  1. Randomized truncated singular value decomposition over the sparse matrix of the actions representations, in order to reduce the dimensionality of the actions representation (https://arxiv.org/abs/0909.4061)
  2. Apply c-approximate barycentric spanner on the left singular vectors produced after the SVD (Awerbuch & Kleinberg STOC'04: https://www.cs.cornell.edu/~rdk/papers/OLSP.pdf)

Available reduction options

  • max_actions <max_actions> gives an upper bound to how many actions will be explored over. The resulting actions with non-zero probabilities might be less than max_actions if the available actions are very similar. Otherwise they will be either max_actions or max_actions + 1 in the case that the predicted action happened to be filtered out by the algorithm.
  • exploration algorithm: LAS is a filtering reduction can be combined with any available exploration algorithm although the paper was written with squarecb in mind

Advanced options:

  • spanner_c <c> parameter for computing the c-approximate spanner (please consult the paper for details)
  • thread_pool_size <tps> default set to (number_of_available_hw_threads - 1) / 2 defines how many threads will be used during the execution of the algorithm
  • las_hint_explicit_simd enables faster SIMD implementations if the architecture supports them. This only works for quadratic interactions on x86 Linux for now

Intuition of LAS algorithm

Lets say we have a very small example consisting of 5 actions

 | math:2.0 music:0.5 sports:0.1 science:4.0 computers:0.1 literature:0 social_media:0 dancing:0 news:0
 | math:0 music:1.5 sports:0 science:0 computers:0 literature:2.0 social_media:0 dancing:2.0 news:0
 | math:0 music:1.5 sports:4.0 science:0 computers:0 literature:2.0 social_media:4.0 dancing:2.0 news:4.0
 | math:0 music:1.5 sports:0 science:0 computers:0 literature:2.0 social_media:4.0 dancing:2.0 news:0
 | math:0 music:1.5 sports:0 science:0 computers:0 literature:2.0 social_media:4.0 dancing:2.0 news:0

Each action is described by some characteristics, some are stronger that others in specific actions. If we replace each categorical feature with an index we can get VW's actions representation matrix in a dense format representation from VW

math -> 0, music -> 1 , sports -> 2, etc

We can represent this as a matrix where each row is an action

math music sports science computers literature social_media dancing news
2.0 0.5 0.1 4.0 0.1 0 0 0 0
0 1.5 0 0 0 2.0 0 2.0 0
0 1.5 4.0 0 0 2.0 4.0 2.0 4.0
0 1.5 0 0 0 2.0 4.0 2.0 0
0 1.5 0 0 0 2.0 4.0 2.0 0

Just by looking at the actions represented this way we can see that the first action is quite distinct and that the following 4 actions have stronger similarities (the last two actions are duplicates). We can also see that from the 4 similar actions, action 3 is the one that stands out next.

If we preform truncated randomized SVD on A, setting the reduced dimension to d = 2 (--max_actions 2) we can examine the singular values and the left singular vectors. We expect the left singular vectors to hold the "essence" of the original matrix but in a reduced dimension space

dimension 1 dimension 2
-0.638989 0.0765502
0.0988239 0.51184
0.562224 0.576712
-0.364576 0.446969
-0.364576 0.446969

and singular values: 19.2866, 5.39217, 1.17257e-07, 4.79044e-08, 0

We can see that we have two large and distinct singular values and then that the remaining three are quite small and similar. This indicates that the original matrix has two dominant components, and that we can also state that it makes sense to select two actions since any more that we select are not going to add much information since any more actions choosen after the first two are going to be similar

The resulting left singular vector matrix is will then be fed to the spanner algorithm. The purpose of the spanner algorithm is to select the d actions that are more diverse and thus increase the volume of the final matrix (volume of final matrix defined by the abs value of the matrix norm). In the above case the two (d = 2) actions that do that are the first and the third actions

Spanner result:

dimension 1 dimension 2
-0.638989 0.0765502
0.562224 0.576712

The final pmf that VW will return will be spread over the resulting two actions. If we save the above input to a file and run VW with large action spaces (--cb_explore_adf --large_action_space --max_actions 2 --noconstant --predictions predfile.txt)

cat predfile.txt

0:0.5,2:0.5,1:0,3:0,4:0

we can see that the first and third actions (indexed starting at zero) are given a probability and that the rest of the actions are given a zero probability.

Note: this example is meant to give an intuitive idea of how the LAS algorithm works and is not meant to be completely precise. The output numbers might change as the algorithm evolves and improves. Also, the singular values and left singular vectors were gained by adding print statements to the code in order to write this wiki page and are not available in general

Clone this wiki locally