Conditional Contextual Bandit

Griffin Bassman edited this page Feb 10, 2022 · 35 revisions

Conditional Contextual Bandit (CCB) is an extension of Contextual Bandit (CB) in which an action can be chosen for each of multiple slots. There is a shared context, as well as features for each action and each slot. The CCB reduction (ccb_explore_adf) calls into cb_sample (see Sampling) and then cb_explore_adf, so it essentially reduces to a sequence of CB operations. Each slot is automatically assigned an id in a reserved namespace. To learn slot-dependent behavior, an interaction with this reserved namespace is then added for every namespace and for every existing interaction. Rewards can be specified for any slot.

CCB provides several improvements:

  • Ability to learn from any slot, not just the top action
  • Diversity in predictions
  • Richer learning around slot dependent situations

Slots

Slots can be thought of as separate instances of CB within a given action space which run sequentially. By sequentially, we mean that slot 1 can choose any action, slot 2 can choose any action except that chosen in slot 1, slot 3 can choose any action except those selected in slots 1 and 2, and so on. Standard CB will select the best action and return a cost and probability associated with that action. In this sense, CB is equivalent to CCB with only 1 slot.

Slots can be useful for ranking or selecting the best n actions in a given space. Say we are given 20 actions and we want to find the best 4. We can use 4 slots which will output the 4 best actions in order -- ranking the best 4 of 20 possible actions.
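The ranking use case can be sketched in a few lines of Python. This is an illustration only, not VW's implementation: the function name rank_with_slots and the scores are made up, the scores stand in for whatever the learned model would prefer, and exploration is ignored so that every slot simply takes the best remaining action.

```python
from typing import Dict, List


def rank_with_slots(scores: Dict[int, float], num_slots: int) -> List[int]:
    """Fill slots in order; each slot takes the best action not yet taken."""
    remaining = dict(scores)  # copy so the caller's scores are untouched
    chosen: List[int] = []
    for _ in range(num_slots):
        if not remaining:
            break  # more slots than actions: later slots stay empty
        best = max(remaining, key=lambda a: remaining[a])
        chosen.append(best)
        del remaining[best]  # later slots can no longer pick this action
    return chosen


# 20 hypothetical actions (ids 0-19) with distinct, made-up scores.
scores = {a: float((a * 7) % 20) for a in range(20)}
top4 = rank_with_slots(scores, 4)
print(top4)
```

The result is an ordered list of four distinct action ids, one per slot, which is the "best 4 of 20" ranking described above.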

Slots can also be specialized to only consider a subset of the action space. For instance, in the example above, if we wanted to find the top 2 actions of actions 1 through 10, and the top 2 actions of actions 11 through 20, we can specify which actions are available to each slot. Here we would explicitly specify that actions 1,2,3,4,5,6,7,8,9,10 are available to slots 1 and 2, and actions 11,12,13,14,15,16,17,18,19,20 are available to slots 3 and 4, and we would get the desired result.
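A minimal sketch of this restriction, assuming zero-indexed action ids (so the first ten actions are ids 0-9 and the last ten are ids 10-19). The helper choose_with_includes and the scores are hypothetical and exploration is left out; the point is only to show how an include list narrows each slot's choices while earlier choices stay excluded.

```python
from typing import Dict, List, Optional, Sequence


def choose_with_includes(
    scores: Dict[int, float],
    included: Sequence[Optional[Sequence[int]]],
) -> List[Optional[int]]:
    """Pick one action per slot, honoring per-slot include lists.

    included[i] lists the action ids slot i may use; None means all actions.
    """
    taken = set()
    chosen: List[Optional[int]] = []
    for allowed in included:
        pool = scores.keys() if allowed is None else allowed
        candidates = [a for a in pool if a in scores and a not in taken]
        if not candidates:
            chosen.append(None)  # nothing left that this slot may choose
            continue
        best = max(candidates, key=lambda a: scores[a])
        chosen.append(best)
        taken.add(best)  # exclude this action from all later slots
    return chosen


# Made-up scores: a higher id simply means a better action here.
scores = {a: float(a) for a in range(20)}
first_half = list(range(0, 10))
second_half = list(range(10, 20))
picks = choose_with_includes(
    scores, [first_half, first_half, second_half, second_half]
)
print(picks)
```

Slots 1 and 2 return the top two actions of the first half, and slots 3 and 4 return the top two of the second half.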

Finally, slots can be specialized with slot-specific features, which are only present in that slot. This can be useful if there is some state associated with a slot which differentiates it from other standard slots. This specialization should not be used if you are simply trying to rank or get the top n actions.

Usage

vw --ccb_explore_adf -d <data_file>

To use CCB, invoke VW with --ccb_explore_adf. All the usual parameters, such as the data file and the prediction output file, remain valid. The data file should be in the input format described below.

Sampling

Since there are several calls to CB, each CB result must be sampled between calls for exploration to work. The cb_sample reduction does this automatically, potentially swapping the top action based on the probability distribution produced by the underlying cb_explore_adf call. Since exploration is done for every slot, it is recommended to divide your epsilon by the number of slots (epsilon / num_slots) in order to maintain a similar overall amount of exploration.
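The sample-then-swap idea and the epsilon adjustment can be illustrated as follows. This is a sketch under assumptions, not the cb_sample source: the function names are hypothetical, and a list of (action, probability) pairs stands in for the action scores produced by the CB call.

```python
import random
from typing import List, Tuple

ActionScores = List[Tuple[int, float]]  # (action id, probability) pairs


def sample_and_swap(action_scores: ActionScores, rng: random.Random) -> ActionScores:
    """Sample one entry according to its probability and swap it to index 0."""
    result = list(action_scores)
    weights = [prob for _, prob in result]
    idx = rng.choices(range(len(result)), weights=weights, k=1)[0]
    result[0], result[idx] = result[idx], result[0]
    return result


def per_slot_epsilon(epsilon: float, num_slots: int) -> float:
    """Spread one exploration budget evenly over all slots."""
    return epsilon / num_slots


rng = random.Random(0)
# All probability mass sits on action 0, so it always ends up at the top.
swapped = sample_and_swap([(2, 0.0), (0, 1.0), (1, 0.0)], rng)
print(swapped)
print(per_slot_epsilon(0.2, 4))
```

With a target epsilon of 0.2 and 4 slots, each slot would explore with an epsilon of about 0.05.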

Label Type

The label type of CCB is CCB::label. It contains the example type (one of shared, action, or slot), an outcome if one was supplied (for labelled examples), and explicit_included_actions. The outcome holds the cost associated with this example and all action-probability pairs for this slot. This information corresponds directly to the information encoded in the VW text format described below.

struct conditional_contexual_bandit_outcome
{
    float cost;
    ACTION_SCORE::action_scores probabilities;
};

enum example_type : uint8_t
{
    unset = 0,
    shared = 1,
    action = 2,
    slot = 3
};

struct label {
    example_type type;
    conditional_contexual_bandit_outcome* outcome;
    v_array<uint32_t> explicit_included_actions;
};

Prediction type

The prediction type for CCB is CCB::decision_scores_t, defined as follows:

typedef v_array<ACTION_SCORE::action_scores> decision_scores_t;

This prediction contains an array of action scores for every slot. The chosen action for each slot is therefore the item at index 0 of that slot's array. The rest of each array holds the remaining results of the corresponding CB call. The probability values can be used to determine whether the top action was an exploit or an explore action: an exploit action has the single largest probability, whereas an explore action has a smaller probability that is shared with other actions.
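As an illustration of how such a prediction could be read, the sketch below models decision scores as a list (one entry per slot) of (action, probability) pairs. The helper names and the sample numbers are hypothetical, and the exploit check assumes an epsilon-greedy style distribution in which only the greedy action has a strictly largest probability.

```python
from typing import List, Tuple

ActionScores = List[Tuple[int, float]]  # (action id, probability) pairs


def chosen_actions(decision_scores: List[ActionScores]) -> List[int]:
    """The chosen action of each slot is the entry at index 0."""
    return [slot_scores[0][0] for slot_scores in decision_scores]


def is_exploit(slot_scores: ActionScores) -> bool:
    """True if the top entry has a strictly larger probability than the rest."""
    top_prob = slot_scores[0][1]
    return all(top_prob > prob for _, prob in slot_scores[1:])


# Made-up prediction for two slots.
decision_scores = [
    [(2, 0.9), (0, 0.05), (1, 0.05)],  # top action holds the unique maximum
    [(1, 0.05), (0, 0.95)],            # top action has a small probability
]
print(chosen_actions(decision_scores))
print([is_exploit(s) for s in decision_scores])
```

Here the first slot exploited (action 2) and the second slot explored (action 1).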

Input Format

  • For all input formats at least one feature must be provided per component (shared, action, slot)

VW text format

The CCB format is a multiline example format with three line types. Each line's type is stated explicitly as part of its label. This differs from the earlier format, in which action examples were identified implicitly.

ccb shared | ...
ccb action | ...
ccb slot [<chosen_action>:<cost>:<probability>,<action>:<probability>,...] [action_ids_to_include,...] | ...

  • Both additional sections in the slot label are optional
  • If action_ids_to_include is excluded then all actions are implicitly included
    • This is currently unsupported
  • Action ids are zero indexed
  • The list of action probability pairs in the first section is optional
    • If included, the entire collection of probabilities must sum to 1.0
  • Test labels omit the entire chosen_action:cost:probability section

Note: as a single example can span multiple lines (hence the characterization of multiline example) it is important to leave empty lines between these examples. If reading from a file, make sure your file ends with an empty line.
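A small helper can make it harder to forget the separating empty line when generating data files. The sketch below only builds strings in the layout described above; format_slot and format_example are hypothetical names, and the feature strings are the ones used in the example on this page.

```python
from typing import Iterable, List, Optional, Sequence


def format_slot(
    features: str,
    label: Optional[str] = None,
    include: Optional[Sequence[int]] = None,
) -> str:
    """Build one 'ccb slot' line; label and include list are optional."""
    parts = ["ccb slot"]
    if label:
        parts.append(label)
    if include:
        parts.append(",".join(str(a) for a in include))
    return " ".join(parts) + " | " + features


def format_example(shared: str, actions: Iterable[str], slots: Iterable[str]) -> str:
    """Join all lines of one multiline example and end it with an empty line."""
    lines: List[str] = ["ccb shared | " + shared]
    lines += ["ccb action | " + a for a in actions]
    lines += list(slots)
    return "\n".join(lines) + "\n\n"


text = format_example(
    "s_1 s_2",
    ["a:1 b:1 c:1", "a:0.5 b:2 c:1", "a:0.5", "c:1"],
    [format_slot("d:4"), format_slot("d:7", "1:0.8:0.8,0:0.2", [0, 1, 3])],
)
print(text, end="")
```

Concatenating the strings returned by format_example for several examples yields a file in which every example, including the last one, is followed by an empty line.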

Example

ccb shared | s_1 s_2
ccb action | a:1 b:1 c:1
ccb action | a:0.5 b:2 c:1
ccb action | a:0.5 
ccb action | c:1
ccb slot | d:4
ccb slot 1:0.8:0.8,0:0.2 0,1,3 | d:7

Breakdown of Example

Line 1:

ccb shared | s_1 s_2

This is a shared context which adds common features to each action, similar to standard CB. Note that these features are appended to each action, and not to each slot. The first five lines of this example:

ccb shared | s_1 s_2
ccb action | a:1 b:1 c:1
ccb action | a:0.5 b:2 c:1
ccb action | a:0.5 
ccb action | c:1

is equivalent to:

ccb action | a:1 b:1 c:1 s_1 s_2
ccb action | a:0.5 b:2 c:1 s_1 s_2
ccb action | a:0.5 s_1 s_2
ccb action | c:1 s_1 s_2

where the features s_1 and s_2 have been appended to each action (and each have a default value of 1.0).

Lines 2-5:

ccb action | a:1 b:1 c:1
ccb action | a:0.5 b:2 c:1
ccb action | a:0.5 
ccb action | c:1

These are the 4 actions which are present in this CCB example. Each has its own set of features, and is equivalent to a standard CB example.

Line 6:

ccb slot | d:4

This is an unlabeled slot with one slot-specific feature, d:4. Since it is the first slot, it has access to all 4 actions in the example. Since it is unlabeled, it can be used only for prediction and not for learning. It will select and output the best action, given the slot-specific feature. In order to get the best action without any additional information we could simply use ccb slot | instead.

Line 7:

ccb slot 1:0.8:0.8,0:0.2 0,1,3 | d:7

This is a labeled slot that is specified in more detail than the first slot. The label here is 1:0.8:0.8,0:0.2. Following the chosen_action:cost:probability layout, this specifies that action 1 was selected with a cost of 0.8 and a probability of 0.8. Additionally, action 0 was not selected but had a probability of 0.2. The part 0,1,3 specifies that this slot only has access to actions 0, 1, and 3 (and thus cannot choose action 2 under any circumstances). Since this is the second slot, it also automatically loses access to the action selected in the first slot. So if the first slot had selected action 3, this slot would only have access to actions 0 and 1. Finally, d:7 is a slot-specific feature.
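To make the pieces of this line concrete, here is a small parser sketch. It is not VW's parser: parse_slot_line is a hypothetical helper that only handles the layout shown on this page, and it tells the label apart from the include list simply by looking for a colon.

```python
from typing import Dict, List, Optional, Tuple


def parse_slot_line(line: str) -> Dict[str, object]:
    """Split a 'ccb slot' line into outcome, include list and features."""
    head, _, feature_text = line.partition("|")
    tokens = head.split()
    if tokens[:2] != ["ccb", "slot"]:
        raise ValueError("not a ccb slot line: " + line)

    outcome: Optional[Dict[str, object]] = None
    include: Optional[List[int]] = None
    for token in tokens[2:]:
        if ":" in token:
            # chosen_action:cost:probability, then action:probability pairs
            first, *rest = token.split(",")
            action, cost, prob = first.split(":")
            pairs: List[Tuple[int, float]] = [(int(action), float(prob))]
            for pair in rest:
                other, other_prob = pair.split(":")
                pairs.append((int(other), float(other_prob)))
            outcome = {"cost": float(cost), "probabilities": pairs}
        else:
            # comma-separated, zero-indexed action ids this slot may use
            include = [int(a) for a in token.split(",")]
    return {
        "outcome": outcome,
        "include": include,
        "features": feature_text.split(),
    }


parsed = parse_slot_line("ccb slot 1:0.8:0.8,0:0.2 0,1,3 | d:7")
print(parsed)

# If the first slot had chosen action 3, this slot could still use 0 and 1.
still_available = [a for a in parsed["include"] if a != 3]
print(still_available)
```

The parsed outcome carries the cost 0.8 and the pairs (1, 0.8) and (0, 0.2), with the chosen action first, matching the breakdown above.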

JSON format

The JSON format is identical to the CB format, with the addition of a _slots field. The _slots field holds the slot information, just as _multi holds the actions. It is an array of objects, where each object is one slot. _inc can be supplied inside a slot object to specify its explicitly included actions.

Note: Labels are not currently supported with the JSON format. Use the DSJSON format below to supply a label.

Example

{
    "GUser": {
      "shared_feature": "feature"
    },
    "_multi": [
      {
        "TAction": {
          "feature1": 3.0,
          "feature2": "name1"
        }
      },
      {
        "TAction": {
          "feature1": 3.0,
          "feature2": "name1"
        }
      },
      {
        "TAction": {
          "feature1": 3.0,
          "feature2": "name1"
        }
      }
    ],
    "_slots": [
      {
        "size": "small",
        "_inc": [0, 2]
      },
      {
        "size": "large"
      }
    ]
}
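A document of this shape can be assembled with Python's standard json module. The sketch below only builds and serializes the structure shown above; build_ccb_json is a hypothetical helper, and how the resulting string is handed to VW is outside its scope.

```python
import json
from typing import Any, Dict, List


def build_ccb_json(
    shared: Dict[str, Any],
    actions: List[Dict[str, Any]],
    slots: List[Dict[str, Any]],
) -> Dict[str, Any]:
    """Combine shared namespaces, actions (_multi) and slots (_slots)."""
    document = dict(shared)  # shared namespaces sit at the top level
    document["_multi"] = actions
    document["_slots"] = slots
    return document


action = {"TAction": {"feature1": 3.0, "feature2": "name1"}}
payload = build_ccb_json(
    shared={"GUser": {"shared_feature": "feature"}},
    actions=[action, action, action],
    slots=[
        {"size": "small", "_inc": [0, 2]},  # only actions 0 and 2 allowed
        {"size": "large"},                  # no explicit include list
    ],
)
encoded = json.dumps(payload)
print(encoded)
```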

DSJSON format

The DSJSON format for CCB is also similar to CB. The context field, c, is the same as for CB: a valid object in VW JSON format. The slots are therefore defined inside the context field. The _outcomes field contains one object per slot, which specifies the cost associated with that slot, the outcomes reported for it, and the actions and probabilities, each given as either an array or a single value.

Example

{
  "Timestamp": "2019-08-27T12:45:53.6300000Z",
  "Version": "1",
  "c": {
    "GUser": {
      "shared_feature": "feature"
    },
    "_multi": [
      {
        "TAction": {
          "feature1": 3.0,
          "feature2": "name1"
        }
      },
      {
        "TAction": {
          "feature1": 3.0,
          "feature2": "name1"
        }
      },
      {
        "TAction": {
          "feature1": 3.0,
          "feature2": "name1"
        }
      }
    ],
    "_slots": [
      {
        "size": "small",
        "_inc": [0, 2]
      },
      {
        "size": "large"
      }
    ]
  },
  "_outcomes": [
    {
      "_id": "62ddd79e-4d75-4c64-94f1-a5e13a75c2e4",
      "_label_cost": 0,
      "_a": [2, 0],
      "_p": [0.9, 0.1],
      "_o": []
    },
    {
      "_id": "042661c4-d433-4b05-83d6-d51a2d1c68be",
      "_label_cost": 0,
      "_a": [1, 0],
      "_p": [0.1, 0.9],
      "_o": [-1.0, 0.0]
    }
  ],
  "VWState": {
    "m": "da63c529-018b-44b1-ad0f-c2b13056832c/195fc8ed-224f-471a-90c4-d3e60b336f8f"
  }
}
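The per-slot outcomes can be pulled out of such an event with the standard json module. This sketch makes two assumptions that the text above does not state outright: the first entry of _a (and of _p) belongs to the chosen action, by analogy with the text label where the chosen action comes first, and a single value is treated like a one-element array. The helper name slot_outcomes is hypothetical, and the event below is trimmed to its _outcomes field.

```python
import json
from typing import Any, Dict, List


def as_list(value: Any) -> List[Any]:
    """Accept either an array or a single value."""
    return value if isinstance(value, list) else [value]


def slot_outcomes(event: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return chosen action, its probability and the cost for every slot."""
    results = []
    for outcome in event["_outcomes"]:
        actions = as_list(outcome["_a"])
        probabilities = as_list(outcome["_p"])
        results.append(
            {
                "chosen_action": actions[0],  # assumed: chosen action first
                "probability": probabilities[0],
                "cost": outcome["_label_cost"],
            }
        )
    return results


event = json.loads(
    """
    {
      "_outcomes": [
        {"_label_cost": 0, "_a": [2, 0], "_p": [0.9, 0.1], "_o": []},
        {"_label_cost": 0, "_a": [1, 0], "_p": [0.1, 0.9], "_o": [-1.0, 0.0]}
      ]
    }
    """
)
summary = slot_outcomes(event)
print(summary)
```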