Unexpected predictions when training ccb-model #4690

gronilsen · 2024-04-04T08:43:53Z

When training a CCB model, why are the actions in the "current predict" column of the log output always the same as those in the "current label" column? We also observe that in the predictions, the first action listed for each slot is not necessarily the one with the highest probability, but rather the observed action. In addition, the action with the highest probability can appear again (repeatedly) among the available actions for the following slots.
Is this the expected behaviour?

Our expectation would be that the action with the highest probability would be ordered first for any given slot, and that this action would then be unavailable for the remaining slots.
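To make this expectation concrete, here is a minimal Python sketch of greedy slot filling, assuming each slot has a score per action: each slot takes its highest-scoring remaining action, and that action is then removed from the candidates for later slots. The function `greedy_slate` and the scores are hypothetical illustrations, not VW's CCB implementation or output.

```python
def greedy_slate(slot_scores):
    """Assign each slot its highest-scoring action not already used by an earlier slot.

    slot_scores: list (one entry per slot) of dicts mapping action -> score.
    Returns the chosen action per slot, or None if no action is left.
    """
    used = set()
    slate = []
    for scores in slot_scores:
        available = {a: s for a, s in scores.items() if a not in used}
        if not available:
            slate.append(None)  # every action was already taken by an earlier slot
            continue
        best = max(available, key=available.get)
        used.add(best)  # the chosen action is unavailable for the remaining slots
        slate.append(best)
    return slate


# Hypothetical scores for three slots over four actions.
scores = [
    {0: 0.9, 1: 0.4, 2: 0.1, 3: 0.3},
    {0: 0.8, 1: 0.2, 2: 0.5, 3: 0.6},
    {0: 0.7, 1: 0.1, 2: 0.05, 3: 0.2},
]
print(greedy_slate(scores))  # [0, 3, 1]
```

Action 0 wins slot 1 and is then excluded, so slot 2 falls back to action 3 and slot 3 to action 1.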

Below is a minimal, reproducible example, but we experience the same behaviour in our production model.

Log-output:

predictions = ccb_predictions.txt
using no cache
Reading datafile = ccb_data.txt
num sources = 1
Num weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
cb_type = dr
Enabled reductions: gd, generate_interactions, scorer-identity, csoaa_ldf-rank, cb_adf, cb_explore_adf_greedy, cb_sample, shared_feature_merger, ccb_explore_adf
Input label = CCB
Output pred = DECISION_PROBS
average  since         example        example        current        current  current
loss     last          counter         weight          label        predict features
-1.00000 -1.00000            1            1.0    1:0,3:0,...          1,3,0       54
-1.50000 -2.00000            2            2.0    1:0,3:0,...          1,3,0       54
-1.66666 -2.00000            3            3.0    1:0,3:0,...          1,3,0       54

finished run
number of examples = 3
weighted example sum = 3.000000
weighted label sum = 0.000000
average loss = -1.666667
total feature number = 162

Predictions (ccb_predictions.txt)

1:0.25,0:0.25,2:0.25,3:0.25
3:0.333333,2:0.333333,0:0.333333
0:0.5,2:0.5

1:0,0:1,3:0,2:0
3:0,0:1,2:0
0:1,2:0

1:0,2:0,3:0,0:1
3:0,2:0,0:1
0:1,2:0

In example 2, action 0 has the highest probability in the first slot, but it is not listed first. Action 0 also remains available for slots 2 and 3. The same holds for example 3.
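Both observations can be checked mechanically from the predictions file. The sketch below uses hypothetical helpers (not part of VW) that parse one example's slot lines in the `action:probability,...` format shown above and report, per slot, the most probable action, whether it is listed first, and whether it reappears in a later slot. On ties, `max` picks the first pair listed.

```python
def parse_slot(line):
    """Parse an 'action:prob,action:prob,...' line into a list of (action, prob) pairs."""
    pairs = []
    for item in line.strip().split(","):
        action, prob = item.split(":")
        pairs.append((int(action), float(prob)))
    return pairs


def check_example(lines):
    """For each slot, return (top_action, listed_first, reappears_in_later_slot)."""
    slots = [parse_slot(line) for line in lines]
    report = []
    for i, pairs in enumerate(slots):
        top_action = max(pairs, key=lambda pair: pair[1])[0]
        listed_first = pairs[0][0] == top_action
        reappears = any(top_action in {a for a, _ in later} for later in slots[i + 1:])
        report.append((top_action, listed_first, reappears))
    return report


# Example 2 from ccb_predictions.txt above.
example2 = ["1:0,0:1,3:0,2:0", "3:0,0:1,2:0", "0:1,2:0"]
for row in check_example(example2):
    print(row)
# (0, False, True)
# (0, False, True)
# (0, True, False)
```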

How to reproduce

ccb_data.txt:

ccb shared |User gen='f'
ccb action |Action contentId='a'
ccb action |Action contentId='b'
ccb action |Action contentId='c'
ccb action |Action contentId='d'
ccb slot 1:0:0.25,0:0.25,2:0.25,3:0.25 |
ccb slot 3:0:0.333333333,0:0.333333333,2:0.333333333 |
ccb slot 0:-1:0.5,2:0.5 |

ccb shared |User gen='f'
ccb action |Action contentId='a'
ccb action |Action contentId='b'
ccb action |Action contentId='c'
ccb action |Action contentId='d'
ccb slot 1:0:0.25,0:0.25,2:0.25,3:0.25 |
ccb slot 3:0:0.333333333,0:0.333333333,2:0.333333333 |
ccb slot 0:-1:0.5,2:0.5 |

ccb shared |User gen='m'
ccb action |Action contentId='a'
ccb action |Action contentId='b'
ccb action |Action contentId='c'
ccb action |Action contentId='d'
ccb slot 1:0:0.25,0:0.25,2:0.25,3:0.25 |
ccb slot 3:0:0.333333333,0:0.333333333,2:0.333333333 |
ccb slot 0:-1:0.5,2:0.5 |

from vowpalwabbit import Workspace

model_train = Workspace('-d ccb_data.txt --ccb_explore_adf --cb_type dr --progress 1 --all_slots_loss --predictions ccb_predictions.txt --epsilon 0', quiet=False)
model_train.finish()

OS
macOS

Language
Python

Thank you, hope you can help us clarify this. Please let us know if you need more information.


JohnLangford · 2024-04-11T15:40:06Z

This may be a partial answer:

Inside CCB, the system needs to actually choose an action according to its distribution in order to later be able to do a CB update for a slot. Hence, the distribution over the chosen actions needs to be broader than just the most likely action.
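As a generic illustration of this point (not VW's internal sampler), the sketch below draws an action from a probability mass function and returns the probability it was drawn with, which is the quantity a CB update needs. `sample_action` is a hypothetical name.

```python
import random


def sample_action(pmf, rng):
    """Draw an action index from `pmf` and return (action, probability of that action)."""
    r = rng.random()
    cumulative = 0.0
    for action, p in enumerate(pmf):
        cumulative += p
        if r < cumulative:
            return action, p
    # Guard against floating-point round-off: fall back to the last action with nonzero mass.
    last = max(i for i, p in enumerate(pmf) if p > 0)
    return last, pmf[last]


rng = random.Random(0)
pmf = [0.1, 0.6, 0.3]
counts = [0, 0, 0]
for _ in range(10000):
    action, prob = sample_action(pmf, rng)
    counts[action] += 1
print(counts)  # roughly proportional to pmf
```

Because the action is drawn rather than taken as the argmax, the learner later knows the probability `prob` with which each logged action was chosen.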

jackgerrits · 2024-04-11T15:54:41Z

When training for CCB the chosen actions will be the same as the labelled actions (see here)

For the sampling/exploration to behave the way you expect when the data has labels, you need to run VW in test mode (-t).

gronilsen · 2024-04-12T09:07:01Z

When training for CCB the chosen actions will be the same as the labelled actions (see here)

Apologies if we are missing something fundamental, but it is not clear to us why this is the expected behaviour when training. Wouldn't you need the actual predictions to update the model weights and to calculate the progressive validation loss on each example? That is, how do you calculate the reported progressive validation loss when the chosen actions are the same as the labelled actions (and not the actual predictions)?

jackgerrits · 2024-04-12T17:05:04Z

I'm not sure I follow - what would the difference between chosen actions and actual predictions be? The prediction is the set of chosen actions.

For how the progressive validation loss is calculated see here:

vowpal_wabbit/vowpalwabbit/core/src/reductions/conditional_contextual_bandit.cc (line 564 in 128fad3):

    loss += l * preds[i][VW::details::TOP_ACTION_INDEX].score * ec_seq[VW::details::SHARED_EX_INDEX]->weight;
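The quoted line does not show how `l` is computed. The sketch below assumes it is an inverse-propensity cost estimate: the logged cost divided by the logged probability when the slot's top predicted action equals the logged action, and 0 otherwise. It also assumes `preds[i][TOP_ACTION_INDEX]` is the first `action:probability` pair printed for slot `i`. The helper names are hypothetical. Under these assumptions, plugging in the labels and the training-time predictions from the original post gives per-example losses of -1, -2, -2 and running averages of -1, -1.5, -1.66667, matching the training log.

```python
def cost_estimate(logged_action, logged_prob, cost, top_action):
    # Assumed IPS estimate: nonzero only when the top predicted action matches the logged one.
    return cost / logged_prob if top_action == logged_action else 0.0


def ccb_example_loss(labels, top_preds, weight=1.0, all_slots_loss=True):
    """labels: per slot (logged_action, cost, logged_prob).
    top_preds: per slot (top_action, top_score), i.e. the first listed pair.
    Mirrors loss += l * top_score * weight (first slot only unless all_slots_loss)."""
    loss = 0.0
    for i, ((action, cost, prob), (top, score)) in enumerate(zip(labels, top_preds)):
        if i == 0 or all_slots_loss:
            loss += cost_estimate(action, prob, cost, top) * score * weight
    return loss


# Labels shared by all three examples: "1:0:0.25", "3:0:0.333333333", "0:-1:0.5".
labels = [(1, 0.0, 0.25), (3, 0.0, 0.333333333), (0, -1.0, 0.5)]

# First listed pair per slot in ccb_predictions.txt during training.
train_top_preds = [
    [(1, 0.25), (3, 0.333333), (0, 0.5)],  # example 1
    [(1, 0.0), (3, 0.0), (0, 1.0)],        # example 2
    [(1, 0.0), (3, 0.0), (0, 1.0)],        # example 3
]

total = 0.0
for n, preds in enumerate(train_top_preds, start=1):
    example_loss = ccb_example_loss(labels, preds)
    total += example_loss
    print(f"{total / n:.5f} {example_loss:.5f}")  # average loss, since-last loss
# -1.00000 -1.00000
# -1.50000 -2.00000
# -1.66667 -2.00000
```

Only slot 3 contributes, since slots 1 and 2 have cost 0.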

JohnLangford · 2024-04-14T14:07:43Z

For contextual bandit training generally, you do not need the "actual predictions" to update the model---instead you need the chosen action and the probability with which that action was taken. Right? See for example the tutorial with Alekh (http://hunch.net/~rwil ).
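A minimal sketch of that idea, using inverse propensity scoring (IPS) to estimate a policy's average cost from logged data: each record needs only the chosen action, the probability it was chosen with, and the observed cost, not the logging policy's full prediction. The function `ips_value` and the logged records are hypothetical illustrations, not VW code or data from this issue.

```python
def ips_value(logged, policy):
    """Estimate the average cost of `policy` from logged contextual bandit data.

    logged: list of (context, chosen_action, prob_of_chosen_action, cost).
    policy: function mapping a context to an action.
    """
    total = 0.0
    for context, action, prob, cost in logged:
        if policy(context) == action:
            total += cost / prob  # reweight by the inverse logging probability
    return total / len(logged)


def always_action_0(context):
    return 0


# Hypothetical logged records.
logged = [
    ("f", 0, 0.5, -1.0),
    ("f", 2, 0.5, 0.0),
    ("m", 0, 0.5, -1.0),
    ("m", 2, 0.5, 0.0),
]
print(ips_value(logged, always_action_0))  # -1.0
```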

gronilsen · 2024-04-15T13:13:25Z

I'm not sure I follow - what would the difference between chosen actions and actual predictions be? The prediction is the set of chosen actions.

Maybe we're just talking past each other - this is what we would think as well (but we were confused by your previous response where you said that chosen actions will be the same as the labelled actions).

To clarify, our main confusion is about the reported progressive validation (PV) loss. Our understanding is that the PV loss is computed by predicting on each example before training on it. But if we train on the first two examples in the data set and then predict on the third manually, we do not get the same loss as the PV loss in the training log (see the log output in the first post). In the training output, the loss for the third example (the "since last" column) is -2, while the manual prediction below gives a loss of 0. Is this what you would expect? If so, what explains the difference?

We thought this might be related to the concern we raised in the first question (current predict being the same as current label), but maybe there is another explanation?

Thank you again for your time!

Reproducible example:

from vowpalwabbit import Workspace

model_train_first2 = Workspace('-d ccb_data_first2.txt --ccb_explore_adf --cb_type dr --progress 1 --all_slots_loss --epsilon 0', quiet=False)

ex3 = '''ccb shared |User gen='m'
ccb action |Action contentId='a'
ccb action |Action contentId='b'
ccb action |Action contentId='c'
ccb action |Action contentId='d'
ccb slot 1:0:0.25,0:0.25,2:0.25,3:0.25 |
ccb slot 3:0:0.333333333,0:0.333333333,2:0.333333333 |
ccb slot 0:-1:0.5,2:0.5 |
'''

model_train_first2.predict(ex3)
model_train_first2.finish()

Output:

-1.00000 0.000000            3            3.0    1:0,3:0,...          0,2,3       54
[[(0, 1.0), (2, 0.0), (3, 0.0), (1, 0.0)], [(2, 1.0), (1, 0.0), (3, 0.0)], [(3, 1.0), (1, 0.0)]]

The second column indicates the loss (0.000000). The current predict values (0,2,3) also differ from those seen in the log output from training.
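Assuming `l` in the loss line quoted earlier in the thread is an inverse-propensity estimate (cost divided by the logged probability when the top predicted action equals the logged action, 0 otherwise), the gap comes from slot 3 of the third example alone; slots 1 and 2 have cost 0 and contribute nothing either way. During training the top action for slot 3 is the labelled action 0, while in the manual prediction it is action 3. `slot_loss` is a hypothetical helper, not VW code.

```python
def slot_loss(logged_action, logged_prob, cost, top_action, top_score):
    # Assumed IPS cost estimate, weighted by the predicted probability of the top action.
    estimate = cost / logged_prob if top_action == logged_action else 0.0
    return estimate * top_score


# Third example, slot 3 label "0:-1:0.5": logged action 0, cost -1, probability 0.5.
train_loss = slot_loss(0, 0.5, -1.0, top_action=0, top_score=1.0)    # training: top action is the label
predict_loss = slot_loss(0, 0.5, -1.0, top_action=3, top_score=1.0)  # manual predict: top action is 3
print(train_loss, predict_loss)  # -2.0 0.0

# Running average after three examples, with losses -1 and -2 for the first two.
print((-1.0 + -2.0 + predict_loss) / 3)  # -1.0
```

This also accounts for the -1.00000 average in the manual output.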

ccb_data_first2.txt:

ccb shared |User gen='f'
ccb action |Action contentId='a'
ccb action |Action contentId='b'
ccb action |Action contentId='c'
ccb action |Action contentId='d'
ccb slot 1:0:0.25,0:0.25,2:0.25,3:0.25 |
ccb slot 3:0:0.333333333,0:0.333333333,2:0.333333333 |
ccb slot 0:-1:0.5,2:0.5 |

ccb shared |User gen='f'
ccb action |Action contentId='a'
ccb action |Action contentId='b'
ccb action |Action contentId='c'
ccb action |Action contentId='d'
ccb slot 1:0:0.25,0:0.25,2:0.25,3:0.25 |
ccb slot 3:0:0.333333333,0:0.333333333,2:0.333333333 |
ccb slot 0:-1:0.5,2:0.5 |

JohnLangford · 2024-05-23T15:59:23Z

I can reproduce this on the command line, and it does look like an inconsistency in reporting. I'll try to find some time with Jack to track down the precise source.
