Unexpected predictions when training ccb-model #4690

Open
gronilsen opened this issue Apr 4, 2024 · 7 comments

@gronilsen

gronilsen commented Apr 4, 2024

When training a CCB model, why are the actions under current predict in the log output always the same as the actions under current label? We also observe that, in the predictions file, the first action listed for each slot is not necessarily the one with the highest probability, but rather the observed (labelled) action. In addition, the action with the highest probability still appears (repeatedly) among the available actions for the subsequent slots.
Is this the expected behaviour?

Our expectation would be that the action with the highest probability would be ordered first for any given slot, and that this action would then be unavailable for the remaining slots.

Below is a minimal, reproducible example, but we experience the same behaviour in our production model.

Log-output:

predictions = ccb_predictions.txt
using no cache
Reading datafile = ccb_data.txt
num sources = 1
Num weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
cb_type = dr
Enabled reductions: gd, generate_interactions, scorer-identity, csoaa_ldf-rank, cb_adf, cb_explore_adf_greedy, cb_sample, shared_feature_merger, ccb_explore_adf
Input label = CCB
Output pred = DECISION_PROBS
average  since         example        example        current        current  current
loss     last          counter         weight          label        predict features
-1.00000 -1.00000            1            1.0    1:0,3:0,...          1,3,0       54
-1.50000 -2.00000            2            2.0    1:0,3:0,...          1,3,0       54
-1.66666 -2.00000            3            3.0    1:0,3:0,...          1,3,0       54

finished run
number of examples = 3
weighted example sum = 3.000000
weighted label sum = 0.000000
average loss = -1.666667
total feature number = 162

Predictions (ccb_predictions.txt)

1:0.25,0:0.25,2:0.25,3:0.25
3:0.333333,2:0.333333,0:0.333333
0:0.5,2:0.5

1:0,0:1,3:0,2:0
3:0,0:1,2:0
0:1,2:0

1:0,2:0,3:0,0:1
3:0,2:0,0:1
0:1,2:0

In example 2, action 0 has the highest probability in the first slot but this is not reflected by the order of the actions (it is not ordered first). Action 0 also remains an available action for slots 2 and 3.
(The same holds for example 3 as well).
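
The observation above can be checked mechanically. Below is a minimal sketch that parses the prediction lines quoted earlier and compares, for each slot, the first-listed action with the highest-probability action. The helper names `parse_predictions` and `first_vs_best` are hypothetical and not part of VW; the sketch assumes the file layout shown above (one slot per line, blank line between examples).

```python
# Prediction lines copied from ccb_predictions.txt above; blank lines separate examples.
PREDICTIONS = """\
1:0.25,0:0.25,2:0.25,3:0.25
3:0.333333,2:0.333333,0:0.333333
0:0.5,2:0.5

1:0,0:1,3:0,2:0
3:0,0:1,2:0
0:1,2:0

1:0,2:0,3:0,0:1
3:0,2:0,0:1
0:1,2:0
"""


def parse_predictions(text):
    """Hypothetical helper: return examples -> slots -> [(action, probability), ...]."""
    examples = []
    for block in text.strip().split("\n\n"):
        slots = []
        for line in block.splitlines():
            pairs = [item.split(":") for item in line.split(",")]
            slots.append([(int(action), float(prob)) for action, prob in pairs])
        examples.append(slots)
    return examples


def first_vs_best(slot):
    """Hypothetical helper: return (first-listed action, highest-probability action)."""
    first_action = slot[0][0]
    # max() keeps the earliest entry on ties, so uniform slots report the first action.
    best_action = max(slot, key=lambda pair: pair[1])[0]
    return first_action, best_action


examples = parse_predictions(PREDICTIONS)
for ex_idx, slots in enumerate(examples, start=1):
    for slot_idx, slot in enumerate(slots, start=1):
        first_action, best_action = first_vs_best(slot)
        flag = "" if first_action == best_action else "  <-- first-listed is not the most probable"
        print(f"example {ex_idx}, slot {slot_idx}: first={first_action}, best={best_action}{flag}")
```

On the data above, this flags slots 1 and 2 of examples 2 and 3: the first-listed actions are 1 and 3 (the labelled actions), while action 0 carries all of the probability mass.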

How to reproduce

ccb_data.txt:

ccb shared |User gen='f'
ccb action |Action contentId='a'
ccb action |Action contentId='b'
ccb action |Action contentId='c'
ccb action |Action contentId='d'
ccb slot 1:0:0.25,0:0.25,2:0.25,3:0.25 |
ccb slot 3:0:0.333333333,0:0.333333333,2:0.333333333 |
ccb slot 0:-1:0.5,2:0.5 |

ccb shared |User gen='f'
ccb action |Action contentId='a'
ccb action |Action contentId='b'
ccb action |Action contentId='c'
ccb action |Action contentId='d'
ccb slot 1:0:0.25,0:0.25,2:0.25,3:0.25 |
ccb slot 3:0:0.333333333,0:0.333333333,2:0.333333333 |
ccb slot 0:-1:0.5,2:0.5 |

ccb shared |User gen='m'
ccb action |Action contentId='a'
ccb action |Action contentId='b'
ccb action |Action contentId='c'
ccb action |Action contentId='d'
ccb slot 1:0:0.25,0:0.25,2:0.25,3:0.25 |
ccb slot 3:0:0.333333333,0:0.333333333,2:0.333333333 |
ccb slot 0:-1:0.5,2:0.5 |

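from vowpalwabbit import Workspace

# Train on ccb_data.txt and write the per-slot predictions to ccb_predictions.txt
model_train = Workspace('-d ccb_data.txt --ccb_explore_adf --cb_type dr --progress 1 --all_slots_loss --predictions ccb_predictions.txt --epsilon 0', quiet=False)
model_train.finish()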

OS
macOS

Language
Python

Thank you, hope you can help us clarify this. Please let us know if you need more information.

@JohnLangford
Member

This may be a partial answer:

Inside CCB, the system needs to actually choose an action according to its distribution in order to later be able to do a CB update for a slot. Hence, the distribution over the chosen actions needs to be broader than just the most likely action.

@jackgerrits
Member

When training for CCB the chosen actions will be the same as the labelled actions (see here)

To get the sampling/exploration behaviour you are looking for when the data has labels, you need to run VW in test mode (-t).

@gronilsen
Author

> When training for CCB the chosen actions will be the same as the labelled actions (see here)

Apologies if we are missing something fundamental, but it is not clear to us why this is the expected behaviour when training. Wouldn't you need the actual predictions to update the model weights and to calculate the progressive validation loss on each example? That is, how do you calculate the reported progressive validation loss when the chosen actions are the same as the labelled actions (and not the actual predictions)?

@jackgerrits
Member

I'm not sure I follow - what would the difference between chosen actions and actual predictions be? The prediction is the set of chosen actions.

For how the progressive validation loss is calculated see here:

loss += l * preds[i][VW::details::TOP_ACTION_INDEX].score * ec_seq[VW::details::SHARED_EX_INDEX]->weight;
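
To make the line above concrete, here is a minimal Python sketch of that accumulation applied to the three training examples from the first post. It rests on an assumption the thread does not confirm: that `l` is an inverse-propensity-scored cost estimate, i.e. cost divided by logged probability when the top predicted action equals the logged action, and 0 otherwise. The function names are hypothetical and only mirror the quoted C++ line.

```python
def ips_cost_estimate(logged_action, logged_cost, logged_prob, top_action):
    """ASSUMPTION (not confirmed in the thread): `l` is an IPS estimate of the cost."""
    return logged_cost / logged_prob if top_action == logged_action else 0.0


def example_loss(logged_slots, top_predictions, weight=1.0):
    """Mirror of `loss += l * preds[i][TOP_ACTION_INDEX].score * weight` over all slots."""
    loss = 0.0
    for (action, cost, prob), (top_action, top_score) in zip(logged_slots, top_predictions):
        l = ips_cost_estimate(action, cost, prob, top_action)
        loss += l * top_score * weight
    return loss


# Slot labels shared by all three examples in ccb_data.txt: (action, cost, probability).
LOGGED = [(1, 0.0, 0.25), (3, 0.0, 0.333333333), (0, -1.0, 0.5)]

# First-listed (action, score) per slot, copied from ccb_predictions.txt.
TOP_PREDICTIONS = [
    [(1, 0.25), (3, 0.333333), (0, 0.5)],  # example 1
    [(1, 0.0), (3, 0.0), (0, 1.0)],        # example 2
    [(1, 0.0), (3, 0.0), (0, 1.0)],        # example 3
]

losses = [example_loss(LOGGED, tops) for tops in TOP_PREDICTIONS]
average_loss = sum(losses) / len(losses)
print(losses)
print(average_loss)
```

Under that assumption the per-example values come out as -1.0, -2.0 and -2.0, which agrees with the "since last" column of the training log, and their mean is about -1.666667, which agrees with the reported average loss. In each case the only non-zero term comes from the third slot, where the first-listed action is the labelled action 0.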

@JohnLangford
Member

For contextual bandit training generally, you do not need the "actual predictions" to update the model---instead you need the chosen action and the probability with which that action was taken. Right? See for example the tutorial with Alekh (http://hunch.net/~rwil ).

@gronilsen
Author

> I'm not sure I follow - what would the difference between chosen actions and actual predictions be? The prediction is the set of chosen actions.

Maybe we're just talking past each other - this is what we would think as well (but we were confused by your previous response where you said that chosen actions will be the same as the labelled actions).

To clarify, our main confusion is about the reported progressive validation loss. Our understanding is that the pv-loss is calculated through training and testing progressively on each example. But if we train on the first two examples in the data set and predict on the third manually, we do not get the same loss as the pv-loss seen in the log-output from training (see log-output from the first question). That is, in the training output we get a loss of -2 for the third example, while we get a loss of 0 in the manual prediction below. Is this what you would expect? If so, what explains the difference?

We thought this might be related to the concerns we raised in the first question, with current predict being the same as current label, but maybe there is another explanation?

Thank you again for your time!

Reproducible example:

model_train_first2 = Workspace(f'-d ccb_data_first2.txt --ccb_explore_adf --cb_type dr --progress 1 --all_slots_loss --epsilon 0', quiet=False) 
model_train_first2.finish()
ex3 = '''ccb shared |User gen='m'
ccb action |Action contentId='a'
ccb action |Action contentId='b'
ccb action |Action contentId='c'
ccb action |Action contentId='d'
ccb slot 1:0:0.25,0:0.25,2:0.25,3:0.25 |
ccb slot 3:0:0.333333333,0:0.333333333,2:0.333333333 |
ccb slot 0:-1:0.5,2:0.5 |
'''

model_train_first2.predict(ex3)

Output:

-1.00000 0.000000            3            3.0    1:0,3:0,...          0,2,3       54
[[(0, 1.0), (2, 0.0), (3, 0.0), (1, 0.0)], [(2, 1.0), (1, 0.0), (3, 0.0)], [(3, 1.0), (1, 0.0)]]

The second column indicates the loss (0.000000). The current predict values (0,2,3) also differ from those seen in the log output from training.
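
The two outcomes for example 3 can be put side by side in a short sketch. It assumes that each slot contributes (cost / logged probability) * top score when the first-listed action matches the labelled action, and 0 otherwise; this is an illustrative assumption, not something confirmed in the thread, and `slot_terms` is a hypothetical helper.

```python
# Labels for example 3: (action, cost, probability) per slot.
LOGGED = [(1, 0.0, 0.25), (3, 0.0, 0.333333333), (0, -1.0, 0.5)]

# First-listed (action, score) per slot.
TRAINING_TOP = [(1, 0.0), (3, 0.0), (0, 1.0)]  # from ccb_predictions.txt, example 3
MANUAL_TOP = [(0, 1.0), (2, 1.0), (3, 1.0)]    # from the predict() output above


def slot_terms(logged_slots, top_predictions):
    """Hypothetical helper: per slot, (top action matches label?, loss contribution)."""
    terms = []
    for (action, cost, prob), (top_action, top_score) in zip(logged_slots, top_predictions):
        matches = top_action == action
        # ASSUMPTION: IPS-style estimate, zero whenever the actions differ.
        estimate = cost / prob if matches else 0.0
        terms.append((matches, estimate * top_score))
    return terms


training_terms = slot_terms(LOGGED, TRAINING_TOP)
manual_terms = slot_terms(LOGGED, MANUAL_TOP)
training_loss = sum(term for _, term in training_terms)
manual_loss = sum(term for _, term in manual_terms)

print("training:", training_terms, "->", training_loss)
print("manual:  ", manual_terms, "->", manual_loss)
```

With these inputs the training-time prediction matches the label in every slot and sums to -2.0, while the manual prediction (0,2,3) matches the label (1,3,0) in no slot and sums to 0.0. Both numbers agree with the two outputs reported in this thread, which is consistent with the gap coming from the training-time prediction listing the labelled actions first.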

ccb_data_first2.txt:

ccb shared |User gen='f'
ccb action |Action contentId='a'
ccb action |Action contentId='b'
ccb action |Action contentId='c'
ccb action |Action contentId='d'
ccb slot 1:0:0.25,0:0.25,2:0.25,3:0.25 |
ccb slot 3:0:0.333333333,0:0.333333333,2:0.333333333 |
ccb slot 0:-1:0.5,2:0.5 |

ccb shared |User gen='f'
ccb action |Action contentId='a'
ccb action |Action contentId='b'
ccb action |Action contentId='c'
ccb action |Action contentId='d'
ccb slot 1:0:0.25,0:0.25,2:0.25,3:0.25 |
ccb slot 3:0:0.333333333,0:0.333333333,2:0.333333333 |
ccb slot 0:-1:0.5,2:0.5 |

@JohnLangford
Member

I have reproduced this on the command line, and it does seem like an inconsistency in reporting. I'll try to find some time with Jack to work through the precise source.
