Contextual Bandit vowpal_wabbit training dataset validation #4634

Open
pallavi080596 opened this issue Aug 29, 2023 · 2 comments

@pallavi080596

I am using the Vowpal Wabbit package to implement a contextual bandit use case.
My goal is to show each user a personalized ranking of categories (L1/L2/L3/L4/L5, treated as the actions here) based on context such as:

  1. recent_searched_categories
  2. clicked_categories
  3. type_of_user = daily/monthly/weekly
  4. age = age bracket (1/2/3)
  5. gender = male/female
  6. tier = tier1/2/3/4

I have simulated a cost function and trained the model online on the cost and the chosen action, using the options --cb_explore_adf -q UA.

Sample Dataset:

shared |User user=Anna time_of_day=monthly gender=female age=3 |clicked_cats clicked_cats_1=L1 clicked_cats_2=L4 |recent_cats recent_cats_1=L2 recent_cats_2=L4
|Action category=L1 
0:-0.3:0.19765689674531078 |Action category=L2 
|Action category=L2 
|Action category=L3 
|Action category=L4 
|Action category=L5 
shared |User user=Tom time_of_day=weekly gender=male age=2 |clicked_cats clicked_cats_1=L2 clicked_cats_2=L3 |recent_cats recent_cats_1=L1 recent_cats_2=L4
|Action category=L1 
|Action category=L2 
0:-0.7:0.21600767970085144 |Action category=L3 
|Action category=L3 
|Action category=L4 
|Action category=L5 
shared |User user=Rohan time_of_day=daily gender=male age=1 |clicked_cats clicked_cats_1=L4 clicked_cats_2=L5 |recent_cats recent_cats_1=L1 recent_cats_2=L2
|Action category=L1 
|Action category=L2 
|Action category=L3 
0:-0.7:0.20174514633095228 |Action category=L4 
|Action category=L4 
|Action category=L5 
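
For reference, a minimal sketch of how one logged interaction could be serialized into the multi-line text format used above. The function name `to_adf_example` and its parameters are hypothetical helpers, not part of Vowpal Wabbit. It writes one line per candidate action, puts the `action:cost:probability` label on the chosen action's line, and ends the example with a blank line.

```python
def to_adf_example(shared, actions, chosen, cost, prob):
    """Serialize one logged interaction as a multi-line ADF text example.

    shared  -- dict: namespace -> dict of feature name -> value
    actions -- list of category names, one per candidate action
    chosen  -- index into `actions` of the action that was shown
    cost    -- observed cost (negative reward) for the chosen action
    prob    -- probability with which the logging policy chose it
    """
    ns_parts = []
    for ns, feats in shared.items():
        feat_str = " ".join(f"{k}={v}" for k, v in feats.items())
        ns_parts.append(f"|{ns} {feat_str}")
    lines = ["shared " + " ".join(ns_parts)]
    for i, cat in enumerate(actions):
        # Only the chosen action carries a label; the leading action id is
        # written as 0, matching the samples above.
        label = f"0:{cost}:{prob} " if i == chosen else ""
        lines.append(f"{label}|Action category={cat}")
    # A trailing blank line terminates the multi-line example.
    return "\n".join(lines) + "\n\n"


example = to_adf_example(
    shared={"User": {"user": "Anna", "time_of_day": "monthly"}},
    actions=["L1", "L2", "L3"],
    chosen=1,
    cost=-0.3,
    prob=0.2,
)
print(example, end="")
```

Concatenating the strings returned for successive interactions gives a file in which every example is already separated from the next by an empty line.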

My questions are:

  1. Is the data format above correct? If not, how should the training dataset be constructed?
  2. Which algorithms would suit this use case, both for exploring and for optimizing the probabilities of all categories given the context, and how can the algorithm's performance be validated?
@ataymano
Member

Hi,

  1. This may just be a GitHub formatting issue, but there should be an empty line between the last action of one event and the shared features of the next:
shared |User user=Anna time_of_day=monthly gender=female age=3 |clicked_cats clicked_cats_1=L1 clicked_cats_2=L4 |recent_cats recent_cats_1=L2 recent_cats_2=L4
|Action category=L1 
0:-0.3:0.19765689674531078 |Action category=L2 
|Action category=L2 
|Action category=L3 
|Action category=L4 
|Action category=L5 

shared |User user=Tom time_of_day=weekly gender=male age=2 |clicked_cats clicked_cats_1=L2 clicked_cats_2=L3 |recent_cats recent_cats_1=L1 recent_cats_2=L4
|Action category=L1 
...

Otherwise seems correct.
  2. Adding the cA and rA interactions (clicked_cats * Action and recent_cats * Action) should be useful here ("-q UA cA rA").
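
The structural point in item 1 can be checked before training. Below is a rough sketch of such a check, assuming a training file in which every example has exactly one labeled action. The function `validate_adf_text` is a hypothetical helper, not a Vowpal Wabbit API. It splits the text on blank lines and reports examples with more than one `shared` line (the symptom of a missing separator), a wrong number of labeled actions, or a malformed `action:cost:probability` label.

```python
def validate_adf_text(text):
    """Return a list of structural problems found in ADF-formatted text."""
    problems = []
    # Multi-line examples are separated by one or more blank lines.
    blocks, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line.strip())
        elif current:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)

    for n, block in enumerate(blocks, start=1):
        shared = [l for l in block if l.startswith("shared")]
        if len(shared) > 1:
            problems.append(f"example {n}: {len(shared)} shared lines (missing blank line?)")
        elif shared and block[0] != shared[0]:
            problems.append(f"example {n}: shared line is not first")
        actions = [l for l in block if not l.startswith("shared")]
        # Unlabeled action lines start directly with a namespace bar.
        labeled = [l for l in actions if not l.startswith("|")]
        if len(labeled) != 1:
            problems.append(f"example {n}: expected 1 labeled action, found {len(labeled)}")
        for l in labeled:
            parts = l.split("|", 1)[0].split(":")
            try:
                float(parts[1])          # cost
                prob = float(parts[2])   # logging probability
            except (IndexError, ValueError):
                problems.append(f"example {n}: label is not action:cost:probability")
                continue
            if not 0.0 < prob <= 1.0:
                problems.append(f"example {n}: probability {prob} outside (0, 1]")
    return problems
```

An empty list means none of these checks failed; it does not guarantee that Vowpal Wabbit will parse the file. For item 2, `-q` selects namespaces by their first letter, so with the namespaces `User`, `clicked_cats`, `recent_cats` and `Action` the three interactions can also be passed as separate flags: `-q UA -q cA -q rA`.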

@pallavi080596
Author

pallavi080596 commented Aug 31, 2023

Hi @ataymano, thank you for the response.
Could you also advise which exploration algorithm works best for this use case (softmax, RND, epsilon-greedy)?
I need a ranking of the categories (the probabilities from the model could serve for this) based on each user's affinity for their recent and clicked categories. (My reward function will depend on these two.)
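
The thread does not settle which exploration algorithm is best. One common way to compare candidates offline is an inverse propensity score (IPS) estimate computed from the logged costs and probabilities. The sketch below is a plain-Python illustration under that assumption. `rank_actions` and `ips_estimate` are hypothetical helpers, not Vowpal Wabbit functions, and `target_prob` stands in for whatever candidate policy is being evaluated.

```python
def rank_actions(actions, pmf):
    """Order actions from highest to lowest probability."""
    pairs = sorted(zip(actions, pmf), key=lambda t: t[1], reverse=True)
    return [a for a, _ in pairs]


def ips_estimate(logged, target_prob):
    """Inverse propensity score estimate of a policy's average cost.

    logged      -- iterable of (context, action, cost, logging_prob)
    target_prob -- function (context, action) -> probability that the
                   candidate policy would choose `action` in `context`
    """
    total, n = 0.0, 0
    for context, action, cost, logging_prob in logged:
        # Reweight each logged cost by how much more (or less) likely the
        # candidate policy is to pick the logged action.
        total += cost * target_prob(context, action) / logging_prob
        n += 1
    return total / n if n else 0.0


# Illustrative values only, not taken from a real run.
logged = [("Anna", "L2", -0.3, 0.2), ("Tom", "L3", -0.7, 0.25)]
always_l2 = lambda context, action: 1.0 if action == "L2" else 0.0
print(ips_estimate(logged, always_l2))
print(rank_actions(["L1", "L2", "L3"], [0.1, 0.7, 0.2]))
```

A lower estimated cost suggests the better policy, though the estimate has high variance when logging probabilities are small. On ranking by probabilities: with epsilon-greedy exploration all non-greedy actions receive the same probability, so the order is only informative for the top action, whereas softmax exploration yields probabilities that vary with the predicted cost.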
