# RecoBandit

Building recommender systems with contextual bandit methods to address the cold-start problem and enable online, real-time learning.

## App 1

*Thompson Sampling, Single-user Multi-product Simulation, Multi-armed Bandit*

The objective of this app is to apply bandit algorithms to the recommendation problem in a simulated environment. Although in practice we would also use real data, the complexity of the recommendation problem and the associated algorithmic challenges are already apparent even in this simple setting.

*RecoBandit - Thompson Sampling Simulation*
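For intuition, here is a minimal sketch of what such a simulation looks like: Thompson Sampling with Beta priors for a single user choosing among a handful of products. The product count, the synthetic click-through rates, and the uniform priors are assumptions made for the sketch, not the app's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

n_products = 5
true_ctr = rng.uniform(0.02, 0.20, n_products)  # hidden click probabilities (simulated)
alpha = np.ones(n_products)                     # Beta prior: observed successes + 1
beta = np.ones(n_products)                      # Beta prior: observed failures + 1

for t in range(10_000):
    theta = rng.beta(alpha, beta)   # sample a plausible CTR for each product
    arm = int(np.argmax(theta))     # recommend the product with the best sample
    click = rng.random() < true_ctr[arm]
    alpha[arm] += click             # posterior update from the observed feedback
    beta[arm] += 1 - click

print("posterior mean CTR:", np.round(alpha / (alpha + beta), 3))
print("true CTR:          ", np.round(true_ctr, 3))
```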

Inspired by the following works:

## App 2

*Multi-user Multi-product Contextual Simulation, Contextual Bandit, Vowpal Wabbit*

The objective of this app is to apply contextual bandit algorithms to the recommendation problem in a simulated environment. The recommender agent quickly adapts to the changing behavior of users and adjusts its recommendation strategy accordingly.

*VW Contextual Bandit Simulation*
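The sketch below follows the Vowpal Wabbit contextual bandit tutorial linked in the references; the user names, contexts, and costs are invented for illustration. VW's `--cb` input format encodes each logged interaction as `action:cost:probability` followed by the context features.

```python
from vowpalwabbit import pyvw

# --cb 4: contextual bandit over 4 possible actions (products)
vw = pyvw.vw("--cb 4 --quiet")

# Each logged example: chosen_action:cost:probability | context features
# (cost 0 = click, cost 1 = no click; 0.25 = logging policy's propensity)
train_data = [
    "1:0:0.25 | user=Tom time_of_day=morning",
    "2:1:0.25 | user=Tom time_of_day=afternoon",
    "3:1:0.25 | user=Anna time_of_day=morning",
    "4:0:0.25 | user=Anna time_of_day=afternoon",
]
for example in train_data:
    vw.learn(example)

# Predict the best product (an action index) for an unseen context
print(vw.predict("| user=Tom time_of_day=morning"))
```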

## App 3 (next release)

*Image Embeddings, Offline Learning*

The objective is to recommend products and adapt the model in real time from user feedback using an actor-critic algorithm. Suppose we observe a user's behavior and collect the products they clicked on. This history is fed into the actor network, which decides what the user would like to see next by producing an "ideal" product embedding. That embedding is compared with the embeddings of real products to find the closest matches, and the best match is recommended to the user. The critic judges the actor's choices and helps it learn what went wrong.
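As a rough sketch of the matching step described above (the GRU encoder, embedding size, and random catalog are assumptions for illustration, not the planned implementation):

```python
import torch
import torch.nn as nn

EMB = 32  # assumed product-embedding size

class Actor(nn.Module):
    """Encodes a user's click history into an 'ideal' product embedding."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.GRU(EMB, EMB, batch_first=True)
        self.head = nn.Linear(EMB, EMB)

    def forward(self, clicked):            # clicked: (batch, seq, EMB)
        _, h = self.encoder(clicked)
        return self.head(h[-1])            # ideal embedding: (batch, EMB)

class Critic(nn.Module):
    """Scores (state, proposed embedding) pairs to guide the actor."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * EMB, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, state, action_emb):
        return self.net(torch.cat([state, action_emb], dim=-1))

catalog = torch.randn(1000, EMB)           # placeholder product embeddings
history = torch.randn(1, 8, EMB)           # one user, 8 clicked products

ideal = Actor()(history)                                  # (1, EMB)
scores = torch.cosine_similarity(ideal, catalog, dim=-1)  # similarity to all products
print("recommended items:", scores.topk(5).indices.tolist())

q_value = Critic()(history.mean(dim=1), ideal)  # critic's judgment of the proposal
```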

Inspired by the following works:

## App 4 (next release)

*Offline Learning*

The core intuition is that we can't just blindly apply RL algorithms in a production system out of the box: the learning period would be too costly. Instead, we need to leverage the vast amounts of offline training examples to make the algorithm perform as well as the current system before releasing it into the online production environment. An agent is first given access to many offline training examples produced by a fixed policy. Then it gets access to the online system, where it chooses its own actions.
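One common way to do this warm start is off-policy learning with inverse propensity scoring (IPS) over the logged data. The sketch below trains a softmax policy on hypothetical logs of (context, action, reward, logging propensity); the dimensions, learning rate, and synthetic logs are all assumptions for illustration, not this app's implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
n_actions, dim = 4, 8

# Hypothetical logs from the fixed production policy:
# (context, chosen action, observed reward, propensity of that action)
logs = [(rng.normal(size=dim), int(rng.integers(n_actions)),
         float(rng.random() < 0.1), 0.25) for _ in range(5_000)]

# Offline phase: REINFORCE-style updates with IPS-weighted rewards,
# so the new policy learns from actions it did not itself choose.
W = np.zeros((n_actions, dim))
lr = 0.01
for x, a, r, p in logs:
    scores = W @ x
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    grad = (r / p) * (np.eye(n_actions)[a] - probs)  # IPS-weighted grad of log pi(a|x)
    W += lr * np.outer(grad, x)

# Online phase: the warm-started policy now chooses its own actions.
x_new = rng.normal(size=dim)
print("chosen action:", int(np.argmax(W @ x_new)))
```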

Inspired by the following works:

## What is Bandit-based Recommendation?

Traditionally, the recommendation problem was treated as a simple classification or prediction problem; however, its sequential nature is now well established. Accordingly, it can be formulated as a Markov decision process (MDP) and solved with reinforcement learning (RL) methods. In fact, recent advances in combining deep learning with traditional RL, i.e. deep reinforcement learning (DRL), have made it possible to apply RL to recommendation problems with massive state and action spaces.
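To make the MDP framing concrete, one hypothetical mapping for recommendation looks like this (the state and reward definitions here are illustrative choices, not the only ones):

```python
from dataclasses import dataclass, field

@dataclass
class RecState:
    """MDP state: what we know about the user so far."""
    user_id: str
    recent_clicks: list = field(default_factory=list)

def step(state: RecState, item: str, clicked: bool):
    """MDP transition: action = recommended item, reward = user feedback."""
    reward = 1.0 if clicked else 0.0
    if clicked:
        state.recent_clicks.append(item)  # the state evolves with each interaction
    return state, reward
```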

## Use case 1: Personalized recommendations

**Goal:** Quickly help users find products they would like to buy

In e-commerce and other digital domains, companies frequently want to offer personalized product recommendations to users. This is hard when you don't yet know much about the customer, or you don't understand which features of a product are pertinent. With limited information about which actions to take and what their payoffs will be, and limited resources to explore the competing actions, it is hard to know what to do.

## Use case 2: Online model evaluation

**Goal:** Compare and find the best-performing recommender model

## Use case 3: Personalized re-ranking

**Goal:** Bring the most relevant option to the top

## Use case 4: Personalized feeds

**Goal:** Recommend a never-ending feed of items (news, products, images, music)

https://youtu.be/CgGCbmlRI3o

## References

1. LinUCB Contextual News Recommendation
2. Experiment with Bandits
3. n-armed Bandit Recommender
4. Bandit Algorithms for Website Optimization [eBook O'Reilly] [GitHub] [Colab]
5. MAB Ranking PyPI
6. RecSim GitHub, Video, Medium
7. Vowpal Wabbit contextual bandits tutorial: https://vowpalwabbit.org/tutorials/contextual_bandits.html
8. recommendation-gym: https://github.com/sadighian/recommendation-gym
9. https://learning.oreilly.com/library/view/reinforcement-learning-pocket/9781098101527/ch02.html
10. RecNN: https://github.com/awarebayes/RecNN/
11. Vowpal Wabbit at NeurIPS 2019: https://vowpalwabbit.org/neurips2019/
12. reco-gym (Criteo Research): https://github.com/criteo-research/reco-gym
13. SMPyBandits: https://pypi.org/project/SMPyBandits/
14. bgalbraith/bandits: https://github.com/bgalbraith/bandits
15. mab-ranking: https://pypi.org/project/mab-ranking/
16. Optimizely optimization glossary, "Multi-armed bandit": https://www.optimizely.com/optimization-glossary/multi-armed-bandit/
17. "Multi-arm bandits for recommendations and A/B testing on Amazon ratings data set" (Medium): https://abhishek-maheshwarappa.medium.com/multi-arm-bandits-for-recommendations-and-a-b-testing-on-amazon-ratings-data-set-9f802f2c4073
