
MiniMax Algorithm? #30

Open
flaport opened this issue Nov 6, 2022 · 1 comment
Labels: question (Further information is requested)

Comments


flaport commented Nov 6, 2022

How would you implement a minimax Q-learner with coax?

Hi there! I love the package and how accessible it is to relative newbies. The tutorials are pretty great and the accompanying videos are very helpful!

I was wondering what the best way to implement a minimax algorithm would be. Would you recommend using two policies, pi1 and pi2? Or is there something better suited for this?

I'd like to re-implement something like this old blogpost of mine in coax to get a better feel for the library.

Any help would be greatly appreciated :)

flaport added the question label on Nov 6, 2022
KristianHolsheimer (Contributor) commented

Hi @flaport

First of all, thanks for your interest in coax!

It would be great to see multi-agent-style setups in coax. I haven't thought much about it, to be honest.

The simplest setup would be to use separate policies and either update them individually or write your own policy objective that updates multiple policies at the same time; a rough sketch of the first option is below.
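
For what it's worth, here's a rough, untested sketch of what that first option (two Q-functions and two value-based policies, each updated individually with plain Q-learning) might look like. It borrows the pieces from the standard coax DQN examples; the `TwoPlayerEnv`, its alternating-turn convention, and the reward negation for the second player are assumptions made purely for illustration, not part of coax.

```python
import coax
import haiku as hk
import jax
import jax.numpy as jnp
import optax

env = TwoPlayerEnv()  # hypothetical alternating-turn, zero-sum, gym-style env

def func(S, is_training):
    # type-2 Q-function: maps a batch of states to one value per discrete action
    seq = hk.Sequential((
        hk.Linear(64), jax.nn.relu,
        hk.Linear(env.action_space.n, w_init=jnp.zeros),
    ))
    return seq(S)

# one Q-function and one epsilon-greedy policy per player
q1, q2 = coax.Q(func, env), coax.Q(func, env)
pi1, pi2 = coax.EpsilonGreedy(q1, epsilon=0.1), coax.EpsilonGreedy(q2, epsilon=0.1)

# update each player's Q-function individually with plain Q-learning
qlearning1 = coax.td_learning.QLearning(q1, optimizer=optax.adam(1e-3))
qlearning2 = coax.td_learning.QLearning(q2, optimizer=optax.adam(1e-3))

# one short-term reward tracer per player
tracer1 = coax.reward_tracing.NStep(n=1, gamma=0.9)
tracer2 = coax.reward_tracing.NStep(n=1, gamma=0.9)

for episode in range(1000):
    s = env.reset()
    done, turn = False, 0
    while not done:
        if turn == 0:
            pi, tracer, qlearning = pi1, tracer1, qlearning1
        else:
            pi, tracer, qlearning = pi2, tracer2, qlearning2

        a = pi(s)
        s_next, r, done, info = env.step(a)  # r is from player 1's point of view

        # zero-sum assumption: player 2 learns on the negated reward
        tracer.add(s, a, r if turn == 0 else -r, done)
        while tracer:
            qlearning.update(tracer.pop())

        s = s_next
        turn = 1 - turn
        # NOTE: crediting the terminal reward to the player who did *not* make
        # the final move is glossed over here; a real implementation would need
        # to flush that player's tracer with the final outcome as well.
```

The same structure should carry over if you swap the epsilon-greedy policies for learned coax.Policy objects plus a policy objective, which is the second option mentioned above.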

Having said that, I'm not an expert in multi-agent RL myself, so I'm not aware of all the subtleties associated with such a setup.

But of course, I welcome contributions and I'm curious to see what you come up with!
