How would you implement a minimax q-learner with coax?
Hi there! I love the package and how accessible it is to relative newbies. The tutorials are pretty great and the accompanying videos are very helpful!
I was wondering what the best way to implement a minimax algorithm would be. Would you recommend using two policies, pi1 and pi2? Or is there something better suited for this?
I'd like to re-implement something like this old blogpost of mine in coax to get a better feel of the library.
Any help would be greatly appreciated :)
It would be great to see multi-agent style setups in coax. I haven't thought much about it, to be honest.
The simplest setup would be to use separate policies and either update them individually or write your own policy objective that updates multiple policies at the same time.
Having said that, I'm not an expert in multi-agent RL myself, so I'm not aware of all the subtleties associated with such a setup.
But of course, I welcome contributions and I'm curious to see what you come up with!
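For reference, here is a minimal, library-agnostic sketch of the tabular minimax-Q update (Littman, 1994) that such a setup would need to approximate. To keep it short, the state value is taken as a maximin over pure actions; the full algorithm instead solves a small linear program for the mixed minimax strategy. All names here are illustrative, not part of the coax API.

```python
import numpy as np

def minimax_q_update(Q, s, a, o, r, s_next, done, alpha=0.1, gamma=0.9):
    """One tabular minimax-Q update for a two-player zero-sum game.

    Q has shape (n_states, n_actions, n_opponent_actions); `a` is our
    action, `o` is the opponent's. The next-state value is the maximin
    over pure actions (a simplification of Littman's LP-based version).
    """
    v_next = 0.0 if done else np.max(np.min(Q[s_next], axis=1))
    td_target = r + gamma * v_next
    Q[s, a, o] += alpha * (td_target - Q[s, a, o])
    return Q

# Tiny sanity check: a one-shot matrix game with payoffs R[a, o] and a
# pure saddle point at (a=0, o=1), so the game value is R[0, 1] = 1.
R = np.array([[2.0, 1.0], [0.0, -1.0]])
Q = np.zeros((1, 2, 2))
for _ in range(1000):
    for a in range(2):
        for o in range(2):
            Q = minimax_q_update(Q, 0, a, o, R[a, o], 0, done=True)

game_value = np.max(np.min(Q[0], axis=1))  # maximin of the learned Q-table
```

In a coax-style function-approximation setting, the same idea would translate to a q-function conditioned on both agents' actions, with the bootstrap target computed from this maximin (or LP) value instead of a plain max.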