Spades Implementation #1214
Comments
Hi @i-Madsen, that sounds about right to me, but I will give a bit of advice: those observation tensor functions are the most difficult ones to implement. Leave them until the end. I suggest first getting a working implementation of Spades with those functions left blank or returning bogus values, because that's already a nontrivial task. It's best to discuss those functions only after the basic game logic is implemented.
Gotcha, that makes sense. Thanks, I'll stick with this for now and continue on with getting the game logic all working.
I've now got the game functionality building and running, and it completes the basic game tests successfully (I'm ignoring/disabling anything to do with double_double and the uncontested_bidding variant). I think I'm about ready to take a stab at implementing the tensor functions, but I have a question about the overall approach for this game. The Bridge implementation runs only a single round as a zero-sum game, always starting with the same player. While this makes sense for Bridge, Spades is a little different, as both teams can go either positive or negative during a round. The overall game state is therefore much more important: a lower-risk bid that results in that partnership winning the overall game is much better than a high-risk bid that results in a higher point difference. (Overall game state also determines whether bag penalties are a risk during the round.) Currently, I've kept the Bridge structure of always starting with player 0 and playing a single round, and my thought is that I'll just feed in game parameters like this:
The outside training script can track the scores and manage the teams by 'rotating' what positions the players/team scores are in. (Do I also need to add a big bonus for winning to the returned rewards in the game environment?) @lanctot Does this seem reasonable? Or do I need to rethink the approach?
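As a minimal sketch of the 'rotating' idea, assuming the game always treats seats 0 and 2 as partnership 0 and seats 1 and 3 as partnership 1, an outside script could keep scores per real team and swap which partnership slot each team occupies every round. The class name, the parameter keys `score_partnership_0` / `score_partnership_1`, and the 500-point threshold are hypothetical, chosen only for illustration:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class MatchTracker:
    """Tracks overall scores for two fixed teams across single-round games."""

    win_threshold: int = 500  # assumed target score, not taken from the thread
    scores: List[int] = field(default_factory=lambda: [0, 0])  # by real team
    round_index: int = 0

    def partnership_of(self, team: int) -> int:
        # Teams swap partnership slots every round, so the game itself can
        # always start with player 0.
        return (team + self.round_index) % 2

    def round_params(self) -> Dict[str, int]:
        """Hypothetical game parameters for the next round, in seat order."""
        by_seat = [0, 0]
        for team in (0, 1):
            by_seat[self.partnership_of(team)] = self.scores[team]
        return {
            "score_partnership_0": by_seat[0],
            "score_partnership_1": by_seat[1],
        }

    def record_round(self, points_by_partnership: List[int]) -> None:
        """Maps the round's points (in seat order) back to the real teams."""
        for team in (0, 1):
            self.scores[team] += points_by_partnership[self.partnership_of(team)]
        self.round_index += 1

    def winner(self) -> Optional[int]:
        """Returns the winning team, or None if the match is still open."""
        if max(self.scores) < self.win_threshold or self.scores[0] == self.scores[1]:
            return None
        return 0 if self.scores[0] > self.scores[1] else 1
```

The training loop would call `round_params()` before each round, pass the result to the game, and call `record_round()` with the round's returns afterwards.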
Hi @i-Madsen, yes, that sounds very reasonable to me! And using the Bridge functions as a reference is good. You can also check out the Hearts, Skat, and Dou Di Zhu games for reference. Good luck!
Hey @lanctot, I've gotten my current version of the tensor methods passing the basic game test; are there any other tests I should be running on them right now? Also, any advice on what learning algorithm(s) to try on Spades?
Great! Well, I think some Spades-specific tests would be nice, but I'd say turn it into a PR if you're willing to contribute. It would be great to make it into the v1.5 release coming in the next few weeks if possible (and we can announce it in the release notes). For starting algorithms, I'd say maybe self-play DQN just to get your hands dirty, then NFSP, and R-NaD. See also the other thread on dominoes (#1218). A lot of the advice and pointers would apply to your case.
Cool, I cleaned up the code and I think I've got it all in a pull request now. Note that a few game parameters are unused at the moment, but I've left them in for now (they'd be needed if the management of the overall game state is moved into the Spades code). A related question: if I want to feed in new game parameters during training, is there a built-in way to do that, or do I need to make a custom method for Spades? So far, from the examples I've looked at in the repo, env.reset() will get called, but that just reuses the same parameters that the game was initialized with.
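One possible workaround, sketched here under the assumption that rebuilding the environment is cheap enough to do once per episode, is a thin wrapper that recreates the environment whenever new parameters are supplied. The wrapper class and the `make_env` factory are hypothetical and not part of OpenSpiel:

```python
from typing import Any, Callable, Dict, Optional


class ReparameterizedEnv:
    """Rebuilds the wrapped environment when reset() receives new parameters."""

    def __init__(self, make_env: Callable[[Dict[str, Any]], Any]):
        # make_env is a caller-supplied factory: params dict -> environment.
        self._make_env = make_env
        self._env: Optional[Any] = None
        self.params: Dict[str, Any] = {}

    def reset(self, params: Optional[Dict[str, Any]] = None) -> Any:
        # Only rebuild when the parameters actually change.
        if params is not None and params != self.params:
            self.params = dict(params)
            self._env = None
        if self._env is None:
            self._env = self._make_env(self.params)
        return self._env.reset()

    def step(self, actions: Any) -> Any:
        if self._env is None:
            raise RuntimeError("call reset() before step()")
        return self._env.step(actions)
```

With OpenSpiel's Python API, the factory might look like `lambda p: rl_environment.Environment("spades", **p)`, assuming the keys in `p` match the parameter names the game actually defines.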
Hi there, I'm trying to make a new implementation of Spades and am looking for some help along the way as I'm pretty new to RL. Ultimately, I'm hoping to use the framework to train a Spades AI and use it in an iOS/Android app made in Unity (but that's getting a little ahead of myself).
Since the game of Bridge has a similar structure (partnerships, bidding phase, trick taking) and is already in OpenSpiel, I'm using it as a base, but some game concepts are pretty different - particularly how bidding works.
So the first thing I want to make sure I'm doing correctly is dealing with the tensor sizing/representation.
bridge.h starts out with:
As far as I understand, Bridge bidding has multiple actions you can take and keeps going until all other players pass, culminating in a single contract. Spades, on the other hand, has each player make a single bid and then the bidding phase ends, leaving four contracts (partnerships typically work together to meet their combined contract, but a 'Nil' bid (0) is player dependent). Spades also has no concept of being vulnerable or having a declarer.
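To make the sizing consequence concrete, here is a rough sketch of what one bid per player implies for an auction tensor. It assumes a standard 52-card deck, bids from 0 (Nil) to 13, one one-hot block per player, and the observing player's hand appended at the end; the names and the layout are illustrative assumptions, not the contents of bridge.h or of any Spades header:

```python
# Assumed constants for a standard four-player, 52-card game of Spades.
NUM_PLAYERS = 4
NUM_CARDS = 52
NUM_TRICKS = NUM_CARDS // NUM_PLAYERS  # 13 tricks per round
NUM_BIDS = NUM_TRICKS + 1  # bids 0 (Nil) through 13

# One one-hot block of bids per player, followed by the observer's hand.
AUCTION_TENSOR_SIZE = NUM_PLAYERS * NUM_BIDS + NUM_CARDS


def encode_auction(bids, hand):
    """Encodes the bids made so far plus the observer's hand.

    bids: list of NUM_PLAYERS entries, each an int bid or None if not yet bid.
    hand: iterable of card indices in [0, NUM_CARDS).
    """
    tensor = [0.0] * AUCTION_TENSOR_SIZE
    for player, bid in enumerate(bids):
        if bid is not None:
            tensor[player * NUM_BIDS + bid] = 1.0
    hand_offset = NUM_PLAYERS * NUM_BIDS
    for card in hand:
        tensor[hand_offset + card] = 1.0
    return tensor
```

Because each player bids exactly once, there is no need for the per-bid history of passes, doubles, and redoubles that a Bridge-style auction requires.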
So with that in mind, I'm thinking I change it like this:
Later on, there is a method for getting the play tensor size.
bridge.h
Again, bids in Spades are a simple, single bid, one per player, with no concept of modifying calls, a declarer, or vulnerability. Spades will always be trump and there is no 'dummy' hand, so I'm assuming I can just remove half of these and then multiply bids/tricks by kNumPlayers since we'll want to track each player individually.
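The reasoning above can be sketched as a size calculation. This assumes the play tensor keeps per-player bids, the observer's remaining cards, the cards played to the previous and current tricks, and per-player trick counts; the segment names and the choice of which segments survive are assumptions made for illustration, not the actual contents of spades.h:

```python
# Assumed constants for a standard four-player, 52-card game of Spades.
NUM_PLAYERS = 4
NUM_CARDS = 52
NUM_TRICKS = 13
NUM_BIDS = 14  # bids 0 (Nil) through 13

# Hypothetical play-phase layout: (segment name, number of entries).
PLAY_TENSOR_LAYOUT = [
    ("each player's bid", NUM_PLAYERS * NUM_BIDS),
    ("our remaining cards", NUM_CARDS),
    ("cards played to the previous trick", NUM_PLAYERS * NUM_CARDS),
    ("cards played to the current trick", NUM_PLAYERS * NUM_CARDS),
    ("tricks won by each player", NUM_PLAYERS * NUM_TRICKS),
]

PLAY_TENSOR_SIZE = sum(size for _, size in PLAY_TENSOR_LAYOUT)


def segment_offsets(layout):
    """Returns the starting index of each named segment in the flat tensor."""
    offsets = {}
    start = 0
    for name, size in layout:
        offsets[name] = start
        start += size
    return offsets
```

Keeping the layout as a list of named segments makes it easy to check that the write offsets in the tensor-filling code agree with the declared total size.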
spades.h
Am I misunderstanding anything or does this seem correct?