Spades Implementation #1214

i-Madsen · 2024-04-24T20:34:38Z

Hi there, I'm trying to make a new implementation of Spades and am looking for some help along the way as I'm pretty new to RL. Ultimately, I'm hoping to use the framework to train a Spades AI and use it in an iOS/Android app made in Unity (but that's getting a little ahead of myself).

Since the game of Bridge has a similar structure (partnerships, bidding phase, trick taking) and is already in OpenSpiel, I'm using it as a base, but some game concepts are pretty different - particularly how bidding works.

So the first thing I want to make sure I'm doing correctly is dealing with the tensor sizing/representation.

bridge.h starts out with:

inline constexpr int kNumObservationTypes = 4;  // Bid, lead, declare, defend
// Because bids always increase, any individual bid can be made at most once.
// Thus for each bid, we only need to track (a) who bid it (if anyone), (b) who
// doubled it (if anyone), and (c) who redoubled it (if anyone).
// We also report the number of passes before the first bid; we could
// equivalently report which player made the first call.
// This is much more compact than storing the auction call-by-call, which
// requires 318 turns * 38 possible calls per turn = 12084 bits (although
// in practice almost all auctions have fewer than 80 calls).
inline constexpr int kAuctionTensorSize =
    kNumPlayers * (1           // Did this player pass before the opening bid?
                   + kNumBids  // Did this player make each bid?
                   + kNumBids  // Did this player double each bid?
                   + kNumBids  // Did this player redouble each bid?
                   ) +
    kNumCards                                  // Our hand
    + kNumVulnerabilities * kNumPartnerships;  // Vulnerability of each side
inline constexpr int kPublicInfoTensorSize =
    kAuctionTensorSize  // The auction
    - kNumCards         // But not any player's cards
    + kNumPlayers;      // Plus trailing passes
inline constexpr int kMaxAuctionLength =
    kNumBids * (1 + kNumPlayers * 2) + kNumPlayers;

As far as I understand, Bridge bidding has multiple actions you can take and keeps going until all other players pass, culminating in a single contract. Spades, on the other hand, has each player make a single bid and then the bidding phase ends, leaving '4' contracts (partnerships typically work together to meet their combined contract, but a 'Nil' bid (0) is player dependent). Spades also has no concepts of being vulnerable or having a declarer.

So with that in mind, I'm thinking I change it like this:

inline constexpr int kNumObservationTypes = 1;
inline constexpr int kAuctionTensorSize = kNumPlayers * kNumBids
                                          + kNumCards;  // Our hand
inline constexpr int kPublicInfoTensorSize =
    kAuctionTensorSize  // The auction
    - kNumCards;         // But not any player's cards
inline constexpr int kMaxAuctionLength = 4;

Later on, there is a method for getting the play tensor size.

bridge.h

  static int GetPlayTensorSize(int num_tricks) {
    return kNumBidLevels          // What the contract is
           + kNumDenominations    // What trumps are
           + kNumOtherCalls       // Undoubled / doubled / redoubled
           + kNumPlayers          // Who declarer is
           + kNumVulnerabilities  // Vulnerability of the declaring side
           + kNumCards            // Our remaining cards
           + kNumCards            // Dummy's remaining cards
           + num_tricks * kNumPlayers * kNumCards  // Number of played tricks
           + kNumTricks   // Number of tricks we have won
           + kNumTricks;  // Number of tricks they have won
  }

Again, bids in spades are a simple. single bid, one per player, with no concept of modifying calls, a declarer, or vulnerability. Spades will always be trump and there is no 'dummy' hand, so I'm assuming I can just remove half of these and then multiply bids/tricks by kNumPlayers since we'll want to track each player individually.

spades.h

  static int GetPlayTensorSize(int num_tricks) {
    return kNumBids * kNumPlayers                   // What each player's contract is
           + kNumCards                              // Our remaining cards
           + num_tricks * kNumPlayers * kNumCards   // Number of played tricks
           + kNumTricks * kNumPlayers;              // Number of tricks each player has won
  }

Am I misunderstanding anything or does this seem correct?

The text was updated successfully, but these errors were encountered:

lanctot · 2024-04-27T01:51:30Z

Hi @i-Madsen

That sounds about right to me, but I will give a bit of advice: those observation tensor functions are the most difficult ones to implement. Leave them until the end. I suggest getting a working implementation of Spades with those functions left blank or returning bogus values first because that's already a nontrivial task. It's best to discuss those functions only after the basic game logic functionality is implemented.

i-Madsen · 2024-04-29T18:32:26Z

Gotcha, that makes sense. Thanks, I'll stick with this for now and continue on with getting the game logic all working.

i-Madsen · 2024-05-16T22:49:28Z

I've now got the game functionality able to build and run, completing the basic game tests successfully (I'm ignoring/disabling anything to do with double_double and the uncontested_bidding variant). I think I'm about ready to take a stab at implementing the tensor functions, but have a question about the overall approach for this game.

The Bridge implementation runs only a single round as a zero sum game, always starting with the same player. While this makes sense for Bridge, Spades is a little different as both teams can go either positive or negative during a round. Therefore, the overall game state is much more important as a lower-risk bid that results in that partnership winning the overall game is much better than a high-risk bid that results in a higher point difference. (Overall game state also determines if bag penalties are a risk during the round.)

Currently, I've kept the Bridge structure of always starting with player 0 and playing a single round and my thoughts are that I'll just feed in game parameters like this:

/*parameter_specification=*/
                         {
                             // Whether to end the game early if score gets too low
                             {"use_mercy_rule", GameParameter(true)},

                             // If using mercy rule, the threshold of negative points
                             {"mercy_threshold", GameParameter(-350)},

                             // Amount of points needed to win the game
                             {"win_threshold", GameParameter(500)},

                             // Parnership's current scores
                             // (can infer bags from last digit)
                             {"score_partnership_0", GameParameter(0)},
                             {"score_partnership_1", GameParameter(0)},

                             // If true, replace the play phase with a computed
                             // result based on perfect-information play.
                             {"use_double_dummy_result", GameParameter(false)},

                             // Number of played tricks in observation tensor
                             {"num_tricks", GameParameter(2)},
                         }};

The outside training script can track the scores and manage the teams by 'rotating' what positions the players/team scores are in. (Also need to add a big bonus for winning to the returned rewards in the game environment?)

@lanctot Does this seem reasonable? Or do I need to rethink the approach?

lanctot · 2024-05-17T11:24:18Z

Hi @i-Madsen , yes, that sounds very reasonable to me! And using the Bridge functions as a reference is good. You can also check out Hearts, Skat, and Dou Di Zhu games as well for reference. Good luck!

i-Madsen · 2024-05-20T18:21:28Z

Hey @lanctot, I've gotten my current version of the tensor methods passing the basic game test; are there any other tests I should be running on them right now?

Also, any advice on what learning algorithm(s) to try on Spades?

lanctot · 2024-05-22T01:18:30Z

Great! Well, I think some Spades-specific tests would be nice, but I'd say turn it into a PR if you're willing to contribute. Would be great to make it into the v1.5 release coming in the next few weeks if possible (and can announce it in the release notes).

For starting algorithms, I'd say maybe self-play DQN just to get your hands dirty, then NFSP, and R-NaD. See also the other thread on dominoes (#1218). A lot of the advice and pointers would apply to your case.

i-Madsen · 2024-05-22T20:06:24Z

Cool, I cleaned up the code and I think I've got it all in a pull request now.

Note that a few game parameters are unused at the moment, but I've left them in for now (they'd be needed if the management of the overall game state is moved into the Spades code).

Related question to that: if I want to feed in new game parameters during training, is there a built in way to do that or do I need to make a custom method for Spades? So far from the examples I've looked at in the repo, env.reset() will get called, but that just reuses the same parameters that the game was initialized with.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Spades Implementation #1214

Spades Implementation #1214

i-Madsen commented Apr 24, 2024 •

edited

lanctot commented Apr 27, 2024 •

edited

i-Madsen commented Apr 29, 2024

i-Madsen commented May 16, 2024

lanctot commented May 17, 2024

i-Madsen commented May 20, 2024

lanctot commented May 22, 2024

i-Madsen commented May 22, 2024 •

edited

Spades Implementation #1214

Spades Implementation #1214

Comments

i-Madsen commented Apr 24, 2024 • edited

lanctot commented Apr 27, 2024 • edited

i-Madsen commented Apr 29, 2024

i-Madsen commented May 16, 2024

lanctot commented May 17, 2024

i-Madsen commented May 20, 2024

lanctot commented May 22, 2024

i-Madsen commented May 22, 2024 • edited

i-Madsen commented Apr 24, 2024 •

edited

lanctot commented Apr 27, 2024 •

edited

i-Madsen commented May 22, 2024 •

edited