
Inquiry for possible performance improvement #162

Open
EddieMataEwy opened this issue Feb 25, 2023 · 5 comments

Comments

@EddieMataEwy

The performance of this repo is already amazing, but I wanted to ask a question.
Have you checked the family of improvements defined in this paper? (https://realworld-sdm.github.io/paper/27.pdf)
It re-derives existing algorithms like CFR+ and DCFR by computing "instant updates" to the counterfactual value, the regret, and the strategy.
I don't know how much complexity this would add to the existing codebase, but it allows for even faster convergence.
For example, it would make CFR+ converge faster than DCFR without having to tune alpha, beta, and gamma.
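To make the "instant update" idea concrete, here is a minimal, hypothetical sketch (not code from this repo or from the paper) of what it looks like at a single decision node: after the regrets are updated, the new strategy σt+1 is derived by regret matching and the node's value is immediately recomputed under it, instead of returning the value obtained under the old strategy σt. The function names are illustrative only.

```cpp
#include <cstddef>
#include <vector>
using namespace std;

// Regret matching: normalize the positive part of the cumulative
// regrets into a strategy; fall back to uniform if all are <= 0.
vector<float> regretMatching(const vector<float>& regrets) {
    vector<float> strategy(regrets.size());
    float sum = 0.0f;
    for (size_t a = 0; a < regrets.size(); ++a) {
        strategy[a] = regrets[a] > 0.0f ? regrets[a] : 0.0f;
        sum += strategy[a];
    }
    for (size_t a = 0; a < regrets.size(); ++a)
        strategy[a] = sum > 0.0f ? strategy[a] / sum
                                 : 1.0f / static_cast<float>(regrets.size());
    return strategy;
}

// Node value under a given strategy: sum over actions of the
// strategy probability times that action's utility.
float nodeValue(const vector<float>& strategy,
                const vector<float>& actionUtils) {
    float v = 0.0f;
    for (size_t a = 0; a < strategy.size(); ++a)
        v += strategy[a] * actionUtils[a];
    return v;
}

// One "instant" iteration at a node: update regrets against the
// old value, then return the value recomputed under sigma_{t+1}.
float instantUpdate(vector<float>& regrets,
                    const vector<float>& actionUtils) {
    vector<float> sigmaT = regretMatching(regrets);
    float oldValue = nodeValue(sigmaT, actionUtils);
    for (size_t a = 0; a < regrets.size(); ++a)
        regrets[a] += actionUtils[a] - oldValue;
    vector<float> sigmaT1 = regretMatching(regrets);
    return nodeValue(sigmaT1, actionUtils);  // value under the NEW strategy
}
```

In vanilla CFR the value passed to the parent would be `oldValue`; the instant variant passes the value under σt+1 instead, which is what the snippet later in this thread does with `payoffs`.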

@bupticybee
Owner

No, I haven't read the paper; I will read it. Sounds promising.

@xuzy1975

xuzy1975 commented Mar 2, 2023

I don't understand step (5): where is the instant counterfactual value updated by σt+1 actually used?

@xuzy1975

xuzy1975 commented Mar 3, 2023

// ICFR: recompute payoffs here with the new strategy
const vector<float> current_strategy_new = trainable->getcurrentStrategy();
fill(payoffs.begin(), payoffs.end(), 0);
// accumulate the per-hand values under the new strategy
for (int action_id = 0; action_id < actions.size(); action_id++) {
    vector<float>& action_utilities = results[action_id];
    if (action_utilities.empty())
        continue;
    for (int hand_id = 0; hand_id < action_utilities.size(); hand_id++) {
        float strategy_prob = current_strategy_new[hand_id + action_id * node_player_private_cards.size()];
        payoffs[hand_id] += strategy_prob * action_utilities[hand_id];
    }
}

Added to the end of actionUtility(), it does improve performance on some boards, such as 6h6c6d, 7d7h2h...

@EddieMataEwy
Author

I believe you need calculation (5) to proceed with the parent-node calculations. I don't understand it very well; that is why I opened an issue instead of coding it myself and submitting a pull request.

@xuzy1975

xuzy1975 commented Mar 5, 2023

It seems you need to recalculate the payoff using the new strategy. I tried it: in some cases, like the benchmark settings, it converges faster, but in a large-scale game it works worse. Maybe I misunderstood something.
