Winning evaluation with tablebases in cursed win #5175
(Apparently present since SF 6, according to Discord.)
I translated the code to pencil-and-paper math, and AFAICS it all checks out. It is simply getting the wrong result from the DTZ probe. Why that is, I don't know.
I can test later, but there are some alternative DTZ tablebases called "nr" which seemed to fix the problem for Crafty and pere. It makes more sense that the issue was not in Stockfish in the first place.
[Screenshots: analysis using dtz_nr tablebases; analysis using standard dtz tablebases]
Mh... Is there some information on how those were generated/fixed/edited? Maybe niklasf knows something?
As I understand it, DTZ tables don't assume a 50-move rule, but DTR does. So neither is wrong; they just make different assumptions. https://chess.stackexchange.com/questions/28520/what-do-dtm-dtz-dtc-dtr-dtz50-and-dtzr-mean
@niklasf said on Discord:
So my understanding of the situation is that SF does not report correct TB results in the affected positions when the user provides the widely available standard 3- to 6-man Syzygy files. Only if the TB files were generated with this patch does SF work correctly. Is that correct? If so, should we warn users on TB load if they use outdated TB files (if that is possible, e.g. by probing the position from this issue)? And what is the situation for the 7-man files?
To clarify: Stockfish works correctly in game play. That is, after the capture that crosses into tablebase territory, moves will be ranked correctly and Stockfish will achieve the best possible outcome. That's what the tablebases are designed for - they are not broken and they don't need to be fixed. The issue occurs only in analysis, when setting up arbitrary positions that do not arise from optimal play following a capture. In that case there are really 7 possible probe results:
Avoiding this by patching the table generator produces larger tables for no playing-strength gain. Handling the ambiguous results will require more code for no playing-strength gain. For the analysis board on Lichess, I did both: 6-piece tables with no rounding, and (because 7-piece tables are too much effort to regenerate) a user interface that correctly displays ambiguous results for 7-piece tables. For Stockfish, I am not sure what to do, if anything.
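To make the ambiguity concrete, here is a minimal Python sketch. It assumes the usual convention that a tablebase win survives the 50-move rule when `dtz + halfmove_clock <= 100`, and, purely as a hypothetical model of the rounding mentioned above, that a reported DTZ of `n` plies may hide a true distance of up to `n + slack` plies. The names `Outcome`, `classify_win` and `slack` are illustrative only, not part of Stockfish or any Syzygy probing library.

```python
from enum import Enum


class Outcome(Enum):
    WIN = "win"
    CURSED_WIN = "cursed win (drawn under the 50-move rule)"
    WIN_OR_CURSED_WIN = "win or cursed win (ambiguous)"


def classify_win(reported_dtz: int, halfmove_clock: int, slack: int = 1) -> Outcome:
    """Classify a position the tables report as won for the side to move.

    Hypothetical rounding model (an assumption, not the Syzygy spec): the true
    distance to the next zeroing move lies in [reported_dtz, reported_dtz + slack].
    """
    if reported_dtz <= 0:
        raise ValueError("expected a positive DTZ (side to move is winning)")
    # Earliest and latest value of the 50-move counter at which the win can zero.
    earliest = halfmove_clock + reported_dtz
    latest = halfmove_clock + reported_dtz + slack
    if latest <= 100:
        return Outcome.WIN  # zeroes in time even in the worst case
    if earliest > 100:
        return Outcome.CURSED_WIN  # too late even in the best case
    return Outcome.WIN_OR_CURSED_WIN  # rounding decides which side of 100 we land on


if __name__ == "__main__":
    for dtz in (40, 46, 50):
        print(dtz, classify_win(dtz, halfmove_clock=54).value)
```

With the halfmove clock of 54 from the reported FEN, a reported DTZ of 46 lands exactly on the boundary under this model, so a single probe cannot tell a win from a cursed win.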
It's very hard for me to consider them not broken. Moves will be ranked correctly at this point, but we stopped caring about optimal play or guiding SF to the correct win long ago, since SF is expected to find these wins on its own most of the time. For me, I only care about the eval.
I believe we should reopen this issue (at least to highlight to end users that it exists, and to remind ourselves). At present, Stockfish returns incorrect analysis results for some FENs with nonzero halfmove counters when it is supplied with the default (and probably most widely used) 6-man Syzygy EGTBs, or with the only available format for 7-man Syzygy EGTBs.
I think we can reopen, though niklasf probably answered this quite clearly.
This is clearly false: niklasf has repeatedly stated that this is specifically aimed at improving compression and reducing overall TB size. Syzygy was designed to be the smallest TB format, and this compression feature is a noticeable step towards that goal (to the tune of several percent, or so I've heard). I do agree with reopening in the short run. Although best-move selection is unaffected, it is indeed deeply confusing for users to see a winning evaluation for a drawn position. The best idea I've seen is to adjust the evaluation of such ambiguous probes to be less than a proven win. Maybe 100 instead of 200 or something?
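The proposal to score ambiguous probes below a proven win could look roughly like the following Python sketch. The seven outcomes are our own enumeration (win, ambiguous win, cursed win, draw, and their mirror images), not necessarily the list from the Discord quote, and the scores (200 for a proven win, 100 for an ambiguous one) are taken loosely from the comment in hypothetical, unspecified units. `ProbeResult`, `SCORES` and `tb_score` are illustrative names, not Stockfish's evaluation code.

```python
from enum import Enum, auto


class ProbeResult(Enum):
    WIN = auto()
    WIN_OR_CURSED_WIN = auto()  # ambiguous: rounding may push it past the 50-move limit
    CURSED_WIN = auto()  # won only if the 50-move rule is ignored
    DRAW = auto()
    BLESSED_LOSS = auto()  # lost only if the 50-move rule is ignored
    LOSS_OR_BLESSED_LOSS = auto()  # ambiguous mirror image of WIN_OR_CURSED_WIN
    LOSS = auto()


# Hypothetical scores from the side to move's point of view.
PROVEN_WIN_SCORE = 200
AMBIGUOUS_WIN_SCORE = 100

SCORES = {
    ProbeResult.WIN: PROVEN_WIN_SCORE,
    ProbeResult.WIN_OR_CURSED_WIN: AMBIGUOUS_WIN_SCORE,
    ProbeResult.CURSED_WIN: 0,  # a draw in practice under the 50-move rule
    ProbeResult.DRAW: 0,
    ProbeResult.BLESSED_LOSS: 0,
    ProbeResult.LOSS_OR_BLESSED_LOSS: -AMBIGUOUS_WIN_SCORE,
    ProbeResult.LOSS: -PROVEN_WIN_SCORE,
}


def tb_score(result: ProbeResult) -> int:
    """Map a probe result to a score that never presents an ambiguous win as proven."""
    return SCORES[result]
```

The design choice is simply that an ambiguous result sits strictly between a draw and a proven win, so the displayed eval stays honest without changing which moves are preferred.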
It's hard to imagine that, design-wise, you have to get this requirement wrong while doing everything else right. By that logic, we could introduce two, three or four bugs to optimize the so-called Syzygy efficiency even more.
What is actually returned after probing? I.e. a
If we continue to play the game from the original FEN optimally, we reach
For even more clarity @niklasf ... You are saying that Stockfish avoids this. This is because Stockfish will rank the root moves using this code, and refuse to play any root move with a rank worse than optimal?
The important part here, restated: if a repetition has been made, Stockfish will only play moves with the optimal (equal) DTZ in winning positions. If a repetition has not been made, Stockfish will play any move that wins before the 100th ply, avoiding moves that zero on the 100th ply? I don't explicitly see how this guarantees protection from the stated issue in ALL cases. Can a case exist where all moves appear to have the same DTZ=99 or DTZ=100, and then, despite the best intentions above, you end up in the same ambiguous situation, i.e. all moves fall into the "WIN or CURSED WIN" ambiguous bucket? Ref: 108f0da
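The root-move rule as restated in this comment (not necessarily what commit 108f0da actually implements) can be written as a small Python filter. Assumptions: each winning candidate move is paired with the number of plies from the root until the next zeroing move, counting the move itself; `allowed_root_moves` and its parameters are hypothetical. The last branch shows the case the comment worries about: when no move clears the threshold, the filter can only fall back to DTZ-optimal moves.

```python
def allowed_root_moves(
    plies_to_zero: dict[str, int],
    halfmove_clock: int,
    repetition_seen: bool,
) -> list[str]:
    """Filter winning root moves according to the rule as restated above.

    plies_to_zero maps each winning move to the number of plies, counting the
    move itself, until the next zeroing move (capture, pawn move or mate).
    """
    if not plies_to_zero:
        return []
    best = min(plies_to_zero.values())
    dtz_optimal = sorted(m for m, d in plies_to_zero.items() if d == best)
    if repetition_seen:
        # After a repetition, only DTZ-optimal moves are allowed.
        return dtz_optimal
    # Otherwise any move that zeroes before the 100th ply is acceptable.
    safe = sorted(m for m, d in plies_to_zero.items() if halfmove_clock + d <= 99)
    # If no move is safe, fall back to DTZ-optimal play; with rounded DTZ values
    # the outcome may still be ambiguous at this point.
    return safe or dtz_optimal
```

For example, with a halfmove clock of 54, a move that zeroes after 46 plies would be filtered out, while moves zeroing after 30 or 44 plies would be kept.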
The intended effect is to give Stockfish some freedom, but to reliably switch to nearly DTZ-optimal play (i.e. 1 ply may be squandered due to rounding) before it's too late. This works when we approach the threshold
Edit: With regard to Edit 2: I misread the implementation, and it does not have the problem I thought I saw.
Describe the issue
Winning evaluation in a cursed win even when using tablebases
https://lichess.org/analysis/standard/8/8/6k1/3B4/3K4/4N3/8/8_w_-_-_54_106
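The halfmove clock in this position is what makes the 50-move rule matter. A small Python sketch that extracts it from the Lichess analysis URL above (which encodes the FEN with underscores in place of spaces) and computes how many plies the winning side has left to zero, assuming the `dtz + halfmove_clock <= 100` convention. `halfmove_budget` is a hypothetical helper.

```python
from urllib.parse import urlparse


def halfmove_budget(analysis_url: str) -> tuple[str, int, int]:
    """Return (fen, halfmove_clock, plies_left_to_zero) for a Lichess analysis URL."""
    path = urlparse(analysis_url).path
    # The path looks like /analysis/standard/<FEN with '_' instead of ' '>.
    encoded_fen = path.split("/analysis/standard/", 1)[1]
    fen = encoded_fen.replace("_", " ")
    # FEN fields: placement, side to move, castling, en passant, halfmove clock, fullmove number.
    halfmove_clock = int(fen.split()[4])
    return fen, halfmove_clock, 100 - halfmove_clock


if __name__ == "__main__":
    url = "https://lichess.org/analysis/standard/8/8/6k1/3B4/3K4/4N3/8/8_w_-_-_54_106"
    fen, clock, budget = halfmove_budget(url)
    print(fen)  # 8/8/6k1/3B4/3K4/4N3/8/8 w - - 54 106
    print(clock, budget)  # 54 46
```

So White has to reach a capture, pawn move or mate within 46 plies for the win to count; anything longer is a cursed win, and the correct eval would then be 0.00.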
Expected behavior
An evaluation (ideally at depth 1?) of 0.00
Steps to reproduce
Operating system
All
Stockfish version
master