Winning evaluation with tablebases in cursed win #5175

Open
dav1312 opened this issue Apr 14, 2024 · 18 comments

@dav1312
Contributor

dav1312 commented Apr 14, 2024

Describe the issue

Winning evaluation in a cursed win even when using tablebases

https://lichess.org/analysis/standard/8/8/6k1/3B4/3K4/4N3/8/8_w_-_-_54_106

Expected behavior

An evaluation (ideally at depth 1?) of 0.00

Steps to reproduce

Stockfish dev-20240413-c55ae376 by the Stockfish developers (see AUTHORS file)
setoption name SyzygyPath value tb345
info string Found 145 tablebases
position fen 8/8/6k1/3B4/3K4/4N3/8/8 w - - 54 106
go infinite
info string NNUE evaluation using nn-ae6a388e4a1a.nnue
info string NNUE evaluation using nn-baff1ede1f90.nnue
info depth 1 seldepth 2 multipv 1 score cp 20000 nodes 1 nps 333 hashfull 0 tbhits 26 time 3 pv d4e5
info depth 2 seldepth 2 multipv 1 score cp 20000 nodes 2 nps 666 hashfull 0 tbhits 26 time 3 pv d4e5
info depth 3 seldepth 2 multipv 1 score cp 20000 nodes 3 nps 1000 hashfull 0 tbhits 26 time 3 pv d4e5
info depth 4 seldepth 2 multipv 1 score cp 20000 nodes 4 nps 1333 hashfull 0 tbhits 26 time 3 pv d4e5
info depth 5 seldepth 3 multipv 1 score cp 20000 nodes 11 nps 3666 hashfull 0 tbhits 26 time 3 pv d4e5
info depth 6 seldepth 4 multipv 1 score cp 20000 nodes 41 nps 10250 hashfull 0 tbhits 26 time 4 pv d4e5 g6g7 e3g2
info depth 7 seldepth 7 multipv 1 score cp 20000 nodes 194 nps 48500 hashfull 0 tbhits 26 time 4 pv d4e5 g6g7 e3f5 g7f8
info depth 8 seldepth 7 multipv 1 score cp 20000 nodes 442 nps 110500 hashfull 0 tbhits 26 time 4 pv d4e5 g6g7 e3g2 g7f8 d5c4
info depth 9 seldepth 8 multipv 1 score cp 20000 nodes 1119 nps 223800 hashfull 0 tbhits 26 time 5 pv d4e5 g6h6 e5e4 h6g7 e4d3 g7f8
info depth 10 seldepth 9 multipv 1 score cp 20000 nodes 1518 nps 303600 hashfull 0 tbhits 26 time 5 pv d4e5 g6g7 e5f5 g7h8

Operating system

All

Stockfish version

master

@Disservin
Member

(Apparently present since sf6, according to discord)

@jhellis3
Contributor

I translated the code to pencil & paper math, and AFAICS it all checks out. It is simply getting the wrong result from the DTZ probe. Why that is.... IDK.

@dav1312
Contributor Author

dav1312 commented Apr 15, 2024

I can test later, but there are some alternative dtz tablebases called "nr" which seemed to fix the problem for crafty and pere. It makes more sense that the issue was not in Stockfish in the first place.
https://tablebase.lichess.ovh/tables/standard/

@dav1312
Contributor Author

dav1312 commented Apr 15, 2024

Using dtz_nr tablebases

Stockfish dev-20240413-c55ae376 by the Stockfish developers (see AUTHORS file)
setoption name SyzygyPath value tb345;tb345_dtz_nr
info string Found 145 tablebases
position fen 8/8/6k1/3B4/3K4/4N3/8/8 w - - 54 106
go depth 10
info string NNUE evaluation using nn-ae6a388e4a1a.nnue
info string NNUE evaluation using nn-baff1ede1f90.nnue
info depth 1 seldepth 2 multipv 1 score cp 25 nodes 1 nps 333 hashfull 0 tbhits 26 time 3 pv d4e5
info depth 2 seldepth 2 multipv 1 score cp 25 nodes 2 nps 666 hashfull 0 tbhits 26 time 3 pv d4e5
info depth 3 seldepth 2 multipv 1 score cp 25 nodes 3 nps 750 hashfull 0 tbhits 26 time 4 pv d4e5
info depth 4 seldepth 2 multipv 1 score cp 25 nodes 4 nps 1000 hashfull 0 tbhits 26 time 4 pv d4e5
info depth 5 seldepth 3 multipv 1 score cp 25 nodes 11 nps 2750 hashfull 0 tbhits 26 time 4 pv d4e5
info depth 6 seldepth 4 multipv 1 score cp 25 nodes 41 nps 10250 hashfull 0 tbhits 26 time 4 pv d4e5 g6g7 e3g2
info depth 7 seldepth 7 multipv 1 score cp 25 nodes 194 nps 38800 hashfull 0 tbhits 26 time 5 pv d4e5 g6g7 e3f5 g7f8
info depth 8 seldepth 7 multipv 1 score cp 25 nodes 442 nps 88400 hashfull 0 tbhits 26 time 5 pv d4e5 g6g7 e3g2 g7f8 d5c4
info depth 9 seldepth 8 multipv 1 score cp 25 nodes 1119 nps 186500 hashfull 0 tbhits 26 time 6 pv d4e5 g6h6 e5e4 h6g7 e4d3 g7f8
info depth 10 seldepth 9 multipv 1 score cp 25 nodes 1518 nps 253000 hashfull 0 tbhits 26 time 6 pv d4e5 g6g7 e5f5 g7h8
bestmove d4e5 ponder g6g7

Using dtz tablebases

setoption name SyzygyPath value tb345;tb345_dtz
info string Found 145 tablebases
position fen 8/8/6k1/3B4/3K4/4N3/8/8 w - - 54 106
go depth 10
info string NNUE evaluation using nn-ae6a388e4a1a.nnue
info string NNUE evaluation using nn-baff1ede1f90.nnue
info depth 1 seldepth 2 multipv 1 score cp 20000 nodes 1 nps 142 hashfull 0 tbhits 26 time 7 pv d4e5
info depth 2 seldepth 2 multipv 1 score cp 20000 nodes 2 nps 250 hashfull 0 tbhits 26 time 8 pv d4e5
info depth 3 seldepth 2 multipv 1 score cp 20000 nodes 3 nps 375 hashfull 0 tbhits 26 time 8 pv d4e5
info depth 4 seldepth 2 multipv 1 score cp 20000 nodes 4 nps 500 hashfull 0 tbhits 26 time 8 pv d4e5
info depth 5 seldepth 3 multipv 1 score cp 20000 nodes 11 nps 1375 hashfull 0 tbhits 26 time 8 pv d4e5
info depth 6 seldepth 4 multipv 1 score cp 20000 nodes 41 nps 4555 hashfull 0 tbhits 26 time 9 pv d4e5 g6g7 e3g2
info depth 7 seldepth 7 multipv 1 score cp 20000 nodes 194 nps 19400 hashfull 0 tbhits 26 time 10 pv d4e5 g6g7 e3f5 g7f8
info depth 8 seldepth 7 multipv 1 score cp 20000 nodes 442 nps 40181 hashfull 0 tbhits 26 time 11 pv d4e5 g6g7 e3g2 g7f8 d5c4
info depth 9 seldepth 8 multipv 1 score cp 20000 nodes 1119 nps 93250 hashfull 0 tbhits 26 time 12 pv d4e5 g6h6 e5e4 h6g7 e4d3 g7f8
info depth 10 seldepth 9 multipv 1 score cp 20000 nodes 1518 nps 126500 hashfull 0 tbhits 26 time 12 pv d4e5 g6g7 e5f5 g7h8
bestmove d4e5 ponder g6g7

dav1312 closed this as completed Apr 15, 2024
@Disservin
Member

> I can test later, but there are some alternative dtz tablebases called "nr" which seemed to fix the problem for crafty and pere. It makes more sense that the issue was not in Stockfish in the first place. https://tablebase.lichess.ovh/tables/standard/

Mh.. is there some information on how those were generated/fixed/edited? Maybe niklasf knows something?

@whelanh

whelanh commented Apr 15, 2024

As I understand it, dtz tables don't assume a 50 move rule, but dtr does. So neither is wrong, just different assumptions. https://chess.stackexchange.com/questions/28520/what-do-dtm-dtz-dtc-dtr-dtz50-and-dtzr-mean

@dav1312
Contributor Author

dav1312 commented Apr 15, 2024

> > I can test later, but there are some alternative dtz tablebases called "nr" which seemed to fix the problem for crafty and pere. It makes more sense that the issue was not in Stockfish in the first place. tablebase.lichess.ovh/tables/standard
>
> Mh.. is there some information on how those were generated/fixed/edited? Maybe niklasf knows something?

@niklasf said on Discord:
some normal dtz tables store rounded values (described on https://syzygy-tables.info/metrics). it's a bit confusing, so i generated no rounding tables, at least up to 6 pieces

@robertnurnberg
Contributor

So my understanding of the situation is that sf does not report correct tb results in the affected positions when the user provides the widely available standard 3-6men syzygy files.

Only if the tb files were generated with this patch does sf work correctly.

Is that correct?

If that is so, should we warn the user on TB load if they use outdated tb files (if that is possible, e.g. by probing the position from this issue)?

And what is the situation for the 7men files?

@niklasf
Contributor

niklasf commented Apr 16, 2024

To clarify: Stockfish works correctly in game play. That is, after the capture that crosses into tablebase territory, moves will be ranked correctly and Stockfish will achieve the best possible outcome. That's what the tablebases are designed for - they are not broken and they don't need to be fixed.

The issue occurs only in analysis, when setting up arbitrary positions that do not arise from optimal play following a capture. In that case there are really 7 possible probe results:

  • Loss
  • Loss or blessed loss
  • Blessed loss
  • Draw
  • Cursed win
  • Cursed win or win
  • Win

Avoiding this, by patching the table generator, produces larger tables for no playing strength gain. Handling the ambiguous results will require more code for no playing strength gain.

For the analysis board on Lichess, I did both: 6 piece tables with no rounding, and (because 7 piece tables are too much effort to regenerate) a user interface that correctly displays ambiguous results for 7 piece tables. For Stockfish, I am not sure what to do, if anything.
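
The seven probe results listed above can be sketched as a small classifier. This is only an illustration, not code from Stockfish or from the Syzygy probing library: the names `Outcome` and `classify` are hypothetical, and the thresholds rest on two assumptions taken from later in this thread, namely that zeroing on half-move clock 100 is still a win and that a rounded table may cost at most one ply. Treating the losing side symmetrically is a further assumption.

```cpp
#include <cassert>

// Hypothetical names for illustration; not part of Stockfish or the Syzygy code.
enum class Outcome {
    Loss, LossOrBlessedLoss, BlessedLoss, Draw, CursedWin, CursedWinOrWin, Win
};

// dtz   : value read from a rounded DTZ table, in plies
//         (positive = side to move wins, negative = loses, 0 = draw).
// cnt50 : half-move clock of the probed position.
// Assumption: the true distance is either dtz or one ply more, so a total of
// exactly 100 plies cannot be resolved from the rounded value alone.
Outcome classify(int dtz, int cnt50) {
    assert(cnt50 >= 0);
    if (dtz == 0)
        return Outcome::Draw;

    const int plies = (dtz > 0 ? dtz : -dtz) + cnt50;

    if (dtz > 0) {
        if (plies < 100)  return Outcome::Win;
        if (plies == 100) return Outcome::CursedWinOrWin;
        return Outcome::CursedWin;
    }
    // Losing side, treated symmetrically (an assumption of this sketch).
    if (plies < 100)  return Outcome::Loss;
    if (plies == 100) return Outcome::LossOrBlessedLoss;
    return Outcome::BlessedLoss;
}
```

With a half-move clock of 54, as in the FEN from this issue, an illustrative rounded DTZ of 45 would classify as a win, 46 as "cursed win or win", and 47 as a cursed win. These DTZ inputs are made up and are not the values stored for that position.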

@peregrineshahin
Contributor

It's very hard for me to consider them not broken. Moves will be ranked correctly at this point, but we stopped caring about optimal play or guiding SF to the correct win long ago, as SF is more or less expected to find most of these wins on its own. For me, only the eval matters.
Also, it's pretty easy to notice that this must be a bug/laziness/some oversight turned into a feature..

@robertnurnberg
Contributor

I believe we should re-open this issue. (At least to highlight to end-users that it exists, and to remind ourselves.)

At present stockfish returns incorrect analysis results for some FENs with nonzero half move counters when it is supplied with the default (and probably most widely used) 6men syzygy EGTBs, or with the only available format for 7men syzygy EGTBs.

vondele reopened this Apr 17, 2024
@vondele
Member

vondele commented Apr 17, 2024

I think we can reopen, though niklasf probably answered this quite clearly.

@dubslow
Contributor

dubslow commented Apr 17, 2024

> Also, it's pretty easy to notice that this must be a bug/laziness/some oversight turned into a feature..

This is clearly false, as niklasf repeatedly stated this is specifically and exactly aimed at improving compression and reducing overall TB size. Syzygy was specifically designed to be the smallest TB, and this compression feature is a noticeable step towards this goal (to the tune of several percent, or so I've heard).


I do concur with re-opening in the short run. Although bestmove selection is unaffected, it is indeed deeply confusing for users to see a winning evaluation for a drawn position.

The best idea I've seen is that we should adjust the evaluation of such ambiguous probes to be less than a proven win. Maybe 100 instead of 200 or something?
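
One way to picture this suggestion, purely as a sketch: map the rounded DTZ plus half-move clock of a winning probe to a displayed score. The function `displayed_win_cp` and its constants are hypothetical and not Stockfish code. The value 20000 is the `cp 20000` seen in the logs in this issue, 10000 stands for the "maybe 100" floated here, and showing 0 for a cursed win is an assumption based on the expected behavior in the original report.

```cpp
#include <cassert>

// Hypothetical helper for illustration; not part of Stockfish.
// plies = rounded DTZ of a winning position + its half-move clock.
// Returns the centipawn score to display for that probe result.
int displayed_win_cp(int plies) {
    assert(plies > 0);
    const int provenWinCp    = 20000;  // 200.00, as printed in the logs above
    const int ambiguousWinCp = 10000;  // the "maybe 100" proposed in this comment
    const int cursedWinCp    = 0;      // assumption: drawn under the 50-move rule

    if (plies < 100)
        return provenWinCp;            // win even if rounding costs one ply
    if (plies == 100)
        return ambiguousWinCp;         // "cursed win or win"
    return cursedWinCp;
}
```

The exact numbers matter less than the ordering: an ambiguous probe would rank below a proven win but above a draw.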

@peregrineshahin
Contributor

It's impossible to imagine that, design-wise, you need to mess this requirement up while you do everything else right. We could introduce two, three, or four more such bugs so that the so-called Syzygy efficiency is optimized even further.

> This is clearly false, as niklasf repeatedly stated this is specifically and exactly aimed at improving compression and reducing overall TB size. Syzygy was specifically designed to be the smallest TB, and this compression feature is a noticeable step towards this goal (to the tune of several percent, or so I've heard).

@Disservin
Member

What is actually returned after probing? I.e. a WDLCursedWin or a WDLWin?

@Torom
Contributor

Torom commented Apr 17, 2024

If we continue to play the game from the original FEN optimally, we reach 8/8/8/8/6B1/4N3/5K1k/8 w - - 98 128.
Giving Stockfish this position we get: info depth 245 seldepth 3 multipv 1 score cp 20000 nodes 490 nps 163333 hashfull 0 tbhits 20 time 3 pv e3f1 h2h1.
So we output a two move PV that ends in a 50-move draw, but still output 200.00.

@AndyGrant

AndyGrant commented May 16, 2024

> To clarify: Stockfish works correctly in game play. That is, after the capture that crosses into tablebase territory, moves will be ranked correctly and Stockfish will achieve the best possible outcome. That's what the tablebases are designed for - they are not broken and they don't need to be fixed.

For even more clarity @niklasf ... You are saying that Stockfish avoids this. This is because Stockfish will rank the root moves using this code, and refuse to play any root move with a rank worse than optimal?

        // Better moves are ranked higher. Certain wins are ranked equally.
        // Losing moves are ranked equally unless a 50-move draw is in sight.
        int r    = dtz > 0 ? (dtz + cnt50 <= 99 && !rep ? MAX_DTZ : MAX_DTZ - (dtz + cnt50))
                 : dtz < 0 ? (-dtz * 2 + cnt50 < 100 ? -MAX_DTZ : -MAX_DTZ + (-dtz + cnt50))
                           : 0;
        m.tbRank = r;

The important part here is the <= 99 condition, which is intentionally not <= 100, in order to avoid the off-by-one rounding issue (at least when delivering the mate?). Also, the !rep condition is present to serve a similar purpose for accidentally letting a WIN become a CURSED WIN?

Restated: If a repetition has been made, then Stockfish will only play moves with equal-DTZ in winning positions. If a repetition has not been made, then Stockfish will play any move which wins -before- the 100th, avoiding moves which zero on the 100th ply?
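
To make the restatement concrete, here is the same ranking expression wrapped in a free function so it can be evaluated on sample inputs. This is a sketch: the function name `tb_rank` is hypothetical, and the value chosen for `MAX_DTZ` is an assumption that only needs to be much larger than any realistic DTZ.

```cpp
#include <cassert>

// Assumption: stand-in for the constant used by the quoted snippet.
constexpr int MAX_DTZ = 1 << 18;

// Same expression as the quoted ranking code, as a standalone function.
// dtz   : DTZ associated with the root move (positive = winning, negative = losing).
// cnt50 : half-move clock used in the comparison.
// rep   : true if a repetition has already occurred.
int tb_rank(int dtz, int cnt50, bool rep) {
    assert(cnt50 >= 0);
    return dtz > 0 ? (dtz + cnt50 <= 99 && !rep ? MAX_DTZ : MAX_DTZ - (dtz + cnt50))
         : dtz < 0 ? (-dtz * 2 + cnt50 < 100 ? -MAX_DTZ : -MAX_DTZ + (-dtz + cnt50))
                   : 0;
}
```

With made-up inputs and `cnt50 = 54`, a winning move with `dtz = 45` gets the top rank `MAX_DTZ`, while `dtz = 46` drops to `MAX_DTZ - 100`. With `rep` set, even `dtz = 45` is ranked by distance, at `MAX_DTZ - 99`, so winning moves are then ordered strictly by DTZ.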

I don't explicitly see how this guarantees protection from the stated issue in ALL cases. Can a case exist where all moves appear to have the same DTZ=99 or DTZ=100, and then, despite best intentions from above, you end up in the same ambiguous situation? I.e., all moves fall into the "WIN or CURSED WIN" ambiguous bucket?

Ref: 108f0da

@niklasf
Contributor

niklasf commented May 25, 2024

@AndyGrant I think you found a bug in Stockfish's ranking.

The intended effect is to give Stockfish some freedom, but reliably switch to nearly (i.e. 1 ply may be squandered due to rounding) DTZ-optimal play before it's too late.

This works when we approach the threshold <= 99 and eventually switch. Note that zeroing or mating on half-move clock 100 is still a win.

But there are indeed endgames that are so tight that immediate DTZ-optimal play is required, and even losing 1 ply to rounding would change the outcome. Rounding is turned off for these endgames, so we've got precise DTZ values like 99 and 100 on our hands. But Stockfish does not distinguish those by rank.


Edit: With regard to rep: After switching to optimal play, we may have to repeat a position from Stockfish's previous failed conversion attempts one more time. That's safe if there's only ever been one repetition, so we'd better switch to optimal play before allowing a second repetition. But then ranking all moves equally, regardless of DTZ, seems wrong.


Edit 2: I misread the implementation and it does not have the problem that I think I saw.
