mcognetta/lichess-combined-puzzle-game-db

Lichess Combined Puzzle-Game Database

The Lichess puzzle database combined with game information.

The complete dataset can be downloaded from here: MEGA link.

Background

This dataset contains every puzzle from the Lichess Puzzle Database, joined with its corresponding game from the Lichess Game Database. The puzzle data was pulled in September 2022 and contains 2,969,948 puzzles in total. The game information was pulled from the Lichess API with every flag enabled.

Format

The database is provided as a single bzip2-compressed NDJSON (newline-delimited JSON) file. Each line contains one JSON object with two top-level fields: game and puzzle. The game object contains the entire JSON dump of the game information returned by the Lichess API call with every flag enabled. The puzzle object contains all of the information from the puzzle database entry, with field names taken from the CSV headers. That is, you can expect the following fields in the puzzle object (though they are not necessarily all populated): PuzzleId, FEN, Moves, Rating, RatingDeviation, Popularity, NbPlays, Themes, GameUrl, OpeningFamily, and OpeningVariation.
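Because the file is line-oriented, it can be read one record at a time without decompressing it to disk first. The following is a minimal sketch using only the Python standard library; the function name iter_records and any file name passed to it are assumptions made for illustration, not part of this repo.

```python
import bz2
import json
from typing import Iterator


def iter_records(path: str) -> Iterator[dict]:
    """Yield one {"puzzle": ..., "game": ...} dict per line of the file.

    `path` is a hypothetical local path to the bzip2-compressed NDJSON file.
    """
    # bz2.open in text mode decompresses lazily, so the whole
    # uncompressed database is never held in memory at once.
    with bz2.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines, e.g. at the end of the file
                yield json.loads(line)
```

A caller would then loop with something like `for record in iter_records("combined.ndjson.bz2"):` (the file name here is a placeholder) and access record["puzzle"] and record["game"].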

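Games and puzzles are joined on the game ID (the id field of the game object). The matching ID is extracted from the puzzle's GameUrl field: for example, the GameUrl https://lichess.org/wvPFkjF9#51 corresponds to the game whose id is wvPFkjF9. Note that PuzzleId and id are unrelated.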
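The ID extraction described above can be sketched as follows. This is an illustration, not necessarily the code used to build the dataset: it assumes the game ID is always the first path segment of GameUrl, and the helper names game_id_from_url and is_consistent are made up for this example.

```python
from urllib.parse import urlparse


def game_id_from_url(game_url: str) -> str:
    """Return the first path segment of a Lichess game URL.

    Assumption: the game ID is the first path segment, so any trailing
    segment or '#<number>' fragment is ignored.
    """
    path = urlparse(game_url).path  # the '#51' fragment is not part of the path
    return path.strip("/").split("/")[0]


def is_consistent(record: dict) -> bool:
    """Check that a record's puzzle GameUrl points at its joined game object."""
    return game_id_from_url(record["puzzle"]["GameUrl"]) == record["game"]["id"]
```

Running is_consistent over a sample of records is a cheap sanity check that the join key is being read the way you expect.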

The compressed database is ~4.4GB and the uncompressed database is ~30GB.

This repo also contains an uncompressed sample of 100 puzzles and a compressed sample of 50,000 puzzles.

Example

{"puzzle":{"Themes":"endgame mate mateIn2 short","OpeningFamily":"","Popularity":"83","NbPlays":"10","PuzzleId":"004X6","FEN":"1r4k1/p4ppp/2Q5/3pq3/8/P6P/2PR1PP1/Rr4K1 w - - 1 26","Moves":"a1b1 b8b1 d2d1 b1d1","Rating":"1176","RatingDeviation":"278","GameUrl":"https://lichess.org/wvPFkjF9#51"},"game":{"analysis":[{"eval":25},{"eval":12},{"eval":28},{"eval":45},{"eval":17},{"eval":13},{"eval":0},{"eval":21},{"eval":13},{"eval":57},{"eval":35},{"eval":21},{"eval":13},{"eval":32},{"eval":38},{"eval":84},{"judgment":{"name":"Inaccuracy","comment":"Inaccuracy. Qe1 was best."},"variation":"Qe1 O-O","eval":-3,"best":"d1e1"},{"judgment":{"name":"Inaccuracy","comment":"Inaccuracy. Bxe2 was best."},"variation":"Bxe2 Qxe2 O-O Be3 Bb6 Rad1 Re8 Bg5 Re6 Bxf6 Rxf6 g3 Qe8 Qc4","eval":77,"best":"g4e2"},{"judgment":{"name":"Inaccuracy","comment":"Inaccuracy. Bg5 was best."},"variation":"Bg5 Bxe2","eval":4,"best":"c1g5"},{"variation":"Qxg4","eval":58,"best":"d7g4"},{"eval":27},{"variation":"O-O Na4","eval":78,"best":"e8h8"},{"variation":"Na4","eval":25,"best":"c3a4"},{"eval":70},{"judgment":{"name":"Inaccuracy","comment":"Inaccuracy. Na4 was best."},"variation":"Na4 Bb6","eval":-2,"best":"c3a4"},{"eval":13},{"judgment":{"name":"Inaccuracy","comment":"Inaccuracy. Bg3 was best."},"variation":"Bg3","eval":-74,"best":"f4g3"},{"judgment":{"name":"Inaccuracy","comment":"Inaccuracy. Rxb2 was best."},"variation":"Rxb2 Na4","eval":28,"best":"b8b2"},{"variation":"Na4 Bb6","eval":-26,"best":"c3a4"},{"eval":0},{"eval":7},{"eval":0},{"eval":-10},{"eval":2},{"eval":2},{"eval":0},{"eval":7},{"eval":5},{"eval":27},{"eval":0},{"eval":-24},{"judgment":{"name":"Blunder","comment":"Blunder. Qb7 was best."},"variation":"Qb7 f4","eval":315,"best":"d7b7"},{"judgment":{"name":"Blunder","comment":"Blunder. e6 was best."},"variation":"e6 fxe6 Bxb8 Rxb8 c4 Rf8 Re1 Nf4 Kh2 h5 Qe3 Rf6 g3 Ng6","eval":-42,"best":"e5e6"},{"judgment":{"name":"Inaccuracy","comment":"Inaccuracy. 
R8b6 was best."},"variation":"R8b6","eval":41,"best":"b8b6"},{"judgment":{"name":"Inaccuracy","comment":"Inaccuracy. Qxa7 was best."},"variation":"Qxa7 h5","eval":-58,"best":"c5a7"},{"judgment":{"name":"Inaccuracy","comment":"Inaccuracy. R8b7 was best."},"variation":"R8b7 Re1","eval":34,"best":"b8b7"},{"judgment":{"name":"Inaccuracy","comment":"Inaccuracy. Rdd1 was best."},"variation":"Rdd1","eval":-74,"best":"d2d1"},{"eval":-57},{"judgment":{"name":"Blunder","comment":"Blunder. Rad1 was best."},"variation":"Rad1 Qf6","eval":-1407,"best":"a1d1"},{"eval":-1380},{"mate":-2,"judgment":{"name":"Inaccuracy","comment":"Checkmate is now unavoidable. Rd1 was best."},"variation":"Rd1 Qxa1","best":"d2d1"},{"mate":-1},{"mate":-1}],"createdAt":1643987096972,"variant":"standard","status":"mate","pgn":"[Event \"Rated Rapid game\"]\n[Site \"https://lichess.org/wvPFkjF9\"]\n[Date \"2022.02.04\"]\n[White \"Hescardo\"]\n[Black \"beniboy\"]\n[Result \"0-1\"]\n[UTCDate \"2022.02.04\"]\n[UTCTime \"15:04:56\"]\n[WhiteElo \"1880\"]\n[BlackElo \"1773\"]\n[WhiteRatingDiff \"-9\"]\n[BlackRatingDiff \"+19\"]\n[Variant \"Standard\"]\n[TimeControl \"600+0\"]\n[ECO \"C47\"]\n[Opening \"Four Knights Game: Scotch Variation Accepted\"]\n[Termination \"Normal\"]\n[Annotator \"lichess.org\"]\n\n1. e4 { [%eval 0.25] [%clk 0:10:00] } 1... e5 { [%eval 0.12] [%clk 0:10:00] } 2. Nf3 { [%eval 0.28] [%clk 0:10:00] } 2... Nc6 { [%eval 0.45] [%clk 0:09:58] } 3. Nc3 { [%eval 0.17] [%clk 0:10:00] } 3... Nf6 { [%eval 0.13] [%clk 0:09:58] } 4. d4 { [%eval 0.0] [%clk 0:09:58] } 4... exd4 { [%eval 0.21] [%clk 0:09:57] } { C47 Four Knights Game: Scotch Variation Accepted } 5. Nxd4 { [%eval 0.13] [%clk 0:09:58] } 5... Bc5 { [%eval 0.57] [%clk 0:09:55] } 6. Nxc6 { [%eval 0.35] [%clk 0:09:57] } 6... bxc6 { [%eval 0.21] [%clk 0:09:55] } 7. Bd3 { [%eval 0.13] [%clk 0:09:55] } 7... d6 { [%eval 0.32] [%clk 0:09:53] } 8. O-O { [%eval 0.38] [%clk 0:09:54] } 8... Bg4 { [%eval 0.84] [%clk 0:09:51] } 9. Be2?! 
{ (0.84 → -0.03) Inaccuracy. Qe1 was best. } { [%eval -0.03] [%clk 0:09:49] } (9. Qe1 O-O) 9... Qd7?! { (-0.03 → 0.77) Inaccuracy. Bxe2 was best. } { [%eval 0.77] [%clk 0:09:48] } (9... Bxe2 10. Qxe2 O-O 11. Be3 Bb6 12. Rad1 Re8 13. Bg5 Re6 14. Bxf6 Rxf6 15. g3 Qe8 16. Qc4) 10. Bxg4?! { (0.77 → 0.04) Inaccuracy. Bg5 was best. } { [%eval 0.04] [%clk 0:09:44] } (10. Bg5 Bxe2) 10... Nxg4 { [%eval 0.58] [%clk 0:09:45] } 11. Bd2 { [%eval 0.27] [%clk 0:09:33] } 11... Rb8 { [%eval 0.78] [%clk 0:09:33] } 12. h3 { [%eval 0.25] [%clk 0:09:21] } 12... Ne5 { [%eval 0.7] [%clk 0:09:21] } 13. Bf4?! { (0.70 → -0.02) Inaccuracy. Na4 was best. } { [%eval -0.02] [%clk 0:09:18] } (13. Na4 Bb6) 13... Ng6 { [%eval 0.13] [%clk 0:09:18] } 14. Bh2?! { (0.13 → -0.74) Inaccuracy. Bg3 was best. } { [%eval -0.74] [%clk 0:09:09] } (14. Bg3) 14... O-O?! { (-0.74 → 0.28) Inaccuracy. Rxb2 was best. } { [%eval 0.28] [%clk 0:09:15] } (14... Rxb2 15. Na4) 15. e5 { [%eval -0.26] [%clk 0:09:02] } 15... d5 { [%eval 0.0] [%clk 0:09:07] } 16. Na4 { [%eval 0.07] [%clk 0:08:53] } 16... Be7 { [%eval 0.0] [%clk 0:08:52] } 17. Qd4 { [%eval -0.1] [%clk 0:08:21] } 17... c5 { [%eval 0.02] [%clk 0:08:48] } 18. Nxc5 { [%eval 0.02] [%clk 0:08:17] } 18... Bxc5 { [%eval 0.0] [%clk 0:08:40] } 19. Qxc5 { [%eval 0.07] [%clk 0:08:07] } 19... Rxb2 { [%eval 0.05] [%clk 0:08:38] } 20. Rfd1 { [%eval 0.27] [%clk 0:08:00] } 20... c6 { [%eval 0.0] [%clk 0:08:30] } 21. Rd2 { [%eval -0.24] [%clk 0:07:41] } 21... Rfb8?? { (-0.24 → 3.15) Blunder. Qb7 was best. } { [%eval 3.15] [%clk 0:08:19] } (21... Qb7 22. f4) 22. Bg3?? { (3.15 → -0.42) Blunder. e6 was best. } { [%eval -0.42] [%clk 0:07:04] } (22. e6 fxe6 23. Bxb8 Rxb8 24. c4 Rf8 25. Re1 Nf4 26. Kh2 h5 27. Qe3 Rf6 28. g3 Ng6) 22... Qe6?! { (-0.42 → 0.41) Inaccuracy. R8b6 was best. } { [%eval 0.41] [%clk 0:07:45] } (22... R8b6) 23. a3?! { (0.41 → -0.58) Inaccuracy. Qxa7 was best. } { [%eval -0.58] [%clk 0:06:47] } (23. Qxa7 h5) 23... Nxe5?! { (-0.58 → 0.34) Inaccuracy. 
R8b7 was best. } { [%eval 0.34] [%clk 0:07:42] } (23... R8b7 24. Re1) 24. Bxe5?! { (0.34 → -0.74) Inaccuracy. Rdd1 was best. } { [%eval -0.74] [%clk 0:06:43] } (24. Rdd1) 24... Qxe5 { [%eval -0.57] [%clk 0:07:42] } 25. Qxc6?? { (-0.57 → -14.07) Blunder. Rad1 was best. } { [%eval -14.07] [%clk 0:06:41] } (25. Rad1 Qf6) 25... Rb1+ { [%eval -13.8] [%clk 0:07:41] } 26. Rxb1?! { (-13.80 → Mate in 2) Checkmate is now unavoidable. Rd1 was best. } { [%eval #-2] [%clk 0:06:27] } (26. Rd1 Qxa1) 26... Rxb1+ { [%eval #-1] [%clk 0:07:41] } 27. Rd1 { [%eval #-1] [%clk 0:06:25] } 27... Rxd1# { [%clk 0:07:40] } { Black wins by checkmate. } 0-1\n\n\n","id":"wvPFkjF9","clock":{"increment":0,"totalTime":600,"initial":600},"rated":true,"players":{"white":{"rating":1880,"analysis":{"mistake":0,"acpl":77,"inaccuracy":7,"accuracy":67,"blunder":2},"user":{"name":"Hescardo","id":"hescardo"},"ratingDiff":-9},"black":{"rating":1773,"analysis":{"mistake":0,"acpl":41,"inaccuracy":4,"accuracy":81,"blunder":1},"user":{"name":"beniboy","id":"beniboy"},"ratingDiff":19}},"winner":"black","moves":"e4 e5 Nf3 Nc6 Nc3 Nf6 d4 exd4 Nxd4 Bc5 Nxc6 bxc6 Bd3 d6 O-O Bg4 Be2 Qd7 Bxg4 Nxg4 Bd2 Rb8 h3 Ne5 Bf4 Ng6 Bh2 O-O e5 d5 Na4 Be7 Qd4 c5 Nxc5 Bxc5 Qxc5 Rxb2 Rfd1 c6 Rd2 Rfb8 Bg3 Qe6 a3 Nxe5 Bxe5 Qxe5 Qxc6 Rb1+ Rxb1 Rxb1+ Rd1 Rxd1#","opening":{"name":"Four Knights Game: Scotch Variation Accepted","eco":"C47","ply":8},"perf":"rapid","speed":"rapid","lastMoveAt":1643987473359}}
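As the example shows, the values in the puzzle object are strings (for instance "Rating":"1176"), Moves and Themes are space-separated, and a field such as OpeningVariation may be absent. A small normalization step can make the puzzle easier to work with. The sketch below is one possible approach; the function name parse_puzzle and the choice of which fields to convert are assumptions for illustration.

```python
def parse_puzzle(puzzle: dict) -> dict:
    """Return a copy of a puzzle object with typed fields.

    Numeric fields are converted to int when present and non-empty;
    Moves and Themes are split on whitespace into lists.
    """
    parsed = dict(puzzle)
    for key in ("Rating", "RatingDeviation", "Popularity", "NbPlays"):
        value = puzzle.get(key)
        if value not in (None, ""):  # fields are not necessarily populated
            parsed[key] = int(value)
    parsed["Moves"] = puzzle.get("Moves", "").split()
    parsed["Themes"] = puzzle.get("Themes", "").split()
    return parsed
```

Applied to the puzzle object in the example above, this yields an integer Rating of 1176, a list of four moves starting with a1b1, and a list of four themes.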

License

This dataset retains the same Creative Commons Zero v1.0 Universal License as the original Lichess data.

A Note on the Dataset Hosting

This dataset is too large to host for free on services like Dropbox or GitHub, but too small to warrant paying for cloud storage (and I suspect that if this dataset gets real use, it will eventually be hosted directly by Lichess with regular updates). GitHub Large File Storage was also a non-starter, as a single clone of the repo would exhaust the entire month's bandwidth. I did not want to store it on my personal Google Drive, and I do not have the means to reliably self-host it. MEGA has a free tier with 20GB of storage and a user-based bandwidth limit, which is why I chose it. I am open to suggestions for better hosting solutions.
