Tatoeba Challenge Data - v2023-09-26

This is the "zero" sub-set of the Tatoeba data. Download the data files from the link in the table below. There is a total of 63 language pairs in this sub-set.
| Language pair | lang-pair | test | dev | train |
|---|---|---:|---:|---:|
| Ainu (Japan) - Finnish | ain-fin | 307 | | |
| Kotava - Spanish | avk-spa | 275 | | |
| German - Northern Frisian | deu-frr | 278 | | |
| German - Gronings | deu-gos | 209 | 6 | |
| German - Hunsrik | deu-hrx | 482 | | |
| German - Ladino | deu-lad | 276 | 28 | |
| German - Lingua Franca Nova | deu-lfn | 582 | 54 | |
| German - Swabian | deu-swg | 1523 | 184 | |
| German - Toki Pona | deu-tok | 10488 | 13817 | |
| Kadazan Dusun - English | dtp-eng | 1927 | 1300 | |
| Kadazan Dusun - Japanese | dtp-jpn | 301 | | |
| Kadazan Dusun - Malay (macrolanguage) | dtp-msa | 884 | 7 | |
| Kadazan Dusun - Chinese | dtp-zho | 250 | | |
| Emilian - Italian | egl-ita | 202 | | |
| English - Gronings | eng-gos | 1154 | 97 | |
| English - Ho | eng-hoc | 685 | | |
| English - Hunsrik | eng-hrx | 235 | | |
| English - Khasi | eng-kha | 1314 | 108 | |
| English - Tase Naga | eng-nst | 805 | | |
| English - Old Russian | eng-orv | 322 | | |
| English - Ottoman Turkish (1500-1928) | eng-ota | 696 | 12 | |
| English - Prussian | eng-prg | 219 | | |
| English - Toki Pona | eng-tok | 7715 | 8702 | |
| English - Talossan | eng-tzl | 206 | 20 | |
| Esperanto - Toki Pona | epo-tok | 2733 | 1623 | |
| Esperanto - Volapük | epo-vol | 885 | 37 | |
| Finnish - Kven Finnish | fin-fkv | 506 | 68 | |
| French - Guadeloupean Creole French | fra-gcf | 1164 | | |
| French - Lingua Franca Nova | fra-lfn | 661 | 121 | |
| French - Toki Pona | fra-tok | 655 | 26 | |
| Guadeloupean Creole French - Guadeloupean Creole French | gcf-gcf | 233 | | |
| Gronings - Dutch | gos-nld | 1852 | 435 | |
| Hebrew - Ladino | heb-lad | 266 | 41 | |
| Hebrew - Lingua Franca Nova | heb-lfn | 342 | 72 | |
| Ho - Santali | hoc-sat | 208 | | |
| Ido - Lingua Franca Nova | ido-lfn | 431 | 55 | |
| Interlingua (International Auxiliary Language Association) - Ladino | ina-lad | 433 | 46 | |
| Interlingua (International Auxiliary Language Association) - Lingua Franca Nova | ina-lfn | 997 | 238 | |
| Italian - Toki Pona | ita-tok | 215 | 16 | |
| Javanese - Javanese | jav-jav | 299 | | |
| Japanese - Toki Pona | jpn-tok | 368 | 2 | |
| Kabyle - Kabyle | kab-kab | 993 | 1329 | |
| Ladino - Latin | lad-lat | 259 | 33 | |
| Ladino - Lingua Franca Nova | lad-lfn | 452 | 68 | |
| Ladino - Yiddish | lad-yid | 803 | 84 | |
| Latin - Latin | lat-lat | 251 | | |
| Latin - Lingua Franca Nova | lat-lfn | 447 | 71 | |
| Latin - Toki Pona | lat-tok | 253 | 8 | |
| Lingua Franca Nova - Portuguese | lfn-por | 1945 | 1226 | |
| Lingua Franca Nova - Russian | lfn-rus | 274 | 26 | |
| Lingua Franca Nova - Klingon | lfn-tlh | 253 | 44 | |
| Lingua Franca Nova - Turkish | lfn-tur | 209 | 23 | |
| Lingua Franca Nova - Yiddish | lfn-yid | 993 | 753 | |
| Dutch - Toki Pona | nld-tok | 763 | 5 | |
| Old Russian - Ukrainian | orv-ukr | 973 | | |
| Ottoman Turkish (1500-1928) - Turkish | ota-tur | 372 | 24 | |
| Polish - Toki Pona | pol-tok | 214 | | |
| Portuguese - Toki Pona | por-tok | 1719 | 1023 | |
| Russian - Toki Pona | rus-tok | 994 | 20 | |
| Spanish - Toki Pona | spa-tok | 870 | 31 | |
| Toki Pona - Toki Pona | tok-tok | 234 | | |
| Toki Pona - Yiddish | tok-yid | 358 | 17 | |
| Yiddish - Yiddish | yid-yid | 376 | | |