README.md
tatoeba.afr-eng.afr
tatoeba.afr-eng.eng
tatoeba.amh-eng.amh
tatoeba.amh-eng.eng
tatoeba.ang-eng.ang
tatoeba.ang-eng.eng
tatoeba.ara-eng.ara
tatoeba.ara-eng.eng
tatoeba.arq-eng.arq
tatoeba.arq-eng.eng
tatoeba.arz-eng.arz
tatoeba.arz-eng.eng
tatoeba.ast-eng.ast
tatoeba.ast-eng.eng
tatoeba.awa-eng.awa
tatoeba.awa-eng.eng
tatoeba.aze-eng.aze
tatoeba.aze-eng.eng
tatoeba.bel-eng.bel
tatoeba.bel-eng.eng
tatoeba.ben-eng.ben
tatoeba.ben-eng.eng
tatoeba.ber-eng.ber
tatoeba.ber-eng.eng
tatoeba.bos-eng.bos
tatoeba.bos-eng.eng
tatoeba.bre-eng.bre
tatoeba.bre-eng.eng
tatoeba.bul-eng.bul
tatoeba.bul-eng.eng
tatoeba.cat-eng.cat
tatoeba.cat-eng.eng
tatoeba.cbk-eng.cbk
tatoeba.cbk-eng.eng
tatoeba.ceb-eng.ceb
tatoeba.ceb-eng.eng
tatoeba.ces-eng.ces
tatoeba.ces-eng.eng
tatoeba.cha-eng.cha
tatoeba.cha-eng.eng
tatoeba.cmn-eng.cmn
tatoeba.cmn-eng.eng
tatoeba.cor-eng.cor
tatoeba.cor-eng.eng
tatoeba.csb-eng.csb
tatoeba.csb-eng.eng
tatoeba.cym-eng.cym
tatoeba.cym-eng.eng
tatoeba.dan-eng.dan
tatoeba.dan-eng.eng
tatoeba.deu-eng.deu
tatoeba.deu-eng.eng
tatoeba.dsb-eng.dsb
tatoeba.dsb-eng.eng
tatoeba.dtp-eng.dtp
tatoeba.dtp-eng.eng
tatoeba.ell-eng.ell
tatoeba.ell-eng.eng
tatoeba.epo-eng.eng
tatoeba.epo-eng.epo
tatoeba.est-eng.eng
tatoeba.est-eng.est
tatoeba.eus-eng.eng
tatoeba.eus-eng.eus
tatoeba.fao-eng.eng
tatoeba.fao-eng.fao
tatoeba.fin-eng.eng
tatoeba.fin-eng.fin
tatoeba.fra-eng.eng
tatoeba.fra-eng.fra
tatoeba.fry-eng.eng
tatoeba.fry-eng.fry
tatoeba.gla-eng.eng
tatoeba.gla-eng.gla
tatoeba.gle-eng.eng
tatoeba.gle-eng.gle
tatoeba.glg-eng.eng
tatoeba.glg-eng.glg
tatoeba.gsw-eng.eng
tatoeba.gsw-eng.gsw
tatoeba.heb-eng.eng
tatoeba.heb-eng.heb
tatoeba.hin-eng.eng
tatoeba.hin-eng.hin
tatoeba.hrv-eng.eng
tatoeba.hrv-eng.hrv
tatoeba.hsb-eng.eng
tatoeba.hsb-eng.hsb
tatoeba.hun-eng.eng
tatoeba.hun-eng.hun
tatoeba.hye-eng.eng
tatoeba.hye-eng.hye
tatoeba.ido-eng.eng
tatoeba.ido-eng.ido
tatoeba.ile-eng.eng
tatoeba.ile-eng.ile
tatoeba.ina-eng.eng
tatoeba.ina-eng.ina
tatoeba.ind-eng.eng

README.md

LASER: Language-Agnostic SEntence Representations

LASER is a library to calculate and use multilingual sentence embeddings.

Tatoeba multilingual test set

We provide here the test set for 112 languages that was used in the paper [1]. This data is extracted from the Tatoeba corpus, dated Saturday 2018/11/17.

For each language, we have selected 1000 English sentences and their translations, if available. Please see the paper [1] for a description of the languages, their families and scripts, as well as baseline results.
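As a rough illustration of how one language pair from this directory might be read, here is a minimal Python sketch. It assumes, which the README does not state explicitly, that both files of a pair are UTF-8 text with one sentence per line and that line i of one file is the translation of line i of the other. The function name `load_pairs` and its parameters are hypothetical, not part of LASER.

```python
from pathlib import Path
from typing import List, Tuple


def load_pairs(data_dir: str, lang: str) -> List[Tuple[str, str]]:
    """Return (foreign sentence, English sentence) tuples for one language.

    Assumption (not confirmed by the README): the two files are UTF-8,
    hold one sentence per line, and are aligned line by line.
    """
    base = Path(data_dir)
    # File names follow the pattern visible in the directory listing.
    src_path = base / f"tatoeba.{lang}-eng.{lang}"
    eng_path = base / f"tatoeba.{lang}-eng.eng"

    with open(src_path, encoding="utf-8") as f:
        src = [line.rstrip("\n") for line in f]
    with open(eng_path, encoding="utf-8") as f:
        eng = [line.rstrip("\n") for line in f]

    # A length mismatch would mean the line-alignment assumption is wrong.
    if len(src) != len(eng):
        raise ValueError(
            f"{src_path.name} has {len(src)} lines, "
            f"{eng_path.name} has {len(eng)}"
        )
    return list(zip(src, eng))
```

With the files in a local directory named `v1`, a call such as `load_pairs("v1", "afr")` would return the Afrikaans-English pairs. The length check is there because some languages may have fewer than 1000 pairs, so the count should be read from the data rather than assumed.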

Please note that the English sentences are not identical for all language pairs. This means that the results are not directly comparable across languages. In particular, the sentences tend to have less variety for several low-resource languages, e.g. "Tom needed water", "Tom needs water", "Tom is getting water", etc.
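One simple way to get a feel for this lack of variety is a type-token ratio over the English side of a pair: the number of distinct tokens divided by the total number of tokens. This measure is an illustration chosen here, not something used in the paper [1], and the whitespace tokenization is a crude assumption that only makes sense for the English files.

```python
from collections import Counter
from typing import Iterable


def type_token_ratio(sentences: Iterable[str]) -> float:
    """Distinct lowercased whitespace tokens divided by total tokens.

    Lower values indicate more repetition. Returns 0.0 for empty input.
    """
    counts = Counter(tok.lower() for s in sentences for tok in s.split())
    total = sum(counts.values())
    return len(counts) / total if total else 0.0


# The three example sentences quoted in the README.
examples = ["Tom needed water", "Tom needs water", "Tom is getting water"]
print(type_token_ratio(examples))  # 6 distinct tokens out of 10 -> 0.6
```

Comparing this ratio across the English files of different language pairs would give a rough indication of which test sets are more repetitive, but it says nothing about how hard the sentences are to align.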

License

Please see the Tatoeba project website for the license of the Tatoeba corpus.

References

[1] Mikel Artetxe and Holger Schwenk, Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond, arXiv, Dec 26 2018.
