Better pre-trained PyTorch models #565

KeithCu · 2019-03-22T04:29:02Z

I've been researching how LibreOffice could talk to online translators and discovered your project. It looks very interesting, but it would be nice to have a pre-trained PyTorch model that supports many languages. Currently the PyTorch release only supports German to English and English to German.

Inference can run fast enough on the CPU, which is what LibreOffice users typically have. Google and Microsoft, for example, support many languages, but only as online services, which raise freedom, performance, privacy, and potential cost concerns.

I heard Google used EU regulations and UN speeches. It shouldn't be too hard to write a spider to collect text that is free or allowed under Fair Use. A small Python script could grab all the Debian translation files, which would be quite a corpus. I could try to write it if you want.

I realize this is primarily a research platform, but a pre-trained model would be pretty easy to plug into LibreOffice or the Linux desktop.

Update: Here's a link to all the Debian PO files: https://www.debian.org/international/l10n/po/
From there you can go to each language, and then to the PO file for each package.
A simpler spider could download them all.
It appears that most of the files are in this directory: https://i18n.debian.org/material/po/unstable/main/
A recursive wget should be able to grab them all.
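
If wget turns out to be awkward, the same walk could be done in Python. Below is a minimal sketch, assuming the server returns plain HTML directory listings and that the translation files end in `.po` or `.po.gz` (I haven't verified either for that directory). The names `classify_links`, `find_po_files`, and `fetch_text` are hypothetical, and the page limit is an arbitrary safety cap.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class _LinkCollector(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def classify_links(index_html, base_url):
    """Split a directory listing into (subdirectory URLs, PO file URLs)."""
    parser = _LinkCollector()
    parser.feed(index_html)
    subdirs, po_files = [], []
    for href in parser.hrefs:
        url = urljoin(base_url, href)
        # Skip column-sort links, the directory itself, and anything
        # outside it (such as the parent-directory link).
        if "?" in url or url == base_url or not url.startswith(base_url):
            continue
        if url.endswith("/"):
            subdirs.append(url)
        elif url.endswith((".po", ".po.gz")):   # assumed file naming
            po_files.append(url)
    return subdirs, po_files


def fetch_text(url):
    """Download a page and decode it leniently."""
    with urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def find_po_files(root_url, fetch=fetch_text, max_pages=1000):
    """Walk listings under root_url and return the PO file URLs found."""
    todo, seen, found = [root_url], set(), []
    while todo and len(seen) < max_pages:
        url = todo.pop()
        if url in seen:
            continue
        seen.add(url)
        subdirs, po_files = classify_links(fetch(url), url)
        todo.extend(subdirs)
        found.extend(po_files)
    return found
```

Calling `find_po_files("https://i18n.debian.org/material/po/unstable/main/")` would only list the URLs. Downloading them, and pausing between requests to be polite to the server, is left out. Passing `fetch` as a parameter makes it easy to try the walk against canned pages first.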
