This repository has been archived by the owner on Jun 10, 2021. It is now read-only.

Better pre-trained PyTorch models #565

Open
KeithCu opened this issue Mar 22, 2019 · 0 comments

KeithCu commented Mar 22, 2019

I've been researching how LibreOffice could talk to online translators and discovered your project. It looks very interesting, but it would be nice to have a pre-trained PyTorch model that supports many languages. Currently the PyTorch release supports only German-to-English and English-to-German.

Inference can run fast enough on the CPU, which is what LibreOffice users typically have. Google and Microsoft, for example, support many languages, but only as online services, which raise freedom, performance, privacy, and potential cost concerns.

I heard Google used EU regulations and UN speeches. It shouldn't be too hard to write a spider that collects text that is either freely licensed or usable under fair use. A small Python script could, for example, grab all of the Debian translation files, which would make quite a corpus. I could try to write it if you want.
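To make the corpus idea concrete, here is a minimal sketch of how a gettext PO file could be turned into source/target sentence pairs. The function name `parse_po` is hypothetical, and the parser is deliberately simplified: it assumes plain `msgid`/`msgstr` entries, uses Python's own string-literal rules as an approximation of PO escapes, and skips plural forms, contexts, and the header. It does not filter entries marked fuzzy.

```python
import ast


def parse_po(text):
    """Return (msgid, msgstr) pairs from PO-format text.

    Simplified sketch: the header entry (empty msgid), untranslated
    entries (empty msgstr), and plural entries are skipped.
    """
    pairs = []
    entry = {"msgid": [], "msgstr": []}
    field = None  # which field continuation lines belong to

    def flush():
        source = "".join(entry["msgid"])
        target = "".join(entry["msgstr"])
        if source and target:
            pairs.append((source, target))
        entry["msgid"], entry["msgstr"] = [], []

    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("msgid "):
            flush()  # a new entry starts, so emit the previous one
            field = "msgid"
            entry[field].append(ast.literal_eval(line[len("msgid "):]))
        elif line.startswith("msgstr "):
            field = "msgstr"
            entry[field].append(ast.literal_eval(line[len("msgstr "):]))
        elif line.startswith('"') and field:
            # Continuation of a multi-line string
            entry[field].append(ast.literal_eval(line))
        else:
            # Comments, blank lines, msgctxt, msgid_plural, msgstr[n]
            field = None
    flush()
    return pairs
```

For production use, an existing PO library would likely be more robust than this hand-rolled parser.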

I realize this is primarily a research platform, but a pre-trained model would be pretty easy to plug into LibreOffice or the Linux desktop.

Update: Here's a link to all the Debian PO files: https://www.debian.org/international/l10n/po/
From there you can go to each language, and then to the PO file for each package.
A simple spider could download them all.
It appears that most of the files are in this directory: https://i18n.debian.org/material/po/unstable/main/
A recursive wget should be able to grab them all.
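As an alternative to a recursive wget, the crawl could be sketched in Python using only the standard library. This is an untested-against-the-live-server sketch: it assumes the directory above is served as a plain HTML index page and that the translation files end in `.po` or `.po.gz`. Both are assumptions, and the names `LinkCollector`, `split_listing`, and `crawl` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in an HTML page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def split_listing(base_url, html, suffixes=(".po", ".po.gz")):
    """Split a directory index into (file URLs, subdirectory URLs)."""
    collector = LinkCollector()
    collector.feed(html)
    files, subdirs = [], []
    for href in collector.hrefs:
        url = urljoin(base_url, href)
        # Stay below base_url: skips parent-directory and external links
        if not url.startswith(base_url) or url == base_url:
            continue
        if url.endswith("/"):
            subdirs.append(url)
        elif url.endswith(suffixes):
            files.append(url)
    return files, subdirs


def crawl(start_url, max_pages=100):
    """Walk the listing breadth-first and return the file URLs found."""
    queue, seen, found = [start_url], set(), []
    while queue and len(seen) < max_pages:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        with urlopen(url) as response:
            html = response.read().decode("utf-8", errors="replace")
        files, subdirs = split_listing(url, html)
        found.extend(files)
        queue.extend(subdirs)
    return found
```

`crawl` only returns URLs and downloads nothing itself; the `max_pages` cap is there so that a test run does not hammer the server.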
