Bitextor usage #260

Open
hieuhoang opened this issue Jan 23, 2024 · 4 comments

Comments

@hieuhoang
Contributor

Hi guys!

Wondering if you have a list of other projects or corpora that have been created with the Bitextor/Paracrawl software. I can only think of JParacrawl, but I suspect there are more.

Flicking through citations, nothing stands out

@lpla
Member

lpla commented Jan 23, 2024

Hi, Hieu! It's nice to hear from you.

We created the MaCoCu corpora using Bitextor and additional software from the Bitextor organization: https://macocu.eu

Not corpora, but it seems warc2text is being used to train the models in the HPLT project: https://hplt-project.org

@mbanon
Member

mbanon commented Jan 24, 2024

Hi there!
Also used in Europat (https://europat.net/), at least in part (https://aclanthology.org/2022.lrec-1.78/).

@mespla
Contributor

mespla commented Jan 24, 2024

Hi Hieu!

There are a few more corpora created using Bitextor. The corpora created in the GoURMET project (https://opus.nlpl.eu/GoURMET.php) were produced using it. Also, the parallel corpora produced in the AbuMaTran project used Bitextor, even though it was an older version:

@hieuhoang
Contributor Author

Thanks guys. Good to know it's still ticking along
