LOCALISATION REQUEST: ISO-639-2/3 #4424

Open
1 of 3 tasks
wieling opened this issue Apr 2, 2024 · 12 comments

@wieling

wieling commented Apr 2, 2024

Welcome to the Common Voice Community!

Common Voice aims to make speech technology accessible to everyone by building an open-source dataset of labelled voice data that is representative of the languages, variants and accents spoken across the world. This template helps us understand how your language could participate in the Common Voice project. The form has three sections; once you have filled out a section, please click its checkbox. If you have any issues, please contact commonvoice@mozilla.org.

Pontoon Set-up

To start a language on Common Voice, volunteers localise our platform via Pontoon and create sentence corpora of CC0 text.

Language name

Gronings

Language code

gos
(ISO 639-3)

Language size

About 200,000 estimated.

Plural forms
0 stainen
1 stain
2 stainen
3 stainen
4 stainen
5 stainen
10 stainen
20 stainen
100 stainen
1000 stainen
Ik zai 0 stainen op grond
Ik zai 1 stain op grond
Ik zai 10 stainen op grond
Ik zai stainen op grond
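
The examples above suggest a two-form plural rule: the singular for exactly 1 and the plural for everything else, including 0. A minimal Python sketch of that rule follows. The function names and the `one`/`other` category labels are assumptions chosen for illustration (the labels follow common plural-category naming); they are not taken from this issue or from Pontoon's configuration.

```python
def plural_category(n: int) -> str:
    """Return the plural category implied by the examples in this request.

    Assumption: only the count 1 takes the singular form; all other
    counts (0, 2, 10, 100, ...) take the plural form.
    """
    return "one" if n == 1 else "other"


def stones_sentence(n: int) -> str:
    """Build the example sentence from the request for a given count."""
    forms = {"one": "stain", "other": "stainen"}
    return f"Ik zai {n} {forms[plural_category(n)]} op grond"


if __name__ == "__main__":
    for count in (0, 1, 2, 10, 1000):
        print(stones_sentence(count))
```

Running the script prints one sentence per count, which can be compared against the forms listed above.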

Pontoon manager

https://pontoon.mozilla.org/contributors/0ofHVNKn6_yGdRZPr8_-d6dR7qY/

Language Script

Latin

Sentence Collection Requirements

On the Common Voice platform, contributors read out public domain sentences gathered through sentence collection. Sentence collection is a crucial part of launching a language on Common Voice. To support the equitable participation of languages on Common Voice, we have introduced three new sentence collection requirement bands.

Sentence Requirement Band

  • Band A
  • Band B
  • Band C

Creating Community

  1. [optional to share] Why do you want to take part in Common Voice? Enable centralized data collection for improving speech recognition performance for Gronings.

  2. [optional to share] Would you like to have a follow-up conversation regarding community building? No.

@ftyers
Collaborator

ftyers commented Apr 2, 2024

Hi @wieling, thanks for your interest in Common Voice! Could you let me know what the ISO-639-2/3 language code is for Gronings and what the population size is? (You can edit your post.) I suppose the script is Latin (given the translations). If you have any questions, feel free to get in contact with us on Matrix: https://app.element.io/#/room/#common-voice:mozilla.org

Edit: You read my mind :)

@wieling
Author

wieling commented Apr 2, 2024

Sorry, I apparently submitted while still editing the post... Now all information should be present. We have a large corpus of Groningen texts, so I should be able to create the necessary sentences for reading. Can you let me know at what stage this is necessary? This is not entirely clear to me.

@wieling
Author

wieling commented Apr 2, 2024

I'm also not entirely clear about which checkboxes I need to check.

@ftyers added the Localisation label (New language requests and or issues regarding localisation (l10n)) Apr 2, 2024

@ftyers
Collaborator

ftyers commented Apr 2, 2024

Probably Band B would suit Gronings best. In terms of the sentences, you can add them either via the bulk upload process or via the interface on the main site, once you have reached 70% translation.

Common Voice has been enabled for Gronings for translation; you can access it here. Please make a couple of suggestions and I will add you as a translator.

@wieling
Author

wieling commented Apr 2, 2024

Is it necessary to provide a translation of the interface in Gronings? All speakers of Gronings can use Dutch, so if possible, we'd prefer using that language for the interface itself. Is that possible?

@ftyers
Collaborator

ftyers commented Apr 2, 2024

Unfortunately no, the current policy of Common Voice is to have the interface localised in the language which will be used for voice collection.

@wieling
Author

wieling commented Apr 2, 2024

OK, clear. Can we also get access to the Dutch translations, and if so, how? That would make things easier for those doing the translations.

@ftyers
Collaborator

ftyers commented Apr 2, 2024

@wieling Yes, each person can change their preferred language in their Pontoon profile.

See "Default locale -> Preferred source locale".

@wieling
Author

wieling commented Apr 2, 2024

Thanks! One remaining question: is it also possible to batch update all translations (and download them in the source language)?

@ftyers
Collaborator

ftyers commented Apr 2, 2024

As far as I am aware, no, but you could download the .ftl files for Dutch, process them and then manually type them in. If the issue is that speakers/writers are not comfortable with web interfaces, but there are non-speakers who are, then that could be a solution.
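
One way the "download, process, then type in" workflow could look is sketched below in Python. This is only an illustration under stated assumptions: it handles simple single-line `id = value` messages, skips comments, attributes, terms and multi-line values, and is not a full Fluent parser. The function names and the `id`/`nl`/`gos` column headers are hypothetical. The output is a tab-separated sheet that translators could fill in offline before someone enters the results in Pontoon by hand.

```python
import csv
import io
import re

# Matches a top-level, single-line Fluent message such as `login = Inloggen`.
MESSAGE_RE = re.compile(r"^([A-Za-z][A-Za-z0-9_-]*)\s*=\s*(.*)$")


def extract_messages(ftl_text: str) -> dict[str, str]:
    """Collect single-line messages from the text of an .ftl file.

    Skipped on purpose: blank lines, comments (`#`), indented lines
    (attributes and continuation lines), terms (`-name`), and messages
    whose value starts on the following line.
    """
    messages = {}
    for line in ftl_text.splitlines():
        if not line or line.startswith("#") or line[0].isspace():
            continue
        match = MESSAGE_RE.match(line)
        if match and match.group(2):
            messages[match.group(1)] = match.group(2).strip()
    return messages


def to_tsv(messages: dict[str, str]) -> str:
    """Render the messages as TSV with an empty column for Gronings."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["id", "nl", "gos"])
    for message_id, dutch in messages.items():
        writer.writerow([message_id, dutch, ""])
    return buffer.getvalue()


if __name__ == "__main__":
    sample = "# Example only\nlogin = Inloggen\nspeak = Spreken\n"
    print(to_tsv(extract_messages(sample)))
```

Messages the sketch skips (multi-line values, attributes) would still need to be handled directly in Pontoon.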

@ftyers
Collaborator

ftyers commented Apr 9, 2024

@wieling were you able to make a translation suggestion?

@wieling
Author

wieling commented Apr 9, 2024 via email
