Internationalisation #202

Open
wooorm opened this issue Sep 27, 2018 · 29 comments
Labels
help wanted 🙏 This could use your insight or help

Comments

@wooorm
Member

wooorm commented Sep 27, 2018

Subject of the issue

Alex, for now, supports just English, but retext could work with other Latin-script languages.

  • We could support other languages in retext-profanities, as there are several open lists of potential profanities in other languages
  • We could support other languages in retext-equality, but that would require people to hand-translate the phrases

Do you know a language other than English

...and are you able (it’ll take some time) and willing to help? Create a new issue for your language and let’s start working on this!

Translations

The following table describes the status of translations:

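| Language               | profanities | cuss | retext-profanities | retext-equality |
| ---------------------- | ----------- | ---- | ------------------ | --------------- |
| English                | Yes         | Yes  | Yes                | Yes             |
| Arabic (Latin-script)  | Yes         | Yes  | Yes                | No              |
| Spanish                | Yes         | Yes  | Yes                | No              |
| French                 | Yes         | Yes  | Yes                | No              |
| Portuguese (Brazilian) | Yes         | Yes  | Yes                | No              |
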
@CRomano31415

I see the issue in retext-profanities. I'll start putting a list in Spanish together 👍🏼

@JekRock

JekRock commented Oct 3, 2018

Does it support only Latin languages? I can help with Russian and Ukrainian.

@deatheguard

I can help with French and/or English!

@baezor

baezor commented Oct 3, 2018

I can help with Spanish! Is someone working on Spanish here?

@wooorm
Member Author

wooorm commented Oct 3, 2018

@CRomano31415 Please do! Feel free to open a new issue / PR about Spanish

@JekRock Under the hood, mainly Latin-script languages are supported. Cyrillic could potentially work. See parse-latin for the main parts about it.

@deatheguard French would be great. Looks like you found GH-207 already. For English, you can just help with retext-profanities and retext-equality already!

@baezor Please do as well! Let’s check if @CRomano31415 opens an issue so y’all can collaborate :)

@AhmedRedaAmin

Can I add support for Arabic Latin-script slang? (It is used heavily on social media.)
Note: it uses digits for some Arabic sounds, in case that causes problems for parse-latin.

@wooorm
Member Author

wooorm commented Oct 4, 2018

@AhmedRedaAmin Yes, feel free to do that, and create a new issue for it!

Note: it uses digits for some Arabic sounds, in case that causes problems for parse-latin.

I’m not sure, I don’t know Arabic, let’s try it out!
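
To make the concern about digits concrete, here is a minimal sketch in plain Python. It does not use parse-latin and makes no claim about how parse-latin actually tokenizes; it only shows that a tokenizer which treats words as runs of letters would drop the digit from a spelling such as "3arabi", while one that also accepts digits keeps the word intact. The sample text and variable names are illustrative assumptions.

```python
import re

# Illustrative input: in Latin-script Arabic, a digit such as "3" can stand
# in for an Arabic sound, so it is part of the word itself.
text = "ana 3arabi"

# Hypothetical tokenizer A: a word is a run of letters only.
letters_only = re.findall(r"[^\W\d_]+", text)

# Hypothetical tokenizer B: a word may contain letters and digits.
letters_and_digits = re.findall(r"\w+", text)

print(letters_only)        # the leading digit is lost
print(letters_and_digits)  # the word is kept whole
```

Whichever behaviour the real parser has, a word list for this kind of slang would need entries that match the tokens the parser produces, which is why trying it out is the practical next step.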

@inesbenomar18

I can help with Arabic and French as well! :)

@wooorm
Member Author

wooorm commented Oct 5, 2018

@inesbenomar18 Awesome! For French, start with GH-207. And feel free to open a new issue about Arabic (note, though, that I think Arabic script may not be properly supported).

@AhmedRedaAmin

Okay, so I noticed there were two approaches to adding new languages, so I'll take @CRomano31415's approach. I already started by borrowing her repo template; I hope you don't mind, Claudia!

@rampagesang

@wooorm Do you know a language other than English
...and are you able (it’ll take some time) and willing to help? Create a new issue for your language and let’s start working on this!

I can help with Korean!~

@wooorm
Member Author

wooorm commented Oct 11, 2018

@rampagesang Awesome! Unfortunately, I don’t think languages that use a script other than the Latin script can work with the current technical setup :(
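
As a rough way to tell whether a sample of text is written in Latin script, here is a small heuristic sketch using only Python's standard library. It is not part of alex or retext, and the function name `is_latin_script` is a hypothetical helper; it only checks the Unicode names of the letters and says nothing about whether a given language is actually supported.

```python
import unicodedata


def is_latin_script(text):
    """Return True if every alphabetic character in text is a Latin letter.

    Heuristic only: relies on Unicode character names starting with "LATIN".
    """
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return False  # nothing to judge
    return all(unicodedata.name(ch, "").startswith("LATIN") for ch in letters)


print(is_latin_script("Straße café"))  # German/French sample
print(is_latin_script("안녕하세요"))     # Korean sample
```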

@Yangeok

Yangeok commented Oct 13, 2018

Can I help translate into Korean and German?

@wooorm
Member Author

wooorm commented Oct 13, 2018

@Yangeok Hey, that’s cool! I don’t think Korean will work (see the comment before yours), but German would definitely work!

@PaoloWeishaupt

I can help with Italian, Spanish, French and German.

@luigicorreia

I can help with Portuguese.

@toucedam

I can help with Spanish! Is someone working on Spanish here?

yes

@GledsonAfonso

I can help with Portuguese too (pt-BR).

@waaghree

waaghree commented Oct 23, 2018

Do you know a language other than English

...and are you able (it’ll take some time) and willing to help? Create a new issue for your language and let’s start working on this!

How about Urdu? It's mostly written in Arabic script with added characters; however, a large share of people write it on social media with the English character set, in something called "Roman Urdu". @AhmedRedaAmin are you working on something similar in Arabic?

@wooorm
Member Author

wooorm commented Oct 23, 2018

@waaghree Yep, romanised can work. For inspiration, see the Arabic cuss file added in words/cuss#16!

@AhmedRedaAmin

@waaghree Yes, I worked on something very similar. You can refer to @wooorm's link as well as the issue titled Arabic Latin-Script on this repo. Feel free to ping me if you want to ask about anything.
Glad to see Arabic script being transliterated for more than one language :D Good luck, mate.

@GledsonAfonso

@wooorm @AhmedRedaAmin Apart from the words/cuss project, is there another way we can help with internationalisation? If so, how?

@AhmedRedaAmin

AhmedRedaAmin commented Oct 25, 2018

@GledsonAfonso So here is how it works: you add the words to the words/cuss project and rate them based on profanity; they then get used in get-alex/alex to detect profane words. As far as I know, that is how the YAML files can identify the inappropriate words. As for the "insensitive" phrases, they get added to retext-equality by hand, together with the suggested alternatives. They are different from flat-out cusses, which obviously don't have suggested alternatives.
This is what I understood, at least.
Short answer: the second-best way to help with internationalization is to add your native language's insensitive phrases to retext-equality along with their suggested alternatives. Good luck, mate! :D
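
The two kinds of data described above can be sketched roughly as follows. This is a simplified illustration in plain Python, not the actual file formats or code of words/cuss, retext-profanities, or retext-equality. The word lists, the 0 to 2 rating scale, the `min_rating` threshold, and the `check` function are all assumptions made up for the example.

```python
# Hypothetical data, for illustration only; not the real contents of
# words/cuss or retext-equality.

# Word -> rating of how likely the word is profane (assumed scale: 0 to 2).
PROFANITY_RATINGS = {
    "heck": 0,
    "darn": 1,
    "dang": 2,
}

# Insensitive term -> hand-written suggested alternatives.
ALTERNATIVES = {
    "fireman": ["firefighter"],
    "chairman": ["chair", "chairperson"],
}


def check(text, min_rating=1):
    """Return warning messages for rated words and insensitive terms."""
    messages = []
    for raw in text.lower().split():
        word = raw.strip(".,!?;:\"'")  # drop surrounding punctuation
        rating = PROFANITY_RATINGS.get(word)
        if rating is not None and rating >= min_rating:
            messages.append(f"`{word}` may be profane (rating {rating})")
        if word in ALTERNATIVES:
            options = ", ".join(f"`{o}`" for o in ALTERNATIVES[word])
            messages.append(f"`{word}` may be insensitive, consider {options}")
    return messages


for message in check("The fireman said darn."):
    print(message)
```

The point of the split is that a profanity entry only needs a word and a rating, so open word lists can be reused, while an equality entry needs a human to supply alternatives, which is why those phrases have to be translated by hand.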

@GledsonAfonso

@AhmedRedaAmin Thank you for the explanation! I think it would be interesting if we had some section in the get-alex/alex project (wiki maybe? With a mention in the README file and all the cool stuff) about this... or this already exists and I'm just making a fool of myself here, haha.

Anyway, thanks again for the heads up. I'll see if I can add some phrases in retext-equality as soon as I get the gist of it. Cheers!

@wooorm
Member Author

wooorm commented Oct 26, 2018

@GledsonAfonso We could definitely use a section on internationalisation in the contributing.md file!

@GledsonAfonso

@wooorm Great! Do we need to create an issue for that?

@wooorm
Member Author

wooorm commented Oct 26, 2018

Sure, you could create a separate issue, or if you’d like to work on it feel free to take a stab at it!

@OtacilioN

Hey Folks, I would love to help with Portuguese!

@GledsonAfonso

GledsonAfonso commented Oct 26, 2018

@wooorm Okay! I will create an issue for that now and see if I can manage to work on it later, in case no one takes it first. Thanks!

@OtacilioN Thanks for creating the issue for the language.
