Add support for German language #99

Open
jwage opened this issue Jun 29, 2018 · 17 comments

Comments

@jwage
Member

jwage commented Jun 29, 2018

cc @dmecke

@jwage jwage added this to the v2.0.0 milestone Jun 29, 2018
@sinogermany

It would be great if German were supported. It doesn't seem to be straightforward, though. Example:

  • Singular: Internationalisierungsstrategie
  • Plural: Internationalisierungsstrategien

How can we tell they are Internationalisierung and Strategie (Note there's a connecting s)?

Say we can - so would it become something like:

  • camelCase (singular): internationalisierungsStrategie ???
  • snake_case (singular): internationalisierungs_strategie ???

Never programmed in German, quite interested to see how it works. Let me know if I can be of any help.
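Assuming the compound has already been split into its parts (the hard part this comment asks about), producing the two naming styles suggested above is mechanical. A minimal sketch in Python; `to_camel_case` and `to_snake_case` are hypothetical helpers, not part of Doctrine Inflector, and the linking s is simply kept on the first part, as in the examples:

```python
from typing import List

# Hypothetical helpers: they assume the compound is already split into
# segments and only decide how to join them.


def to_camel_case(segments: List[str]) -> str:
    """First segment lowercased, each following segment capitalized."""
    first, *rest = segments
    return first.lower() + "".join(s[:1].upper() + s[1:].lower() for s in rest)


def to_snake_case(segments: List[str]) -> str:
    """All segments lowercased and joined with underscores."""
    return "_".join(s.lower() for s in segments)


parts = ["Internationalisierungs", "strategie"]
print(to_camel_case(parts))  # internationalisierungsStrategie
print(to_snake_case(parts))  # internationalisierungs_strategie
```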

@sinogermany

sinogermany commented Aug 13, 2018

A few compound word examples here:

  • Internationalisierungsstrategie -> /(.+)ung[s]?(.+)/
  • Zwischenzeit
  • Höchsttemperatur
  • Haustürschlüsselloch
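The regex in the first example can be tried out directly. Below is a minimal Python sketch using a variant of it (an assumption: the capture is shifted so the -ung stays on the first part and the optional linking s is dropped); `split_ung_compound` is a hypothetical name. It only handles compounds whose first part ends in -ung, so Zwischenzeit, Höchsttemperatur and Haustürschlüsselloch are not split at all:

```python
import re
from typing import Optional, Tuple

# Variant of /(.+)ung[s]?(.+)/: "-ung" stays in the first group and the
# optional linking "s" (Fugen-s) between the parts is consumed and dropped.
UNG_COMPOUND = re.compile(r"^(.+ung)s?(.+)$")


def split_ung_compound(word: str) -> Optional[Tuple[str, str]]:
    """Split a compound whose first part ends in -ung; None if no match."""
    match = UNG_COMPOUND.match(word)
    if match is None:
        return None
    return match.group(1), match.group(2)


print(split_ung_compound("Internationalisierungsstrategie"))
# ('Internationalisierung', 'strategie')
print(split_ung_compound("Zwischenzeit"))  # None
```

It also produces false splits: Hungersnot contains "ung" by accident and comes back as ("Hung", "ersnot").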

Normally when we program in German how do we deal with umlauts? Use e instead?

@alcaeus
Member

alcaeus commented Aug 14, 2018

> How can we tell they are Internationalisierung and Strategie (Note there's a connecting s)?

I may be missing something, but I don't think we need to identify every word inside a compound. For regular words, we only care about the ending. For irregular ones, we match from the end and keep building an ever-longer list of irregular words as issues appear.
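The ending-based approach described here might look roughly like this: irregular endings are checked first, longest match from the end of the word, so a compound such as Rathaus inherits the plural of its last part; regular suffix rules come next. This is a minimal Python sketch with a tiny, hypothetical rule set (`pluralize`, `IRREGULAR_ENDINGS` and `REGULAR_SUFFIXES` are illustrative names, not the Doctrine Inflector API):

```python
from typing import Optional

# Hypothetical, deliberately tiny rule data for illustration only.
# Irregular endings are matched from the end of the word (longest first),
# so compounds such as "Großmutter" inherit the plural of their last part.
IRREGULAR_ENDINGS = {
    "mutter": "mütter",
    "haus": "häuser",
}

# Regular rules: (singular suffix, plural suffix), tried in order.
REGULAR_SUFFIXES = [
    ("ung", "ungen"),
    ("heit", "heiten"),
    ("keit", "keiten"),
    ("ie", "ien"),
]


def _swap_suffix(word: str, old: str, new: str) -> str:
    stem = word[: len(word) - len(old)]
    if not stem:
        # The whole word was the ending: keep the noun capitalized.
        return new[:1].upper() + new[1:]
    return stem + new


def pluralize(word: str) -> Optional[str]:
    """Return a plural guess, or None when no rule covers the word."""
    lower = word.lower()
    for ending in sorted(IRREGULAR_ENDINGS, key=len, reverse=True):
        if lower.endswith(ending):
            return _swap_suffix(word, ending, IRREGULAR_ENDINGS[ending])
    for singular, plural in REGULAR_SUFFIXES:
        if lower.endswith(singular):
            return _swap_suffix(word, singular, plural)
    return None


print(pluralize("Rathaus"))       # Rathäuser
print(pluralize("Zeitung"))       # Zeitungen
print(pluralize("Zwischenzeit"))  # None
```

Returning None for Zwischenzeit (correct plural: Zwischenzeiten) marks exactly where the list would have to keep growing.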

> Normally when we program in German how do we deal with umlauts? Use e instead?

When people can't type them (e.g. because of restrictions in what characters are allowed), they normally use these replacements:

  • ä => ae
  • ö => oe
  • ü => ue
  • ß => ss

Some tools replace them with just the first letter of these replacements (e.g. ä => a), which I believe is incorrect.
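A minimal Python sketch of the replacement scheme listed above, using `str.translate`; `transliterate` is a hypothetical name, and the capitalized Ä/Ö/Ü entries are an assumed extension for nouns that start with an umlaut:

```python
# Replacement table from the list above; the capitalized entries are an
# assumed extension for words that start with an umlaut.
UMLAUT_REPLACEMENTS = {
    "ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
}
_TABLE = str.maketrans(UMLAUT_REPLACEMENTS)


def transliterate(text: str) -> str:
    """Replace each umlaut/ß with its two-letter ASCII equivalent."""
    return text.translate(_TABLE)


print(transliterate("Höchsttemperatur"))      # Hoechsttemperatur
print(transliterate("Haustürschlüsselloch"))  # Haustuerschluesselloch
```

Dropping the diacritic instead (ö => o) would turn Höchsttemperatur into Hochsttemperatur, which is the variant described above as incorrect.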

@jwage
Member Author

jwage commented Aug 14, 2018

So who is going to add support for German? :) You can see the rules for another language here https://github.com/doctrine/inflector/tree/master/lib/Doctrine/Inflector/Rules/English

@dereuromark

I would close this as impossible.
I once tried as well, and after 500 lines of exceptions I was still nowhere close to having a reliable package, so this is just not worth doing. What's the purpose if every second word still doesn't work properly?
Inflection is not relevant here IMO; at least we have it working well for English.

@jwage
Member Author

jwage commented Sep 28, 2018

Do you think it is really impossible, or just really hard with many exceptions to the rules? I would still be interested in getting a PR started; maybe we can slowly work on it over time and get contributions from people.

@dereuromark

dereuromark commented Sep 28, 2018

Knowing the German language well and having looked into the issue, I think it is literally impossible.
It starts with words that are spelled the same but need the article ("der/die/das") in front of them to determine the grammatical gender (masculine/feminine/neuter) and thus the meaning, and as such they also have different plural forms. And that only covers this one aspect: plurals of many other forms also require this article to have any chance of building rules, which is impossible if you only have the noun.
The ending alone cannot be relied on here (unlike in English). So yeah, it is a mess.

@jwage
Member Author

jwage commented Sep 28, 2018

And if we had a public API that let you provide the word before, would it still be impossible in other cases?

@dereuromark

You could start a library of all nouns. Combined with this, that would work.

@jwage jwage removed this from the v2.0.0 milestone Jan 9, 2019
@nschoellhorn
Contributor

nschoellhorn commented Mar 17, 2019

Is this still relevant/wanted? I think the majority of words could be covered with a few rules. Of course, German is not an easy language, so there will be quite a few exceptions, but there are exceptions in English as well, so I wouldn't say it's impossible. I can try to implement as much of that as I can, and then we'll see whether it looks good or whether there are any bigger hurdles to overcome.

Maybe I am missing something, but it doesn't look impossible to me, and I am also a native speaker.

@jwage
Member Author

jwage commented Mar 17, 2019

I would still be interested in seeing support for other languages.

@nschoellhorn
Contributor

Ha, I did indeed miss something. Without the context of at least one sentence, it is not really doable without building the mentioned list of words. With a complete sentence it might look better, but with the word alone we will not get very far. Sorry for digging this up needlessly :-/

@dereuromark

dereuromark commented Mar 19, 2019

Yeah, without the context (at least the "der/die/das" article) there is no chance to inflect properly.
Imagine "Leiter".

die Leiter ("ladder") => die Leitern ("ladders")
der Leiter ("leader") => die Leiter ("leaders")

And that is only one of several issues.
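A tiny lookup table makes the problem concrete: keyed by the noun alone, Leiter has two valid plurals, while keying by (article, noun) picks exactly one. A minimal Python sketch; `LEXICON`, `plural_candidates` and `pluralize_with_article` are hypothetical names, and a real lexicon would need entries for every ambiguous noun:

```python
from typing import Dict, List, Tuple

# Hypothetical mini-lexicon keyed by (article, singular), using the
# "Leiter" example above.
LEXICON: Dict[Tuple[str, str], str] = {
    ("die", "Leiter"): "Leitern",  # ladder -> ladders
    ("der", "Leiter"): "Leiter",   # leader -> leaders
}


def plural_candidates(noun: str) -> List[str]:
    """Noun-only lookup: every plural known for this spelling (sorted)."""
    return sorted(
        {plural for (_, singular), plural in LEXICON.items() if singular == noun}
    )


def pluralize_with_article(article: str, noun: str) -> str:
    """Article-aware lookup; raises KeyError for unknown pairs."""
    return LEXICON[(article.lower(), noun)]


print(plural_candidates("Leiter"))              # ['Leiter', 'Leitern']
print(pluralize_with_article("die", "Leiter"))  # Leitern
print(pluralize_with_article("der", "Leiter"))  # Leiter
```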

@jwage
Member Author

jwage commented Mar 19, 2019

The API could be enhanced so that the context around the word can be passed through, no?

@dereuromark

> Never programmed in German, quite interested to see how it works. Let me know if I can be of any help.

You also should NEVER program in German.^^

I only see value here for translation tooling and perhaps custom routing configs and other things that process contextual data; this should never find its way into actual PHP source files IMO.
And especially not into class names, method names, or other tokens.

@nschoellhorn
Contributor

> The API could be enhanced so that the context around the word can be passed through, no?

Yes, sure. But the problem is: how much of the context do you want to provide? There are cases where even the full sentence around the word might not be enough. I think it is doable if we are given the article of the word, at least for most cases. Would people want to provide that?
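To illustrate what the article buys, here is a minimal Python sketch of gender-based heuristics once the article is known. `pluralize_by_gender` is a hypothetical name, and the rules are common tendencies rather than complete grammar: nouns ending in -e add -n, other feminine nouns add -(e)n, and masculine/neuter nouns ending in -er/-el/-en/-lein stay unchanged, otherwise adding -e. It only accepts the nominative singular articles der/die/das:

```python
from typing import Optional


def pluralize_by_gender(article: str, noun: str) -> Optional[str]:
    """Heuristic plural from the nominative singular article (der/die/das)."""
    article = article.lower()
    if article not in ("der", "die", "das"):
        return None  # other case forms (dem, den, des) are not handled
    if noun.endswith("e"):
        return noun + "n"  # die Strategie, der Junge, das Auge
    if article == "die":
        if noun.endswith(("el", "er")):
            return noun + "n"  # die Leiter -> Leitern, die Regel -> Regeln
        return noun + "en"  # die Zeitung -> Zeitungen
    # Masculine (der) and neuter (das) nouns from here on.
    if noun.endswith(("er", "el", "en", "lein")):
        return noun  # der Leiter -> Leiter, das Fenster -> Fenster
    return noun + "e"  # der Tag -> Tage


print(pluralize_by_gender("die", "Leiter"))  # Leitern
print(pluralize_by_gender("der", "Leiter"))  # Leiter
```

The misses are easy to find: die Lehrerin comes out as Lehrerinen instead of Lehrerinnen, der Vogel stays Vogel instead of Vögel, and die Mutter returns Muttern, which is right for "nut" but wrong for "mother" (Mütter) even though both nouns are feminine.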

@noud noud mentioned this issue Dec 4, 2020
@grafst

grafst commented Aug 14, 2023

It is funny that this is almost trivial in English but impossible in German.
