This repository has been archived by the owner on Jan 3, 2024. It is now read-only.

Supporting German as base language #55

Open
hjorthjort opened this issue Jan 13, 2020 · 12 comments

Comments

@hjorthjort

I want to use this project, but I would like to use the German Wiktionary. I intend to fork this project and make the required adaptations. Is there any interest in merging the result back via a PR? It would require some structural changes, but it might make adding more languages later easier.

@suyashb95
Owner

Sure, a PR to support German would be great! You can fetch results in your local language from the English Wiktionary, though.

@hjorthjort
Author

hjorthjort commented Jan 16, 2020 via email

I'm aware I can get definitions in English of words in other languages. The problem is that the English version of Wiktionary has far fewer German words than the German version. I also think there is value in using the language you're learning FOR learning, once you reach that level of maturity, which is why I think being able to use different language versions is nice. What I'm learning is that German Wiktionary structures its content very differently from English Wiktionary, so I think I will need to reinvent the wheel. Will make a PR when I'm done!

@suyashb95
Owner

Yeah, I'd wrongly assumed that the page structures for different wikis would be somewhat similar. Good luck with the PR! Let me know if I can help in any way.

@felixvor

felixvor commented May 8, 2020

Any update on this? I would be very interested in that code as well. Would not want to code it if someone else already did :D

@hjorthjort
Author

hjorthjort commented May 8, 2020 via email

@suyashb95
Owner

@DieseKartoffel I haven't started working on this, feel free to go ahead!

@rroessler1

Just noticed this recently... I've been doing some work on a local fork to support other languages, though I'm only interested in definitions.
Basically, the original code assumes there's a Table of Contents (TOC) and parses the page data from that. German words don't always have one, so I build one manually by checking the nested headers in the page and looking for ones that match the language code and then the part of speech.

rroessler1@ae8fb90
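
For readers who want a picture of the header-scanning idea, here is a minimal sketch. It is not the code from the commit above. It uses only Python's standard library, and the names `HeadingCollector` and `build_outline`, as well as the sample markup and heading texts, are made up for illustration; real Wiktionary pages are messier.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collects (level, text) pairs for every <h1>..<h6> element."""

    HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.headings = []   # list of (level, text) in document order
        self._level = None   # level of the heading currently open, if any
        self._chunks = []    # text pieces seen inside the open heading

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADING_TAGS:
            self._level = int(tag[1])
            self._chunks = []

    def handle_data(self, data):
        if self._level is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag in self.HEADING_TAGS and self._level is not None:
            self.headings.append((self._level, "".join(self._chunks).strip()))
            self._level = None


def build_outline(html, language, parts_of_speech):
    """Return part-of-speech headings nested under the matching language heading."""
    collector = HeadingCollector()
    collector.feed(html)

    outline = []
    language_level = None
    for level, text in collector.headings:
        if language_level is None:
            # Still looking for the heading that names the language.
            if language.lower() in text.lower():
                language_level = level
            continue
        if level <= language_level:
            break  # a sibling or parent heading ends the language section
        if any(pos.lower() in text.lower() for pos in parts_of_speech):
            outline.append(text)
    return outline


# Hypothetical, heavily simplified page markup.
SAMPLE = """
<h2>Hand (Deutsch)</h2>
<h3>Substantiv, f</h3>
<p>...</p>
<h2>hand (Englisch)</h2>
<h3>Substantiv</h3>
<h3>Verb</h3>
"""

print(build_outline(SAMPLE, "Deutsch", ["Substantiv", "Verb"]))
```

The outline is just a flat list here; a real replacement for the TOC would probably also need the heading levels and anchors.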

Though I think it'd be cleaner to have a base parsing class and then override certain methods for different languages, but for now I'm taking the lazy approach.

I'm happy to clean it up a bit and submit a PR, but I think it would need a slightly larger design discussion of the best way to support multiple languages going forward, which to my knowledge hasn't happened yet.
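
As a starting point for that design discussion, the "base parsing class with overrides" idea could look roughly like the sketch below. Everything in it is hypothetical: the class names, the `sections` input shape (heading text mapped to lines of text), and the assumption that German definitions carry a leading `[1]`-style marker. It is not how this project or the fork is structured.

```python
import re


class BaseEntryParser:
    """Shared parsing flow; subclasses override only language-specific hooks."""

    language_heading = "English"

    def parse(self, sections):
        # `sections` maps a heading to its lines of text (hypothetical shape).
        lines = sections.get(self.language_heading, [])
        return [self.clean_definition(line) for line in lines if line.strip()]

    def clean_definition(self, line):
        # Default hook: only trim surrounding whitespace.
        return line.strip()


class GermanEntryParser(BaseEntryParser):
    language_heading = "Deutsch"

    def clean_definition(self, line):
        # Assumed convention: definitions start with "[1]", "[2]", ...
        return re.sub(r"^\[\d+\]\s*", "", line.strip())


# Made-up input for illustration.
sections = {
    "English": ["to eat", ""],
    "Deutsch": ["[1] Nahrung zu sich nehmen", "[2] eine Mahlzeit einnehmen"],
}

print(BaseEntryParser().parse(sections))
print(GermanEntryParser().parse(sections))
```

The point of the design is that `parse` stays identical across languages, so a new language only has to supply its heading and whichever hooks differ.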

@suyashb95
Owner

@rroessler1 I'd initially made this project for use in a Telegram dictionary bot (I ended up using a different dictionary service instead) and didn't think of supporting other languages. It certainly needs design changes, which I think should handle different types of pages rather than specific languages. There could be words in languages other than German that don't have a ToC, for example. I was thinking of handling parsing in stages, where the first stage tries to figure out the structure of the page from a ToC, or by checking the nested headers if a ToC isn't found. For now we could add your changes to this stage and incrementally support more languages. What do you think?
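
A rough sketch of what such a first stage might look like, assuming the ToC lives in an element with `id="toc"` and that headings are plain `<h2>` to `<h6>` tags. Both are assumptions, the function names are hypothetical, and the crude regexes only stand in for a real HTML parser.

```python
import re


def outline_from_toc(html):
    """Strategy A: read the entries of a ToC block, if the page has one."""
    toc = re.search(r'<div id="toc">(.*?)</div>', html, flags=re.S)
    if toc is None:
        return None
    return [item.strip() for item in re.findall(r"<li>(.*?)</li>", toc.group(1), flags=re.S)]


def outline_from_headers(html):
    """Strategy B: fall back to the heading elements themselves."""
    matches = re.findall(r"<h([2-6])[^>]*>(.*?)</h\1>", html, flags=re.S)
    return [text.strip() for _level, text in matches]


def detect_structure(html, strategies=(outline_from_toc, outline_from_headers)):
    """Stage 1: try each strategy in order and keep the first non-empty outline."""
    for strategy in strategies:
        outline = strategy(html)
        if outline:
            return strategy.__name__, outline
    return None, []


# Made-up pages: one with a ToC, one without.
WITH_TOC = (
    '<div id="toc"><ul><li>English</li><li>Noun</li></ul></div>'
    "<h2>English</h2><h3>Noun</h3>"
)
WITHOUT_TOC = "<h2>Deutsch</h2><h3>Verb</h3>"

print(detect_structure(WITH_TOC))
print(detect_structure(WITHOUT_TOC))
```

Later stages would then consume the outline without caring which strategy produced it, which is what keeps this stage language-agnostic.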

@rroessler1

Agreed, I like the idea of handling it in stages and keeping it language-agnostic if possible.

But I think eventually there will have to be language-specific code as well. For example, in German the meanings, origins, and synonyms are all listed under one header, in <p> tags, so I think the parser would just have to search for and extract the text under "Bedeutungen:", which would be a German-specific bit of code. I can't think of any way around this at the moment, unless you pass the responsibility off to the client. (example: essen)
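
To make the "Bedeutungen:" case concrete, here is a minimal sketch of label-based extraction using only the standard library. The markup in `SAMPLE` is a simplified guess at the layout described above (a `<p>` label followed by the entries), the entry texts are placeholders, and `LabelledBlockParser` and `extract_block` are hypothetical names.

```python
from html.parser import HTMLParser


class LabelledBlockParser(HTMLParser):
    """Groups text under the most recent <p> label such as "Bedeutungen:"."""

    def __init__(self):
        super().__init__()
        self.blocks = {}      # label (without colon) -> list of text items
        self._in_p = False
        self._label = None    # label that currently collects text

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_p = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._in_p and text.endswith(":"):
            # A <p> whose text ends with ":" starts a new labelled block.
            self._label = text.rstrip(":")
            self.blocks[self._label] = []
        elif self._label is not None:
            self.blocks[self._label].append(text)


def extract_block(html, label):
    """Return the text items found under the given label, or an empty list."""
    parser = LabelledBlockParser()
    parser.feed(html)
    return parser.blocks.get(label, [])


# Simplified, made-up markup; not copied from a real page.
SAMPLE = """
<p>Bedeutungen:</p>
<dl><dd>[1] Nahrung zu sich nehmen</dd></dl>
<p>Herkunft:</p>
<dl><dd>placeholder etymology text</dd></dl>
<p>Synonyme:</p>
<dl><dd>[1] speisen</dd></dl>
"""

print(extract_block(SAMPLE, "Bedeutungen"))
```

The label strings are the only German-specific part, so a table of labels per language could keep the extraction code itself shared.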

@johnnybigoode-zz

Are there any updates on this? Maybe a side branch or something?

@suyashb95
Owner

@johnnybigoode I haven't been working on supporting other languages but maybe @rroessler1 has a fork that works?

@rroessler1

I have a fork that supports Spanish, French, and German.

https://github.com/rroessler1/WiktionaryParser

It definitely works, but I haven't looked at it in a year so I'm not sure if it's missing new updates.

I would be happy to try and get it merged into here but definitely don't have time until August at the earliest.
