This repository has been archived by the owner on Jan 3, 2024. It is now read-only.

Supporting french as base language #76

Open
cedric-audy opened this issue Oct 30, 2020 · 7 comments

Comments

@cedric-audy

cedric-audy commented Oct 30, 2020

Hi everyone,

As part of a school project where I needed a bunch of French words and their definitions (also in French), I forked the project and modified the code to suit my needs. The code is here: https://github.com/cedric-audy/WiktionaryParser .

[Solved] I was unable to pull from the repo and use wiktionary as a 'package'. For now, when I need it, I include the whole thing in my project and import WiktionaryParser, which is impractical. However, I had no problem installing the main version using pip. Help would be appreciated on that.

After a bit of tinkering, I can now retrieve a definition and an etymology (see image) with the help of this code: https://github.com/cedric-audy/french_wiktionary_scraper .

image

I am fairly new to all this (git, forking, Python, etc.), so help would be appreciated in making a version of WiktionaryParser that works with French as the base language.

@suyashb95
Owner

@cedric-audy the project only supports the English Wiktionary as of now, since the parts of speech and page structure differ for each language. Thanks for working on support for French on your fork of the repo! I'll try to integrate other languages into the project after taking a look at that.

> I was unable to pull from the repo and use wiktionary as a 'package'. For now, when I need it, I include the whole thing in my project and import WiktionaryParser, which is impractical. However, I had no problem installing the main version using pip. Help would be appreciated on that.

Could you elaborate on this? Are you unable to use it from source?

@cedric-audy
Author

cedric-audy commented Oct 31, 2020

Solved

> I was unable to pull from the repo and use wiktionary as a 'package'. For now, when I need it, I include the whole thing in my project and import WiktionaryParser, which is impractical. However, I had no problem installing the main version using pip. Help would be appreciated on that.
>
> Could you elaborate on this? Are you unable to use it from source?

@cedric-audy
Author

I made some improvements today. We can now retrieve 'nom commun' (definition), 'étymologie', 'synonymes', 'dérivés' (derived terms), 'vocabulaire apparenté par le sens' (sense-related vocabulary), 'hyperonymes' (like synonyms, but more generic), and 'hyponymes' (more specific terms, such as bleu d'Auvergne for fromage). I still need to do pronunciations.

I really didn't have to change that many things. Maybe French could eventually be integrated into the source code.

Output example (using pprint):
image

@tbm

tbm commented Dec 18, 2020

@cedric-audy also see PR #56

@danieldjewell

danieldjewell commented Feb 15, 2021

I too am interested in other languages. As @suyash458 points out, one of the problems is that the actual response metadata from Wiktionary changes based on the language queried. (This is why, of course, @cedric-audy, you had to change the definitions -- "etymologies" >> "étymologie", etc.) (Side note: I am sorry to say that I am very much a beginner student of French [my apologies @cedric-audy], but it helps that something like 40+% of English vocabulary comes from (Norman) French. That said, I think the proper translation for "determiner" would be déterminant and not "dérivés" -- I'll open an issue over on your repo @cedric-audy with more to keep it separate. EDIT: Can't do that, issues aren't enabled. @cedric-audy I think the proper translation of "parts of speech" into French would be [catégorie_lexicale](https://fr.m.wiktionary.org/wiki/catégorie_lexicale) - I would double-check some of the translations, like the one I mentioned above.)

I wasn't aware that the Wiktionary/MediaWiki APIs actually change the language of the metadata -- that really does complicate things...

I wonder if Wiktionary already has a language mapping table built for internationalization, e.g. something that will look up the language-local equivalents for the API response structure. I'll check that out.
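
In the meantime, here is a minimal sketch of what a hand-written mapping table might look like, assuming the parser only needs to translate section headings into language-neutral keys. The names `SECTION_HEADINGS` and `canonical_section` are hypothetical and are not part of WiktionaryParser; the French headings are the ones listed earlier in this thread.

```python
import unicodedata

# Hypothetical table: language code -> {canonical key: heading as shown on that Wiktionary}.
# This is hand-written for illustration, not something provided by Wiktionary or MediaWiki.
SECTION_HEADINGS = {
    "en": {
        "etymology": "Etymology",
        "noun": "Noun",
        "synonyms": "Synonyms",
        "derived_terms": "Derived terms",
        "hypernyms": "Hypernyms",
        "hyponyms": "Hyponyms",
    },
    "fr": {
        "etymology": "Étymologie",
        "noun": "Nom commun",
        "synonyms": "Synonymes",
        "derived_terms": "Dérivés",
        "hypernyms": "Hyperonymes",
        "hyponyms": "Hyponymes",
    },
}


def _normalize(heading):
    """Normalize Unicode form, surrounding whitespace, and case before comparing headings."""
    return unicodedata.normalize("NFC", heading).strip().casefold()


def canonical_section(heading, lang):
    """Return the language-neutral key for a page heading, or None if it is not in the table."""
    reverse = {_normalize(text): key for key, text in SECTION_HEADINGS[lang].items()}
    return reverse.get(_normalize(heading))
```

With a table like this, `canonical_section("étymologie", "fr")` and `canonical_section("Etymology", "en")` would both resolve to the same key, so the rest of the parsing code could stay language-agnostic. Supporting another language would then mostly mean adding one more entry to the table.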

@cedric-audy
Author

Hi @danieldjewell, thank you for your input. Issues were already enabled from what I can see, but I added some templates in case that was needed. You are right about the botched translation; I was mainly interested in making it work for a school project. Yes to all of your suggestions, but I'm afraid I don't have time for this this week :)

@gozat

gozat commented Feb 8, 2022

Please see pull request #92 for a possible adaptation of the code.
