[nan] Support Min Nan #259

Open
agutkin opened this issue Oct 31, 2020 · 6 comments
Assignees
Labels
language support Language-specific issues

Comments

@agutkin
Contributor

agutkin commented Oct 31, 2020

Supporting Min Nan requires writing a custom extractor.

@kylebgorman
Collaborator

Assigning this to this Sasha Gutkin character.

@kylebgorman kylebgorman added the language support Language-specific issues label Oct 31, 2020
@lfashby
Collaborator

lfashby commented Jan 25, 2021

Based on the nan results from the last scrape (which ultimately had fewer than 100 entries and thus didn't make it through), we can scrape Min Nan; we just skip most entries because the headwords contain dashes. Since our skip-word logic runs before our pron extraction, we probably need to modify skip_word so that it does not skip words with dashes if config.key == 'nan' (or something like that), rather than writing a custom extractor.
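A minimal sketch of the change proposed above, not the project's actual code. The `Config` class, the `DASH_TOLERANT_KEYS` set, and the `skip_word(word, config)` signature are assumptions made for illustration; only the names `skip_word` and `config.key` come from this thread, and the real function may apply other rules as well.

```python
from dataclasses import dataclass

# Hypothetical: language keys whose headwords legitimately contain dashes.
DASH_TOLERANT_KEYS = frozenset({"nan"})


@dataclass
class Config:
    """Hypothetical stand-in for the scraper config; only `key` is used here."""

    key: str


def skip_word(word: str, config: Config) -> bool:
    """Returns True if the headword should be skipped before pron extraction.

    Assumed rule: headwords containing a dash are skipped, except for
    languages listed in DASH_TOLERANT_KEYS.
    """
    if "-" in word and config.key not in DASH_TOLERANT_KEYS:
        return True
    return False


# Dashed headwords are kept for Min Nan but still skipped elsewhere.
print(skip_word("tâi-gí", Config(key="nan")))
print(skip_word("re-", Config(key="eng")))
```

Keeping the exempt keys in a set rather than hard-coding `== 'nan'` would make it easy to add other languages later if they turn out to have the same problem.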

@agutkin
Contributor Author

agutkin commented Jan 25, 2021

Can someone take this on? I don't have any cycles at the moment.

@kylebgorman
Collaborator

Oh, that's very simple then. @lfashby, can you take a stab at this as part of your larger process?

@lfashby
Collaborator

lfashby commented Jan 25, 2021

Sure, I can implement my crude solution and re-scrape. Perhaps at some point in the future we can come back to this if we think of a more elegant way of dealing with it or find other languages that have the same problem.

@jacksonllee
Collaborator

Just left this comment about Min Nan. Chinese languages are tough...

4 participants