Language recognition? #600

Open
ZedZipDev opened this issue Feb 7, 2022 · 5 comments

Comments

@ZedZipDev

Is there anything for language recognition? I.e., input: a text; output: the language the text is written in.

@IgnatiusEzeani

IgnatiusEzeani commented Feb 10, 2022

Do you mean the language identification task?
See if any of these works are of any help.
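
To make the task concrete (text in, language label out), here is a toy sketch of a character-trigram baseline. It is not taken from any of the works mentioned in this thread: the training sentences in `SAMPLES`, the helper names, and the choice of cosine similarity are all assumptions for illustration, and a usable system would need far more data and languages.

```python
from collections import Counter
import math

# Tiny made-up training samples, one per language (illustrative assumption only).
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and this is what the people think about the weather",
    "fr": "le renard brun saute par-dessus le chien paresseux et c'est ce que les gens pensent du temps qu'il fait",
    "de": "der schnelle braune fuchs springt auf den faulen hund und das ist was die leute vom wetter denken",
    "es": "el zorro salta sobre el perro perezoso y esto es lo que la gente piensa sobre el clima de hoy",
}


def char_ngrams(text, n=3):
    """Count overlapping character n-grams of the lowercased, space-padded text."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors (Counters)."""
    dot = sum(count * b[gram] for gram, count in a.items())  # missing keys count as 0
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# One trigram profile per language, built once from the samples above.
PROFILES = {lang: char_ngrams(text) for lang, text in SAMPLES.items()}


def identify_language(text):
    """Return the label of the profile most similar to the input text."""
    query = char_ngrams(text)
    return max(PROFILES, key=lambda lang: cosine(query, PROFILES[lang]))


if __name__ == "__main__":
    for sentence in ["the dog is lazy", "der hund und das wetter"]:
        print(sentence, "->", identify_language(sentence))
```

A baseline like this only separates languages whose character statistics differ clearly in the samples; short, mixed-language, or noisy web text is where it would be expected to break down.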

@Yuliya-HV

You may want to check Stanza's language identification:
https://stanfordnlp.github.io/stanza/langid.html

@sebastianruder
Owner

Thanks for these pointers. The task is also abbreviated as language ID and is still far from solved (see this COLING 2020 paper for an overview of challenges). As far as I am aware, there is a lack of gold standard multilingual web-domain datasets for this task.

@LifeIsStrange
Contributor

https://paperswithcode.com/task/language-identification

@LifeIsStrange
Contributor

LifeIsStrange commented Feb 25, 2022

I wonder if this https://paperswithcode.com/paper/a-reproduction-of-apple-s-bi-directional-lstm is the current state of the art.
The performance is not good at all...
It seems to be an LSTM; I guess a transformer like BERT, or better, XLNet, would reach higher accuracy?
