Support absolute language confidence metric #54

Open
warvyvr opened this issue Dec 21, 2023 · 3 comments

warvyvr commented Dec 21, 2023

Hi,
In my scenario, the goal is to detect whether the input text is in English or in some other language, and I'm not sure how to use the library for this. For instance, if the input text is in a specific language such as Vietnamese, I expect it to be detected as non-English.

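	package main

	import (
		"fmt"

		"github.com/pemistahl/lingua-go"
	)

	func main() {
		// Candidate languages the detector chooses between.
		languages := []lingua.Language{
			lingua.English,
			lingua.Vietnamese,
			lingua.Unknown,
		}

		// Vietnamese input that should be reported as non-English.
		sentence := "Thông tin tài khoản của bạn"

		detector := lingua.NewLanguageDetectorBuilder().
			FromLanguages(languages...).
			WithMinimumRelativeDistance(0.9).
			Build()

		// One confidence value per candidate language.
		confidenceValues := detector.ComputeLanguageConfidenceValues(sentence)

		for _, elem := range confidenceValues {
			fmt.Printf("%s: %.2f\n", elem.Language(), elem.Value())
		}
	}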

Output:

Vietnamese: 1.00
English: 0.00

When I remove lingua.Vietnamese from the language list, the program outputs English: 1.00. I would like the result to be some other language type rather than English.
Please help me with how to do this.
Thanks in advance.

pemistahl (Owner) commented

Hi, what you want is not yet possible with my library. As of now, it only provides a relative confidence metric that tells you how likely a language is in comparison to another language. What you want is an absolute confidence metric that works independently from any other language. I plan to implement something like that but it's not easy. I can't tell you when this will be done.
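
To illustrate why a relative metric behaves the way the report describes, here is a minimal, self-contained sketch. It does not use lingua-go and is not the library's actual computation: the function `relativeConfidence`, the helper `printSorted`, and the raw scores are all hypothetical. It only assumes that each candidate language has some raw score and that the scores are normalized across the candidate set so they sum to 1.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// relativeConfidence normalizes hypothetical raw scores (for example
// log-likelihoods) with a softmax so that the results sum to 1 across
// the candidate languages. Illustration only, not lingua-go's code.
func relativeConfidence(scores map[string]float64) map[string]float64 {
	// Subtract the maximum score before exponentiating for numerical stability.
	maxScore := math.Inf(-1)
	for _, s := range scores {
		if s > maxScore {
			maxScore = s
		}
	}
	total := 0.0
	out := make(map[string]float64, len(scores))
	for lang, s := range scores {
		out[lang] = math.Exp(s - maxScore)
		total += out[lang]
	}
	for lang := range out {
		out[lang] /= total
	}
	return out
}

// printSorted prints the confidence values in descending order.
func printSorted(conf map[string]float64) {
	langs := make([]string, 0, len(conf))
	for lang := range conf {
		langs = append(langs, lang)
	}
	sort.Slice(langs, func(i, j int) bool { return conf[langs[i]] > conf[langs[j]] })
	for _, lang := range langs {
		fmt.Printf("%s: %.2f\n", lang, conf[lang])
	}
}

func main() {
	// Made-up scores for a Vietnamese sentence: English fits poorly.
	printSorted(relativeConfidence(map[string]float64{"English": -60, "Vietnamese": -20}))

	// With Vietnamese removed, English is the only candidate, so it
	// receives the full confidence even though its raw score is unchanged.
	printSorted(relativeConfidence(map[string]float64{"English": -60}))
}
```

In this sketch the raw English score is identical in both calls. Only the set of competitors changes, which is why a relative value alone cannot say that a text is in none of the configured languages.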

pemistahl changed the title from "need help on language detect case" to "Support absolute language confidence metric" on Dec 21, 2023

warvyvr commented Dec 21, 2023

> Hi, what you want is not yet possible with my library. As of now, it only provides a relative confidence metric that tells you how likely a language is in comparison to another language. What you want is an absolute confidence metric that works independently from any other language. I plan to implement something like that but it's not easy. I can't tell you when this will be done.

Thanks, that is good news. I look forward to it.

therealaditigupta commented Apr 24, 2024

Looking forward to this feature! We are looking for something similar. Any update on when this may be available?
