Strange matching for Spanish phrase detected as Finnish #11

Answered by pemistahl
aidenwallis asked this question in Q&A

Hello Aiden, thanks for trying my library and for your question.

Well, it seems that for this specific sentence, the sum of the ngram probabilities for Finnish is greater than the one for Spanish. This is not a bug; it is just mathematics. The word "pokemon" is certainly crucial here. It is a proper noun, so it is neither Finnish nor Spanish. At best, it's Japanese. It contains ngrams that are not characteristic of Spanish, which confuses the algorithm into returning Finnish. If you remove this word from your sentence, the detector returns Spanish as the most likely language.
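
To illustrate the effect described above, here is a minimal toy sketch of ngram-based scoring. It is not the library's actual implementation: the function names (`trigrams`, `train`, `log_score`, `detect`), the add-one smoothing, and the tiny made-up training sentences are all assumptions chosen for illustration. It only shows the general idea that each word contributes its ngram scores to every language's total, so one out-of-language word can shift the balance.

```python
import math
from collections import Counter


def trigrams(text):
    """Return the character trigrams of each lowercased, space-padded word."""
    grams = []
    for word in text.lower().split():
        padded = f" {word} "
        grams.extend(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def train(corpus):
    """Count trigrams in a (toy) training corpus."""
    return Counter(trigrams(corpus))


def log_score(text, counts, vocab_size):
    """Sum the add-one-smoothed log probabilities of the text's trigrams."""
    total = sum(counts.values())
    return sum(
        math.log((counts[g] + 1) / (total + vocab_size))
        for g in trigrams(text)
    )


def detect(text, models):
    """Return the best-scoring language and the score of every language."""
    vocab = set().union(*models.values())  # all trigrams seen in any model
    scores = {
        lang: log_score(text, counts, len(vocab))
        for lang, counts in models.items()
    }
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Hypothetical, far-too-small training data; real models use large corpora.
    models = {
        "spanish": train("me gusta jugar con mis amigos en la casa"),
        "finnish": train("pidän pelaamisesta ystävieni kanssa kotona"),
    }
    for sentence in ("jugar con amigos", "jugar con amigos pokemon"):
        best, scores = detect(sentence, models)
        print(sentence, "->", best, scores)
```

Comparing the per-language scores with and without the extra word shows how much a single word moves each total. With a realistic model, a word whose ngrams are rare in the expected language can move the totals enough to change the winner.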

Replies: 1 comment 1 reply

1 reply
@aidenwallis

Answer selected by aidenwallis
2 participants
Converted from issue

This discussion was converted from issue #10 on April 20, 2022 18:55.