Fix vague language codes caused wrong recognition result #136

BingLingGroup · 2019-02-14T14:29:19Z

We know that autosub uses the same set of language codes for both src_language and dst_language, but those codes are not specific enough for the API to identify the language. The speech-to-text docs and the translate docs show that the Speech-to-Text API's language codes differ from the Translation API's. Even the Simplified Chinese version of the docs differs from the English version, which makes this especially troublesome.

Simplified Chinese version docs screenshot

English version docs screenshot

You can see that the Chinese language codes differ between these two versions of the docs, and the difference really matters in some cases.
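To illustrate why a single shared list is a problem, here is a minimal sketch that keeps one table per API and validates a code against the right one. The names `SPEECH_CODES`, `TRANSLATION_CODES`, and `is_valid` are hypothetical and are not taken from autosub; the sets are small illustrative subsets of the codes in Google's documentation, not the full lists.

```python
# Hypothetical, illustrative subsets; the documented lists are much longer.
SPEECH_CODES = {"cmn-Hans-CN", "cmn-Hant-TW", "yue-Hant-HK", "en-US"}
TRANSLATION_CODES = {"zh-CN", "zh-TW", "en"}


def is_valid(code, api):
    """Return True if `code` is in the table for `api` ("speech" or "translation")."""
    table = SPEECH_CODES if api == "speech" else TRANSLATION_CODES
    return code in table


# A code that is specific enough for one API may not exist in the other.
print(is_valid("cmn-Hans-CN", "speech"))       # True
print(is_valid("cmn-Hans-CN", "translation"))  # False
```

Keeping the tables separate means a source-language argument is only ever checked against speech codes and a destination-language argument only against translation codes.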

By the way, although autosub still uses the old version of the Google API, Google has replaced the old docs with new ones. In my tests, described later in this post, at least some of the new codes worked better than the old ones.

Learn more about using the Google Cloud Translation API by reading the documentation.

In this case, Google won't tell you that your language code is vague and refuse to recognize the speech. Instead, it recognizes the speech using a localized variant of the language. For example, spoken Chinese includes Cantonese, which is used in Hong Kong, and Mandarin, which is the official language of mainland China. When someone passes -S zh-CN -D zh-CN, or -S zh -D zh (I modified constants.py to test this), as listed in the English docs, to recognize Mandarin from a Hong Kong IP address, the speech is mistakenly recognized as Cantonese. This was also reported in #112 (in Chinese).
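One way to avoid the silent fallback is to replace a vague code with an explicit one before sending the request. The sketch below is only an assumption about how that could look: `LEGACY_TO_SPEECH` and `resolve_speech_code` are hypothetical names, and the choice to map a bare `zh` to Mandarin (Simplified, China) is an illustrative default rather than something this PR is stated to do.

```python
# Hypothetical mapping from vague codes to explicit Speech-to-Text codes.
LEGACY_TO_SPEECH = {
    "zh": "cmn-Hans-CN",     # assumed default: Mandarin, Simplified, mainland China
    "zh-CN": "cmn-Hans-CN",
    "zh-TW": "cmn-Hant-TW",  # Mandarin, Traditional, Taiwan
}


def resolve_speech_code(code):
    """Return an explicit speech code for a vague one; pass other codes through."""
    return LEGACY_TO_SPEECH.get(code, code)


print(resolve_speech_code("zh-CN"))        # cmn-Hans-CN
print(resolve_speech_code("yue-Hant-HK"))  # yue-Hant-HK (already explicit)
```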

So I modified constants.py and __init__.py to use the new language codes. I haven't tested the Translation API, but I think it is usable, since the docs mentioned above describe this usage. I also fixed a logic bug that occurs when -S is given and -D is not. I hope you can review it, and thank you very much for your work on autosub.
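The description does not spell out the -S / -D bug, so the following is only one plausible reading: when -D is omitted, no translation should be attempted. The function `parse_languages`, the `en-US` default, and the `None` default for -D are assumptions made for this sketch, not the code in this PR.

```python
import argparse


def parse_languages(argv):
    """Parse -S/-D and decide whether translation is needed (hypothetical helper)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-S", "--src-language", default="en-US")
    # Assumption: leaving -D unset means "do not translate".
    parser.add_argument("-D", "--dst-language", default=None)
    args = parser.parse_args(argv)

    translate = args.dst_language is not None
    return args.src_language, args.dst_language, translate


print(parse_languages(["-S", "cmn-Hans-CN"]))
# ('cmn-Hans-CN', None, False)
print(parse_languages(["-S", "cmn-Hans-CN", "-D", "en"]))
# ('cmn-Hans-CN', 'en', True)
```

Using `None` as the default makes "not given" distinguishable from any real language code, so the source language never gets compared against an unrelated default destination.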

Below is the test:

~~Sorry to offend you but I screenshot the bug mentioned in #87~~

~~Hong Kong IP confirm~~

Recognizing a Mandarin Chinese clip

And the result is totally wrong

Using the zh-TW language code

zh-TW is the Taiwan variant of Mandarin; at least when spoken, the two are almost the same.

Same wrong result as with zh-CN

What about the en recognition in Hong Kong?

I switched to an English audio clip to rule out the concern that Google's speech-to-text recognition simply works badly from Hong Kong.

It works just fine. At least the result matched the language code.

Now I switch back to my modified code, which can use the new language codes.

This is test_v3. At least the API accepts it.

Finally, it recognized the speech and gave a probably correct result

BingLingGroup added 5 commits February 14, 2019 16:26

Add cloud speech-to-text and translation language code to constants.py

2432a8d

Add cloud speech-to-text and translation language codes to __init__.py

1290ce6

Fix pylint code format issues

41303bb

Reverse ffmpeg dependency check issue

7714456

Add help message to readme.md

c3643a2

BingLingGroup mentioned this pull request Feb 14, 2019

Who can help me: looking for a Chinese-language tutorial #112

Open

BingLingGroup mentioned this pull request Feb 24, 2019

Data privacy...? #138

Open

BingLingGroup mentioned this pull request Mar 18, 2019

Does it works without translation api? #122

Closed

BingLingGroup mentioned this pull request Apr 1, 2019

Install AutoSub Step to Step in Windows with Translate subtitle #31

Open

This was referenced Jul 12, 2019

Add ISO 639 .bat #67

Open

Add lang codes support BingLingGroup/autosub#34

Closed

BingLingGroup mentioned this pull request Apr 21, 2020

Language code set to zh/zh-cn but the subtitle file comes out in Cantonese BingLingGroup/autosub#112

Closed

BingLingGroup mentioned this pull request Dec 26, 2020

Crash #181

Open
