
is it possible to retrain on mistakes? #44

Open · Kreijstal opened this issue Dec 10, 2021 · 13 comments
Labels
enhancement New feature or request

Comments

@Kreijstal

Given an incorrect sub, would it be possible to supply a corrected sub and retrain on it?

@abhirooptalasila
Owner

That sounds like a great idea, although it'll be really difficult: we can't ensure that the audio is split correctly, and the time offsets have to be perfect. It's also not practical to fine-tune on a single sample. Do you have any approaches in mind?

@Kreijstal
Author

Hmm, maybe make it more visible how the audio is split? Also, how does the audio splitting work? Is it also an AI?

@abhirooptalasila
Owner

I segment the audio on its silent parts, adapting some code from this project. It's not an AI. We can fine-tune the params while splitting, but it's not a one-size-fits-all solution.
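For illustration, here is a minimal sketch of silence-based splitting using pydub's `split_on_silence` helper. This is an analogous approach, not AutoSub's actual code; the input filename and parameter values are made up:

```python
# Minimal sketch of silence-based audio segmentation using pydub.
# Illustrative only; not AutoSub's actual implementation.
from pydub import AudioSegment
from pydub.silence import split_on_silence

audio = AudioSegment.from_wav("input.wav")  # hypothetical input file

# These are the kinds of params that need tuning per recording,
# which is why splitting is not a one-size-fits-all solution.
chunks = split_on_silence(
    audio,
    min_silence_len=500,             # silence must last >= 500 ms to split on
    silence_thresh=audio.dBFS - 16,  # 16 dB below average loudness counts as silence
    keep_silence=250,                # keep 250 ms of padding so words aren't clipped
)

for i, chunk in enumerate(chunks):
    chunk.export(f"chunk_{i:04d}.wav", format="wav")
```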

@xiaomao2013

I am very happy to see your work. It clearly took a lot of effort.
I don't know if you're familiar with NVIDIA NeMo. I've found NeMo's recognition performance to be very high. I look forward to you finding the time to make a version that uses NeMo as the recognition core. ^_^

@abhirooptalasila
Owner

Hi
I will check it out. Do you know if the model outputs timing information for the detected speech segments?
Because that's how I build the subtitle files.
Do you know which performs better: HuggingFace Wav2Vec or NeMo?

@xiaomao2013

xiaomao2013 commented Jan 17, 2022

> Hi
> I will check it out. Do you know if the model outputs timing information for the detected speech segments?
> Because that's how I build the subtitle files.
> Do you know which performs better: HuggingFace Wav2Vec or NeMo?

You can test whether the model outputs timing information for the detected speech segments in Google Colab; please use BRANCH = 'v1.0.2' for the test.
Since I really don't know HuggingFace Wav2Vec, I can't say which is better.
But in the NeMo examples I saw that individual spoken words are easily separated, and the code is there too. The specific file locations are:
NeMo/examples/asr/
NeMo/tutorials/asr/01_ASR_with_NeMo.ipynb
Offline_ASR.ipynb
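For context: the Offline_ASR.ipynb notebook derives timing by mapping CTC output frames to time offsets. A minimal sketch of that idea, assuming a NeMo v1.x EncDecCTCModel; the pretrained model name and the 20 ms time stride follow the notebook's defaults and may differ in other versions:

```python
# Sketch of CTC-frame-based timing, in the spirit of NeMo's Offline_ASR.ipynb.
# Assumes NeMo v1.x; model name and time stride are the notebook's defaults.
import numpy as np
import nemo.collections.asr as nemo_asr

model = nemo_asr.models.EncDecCTCModel.from_pretrained("QuartzNet15x5Base-En")

# transcribe() normally returns plain text; logprobs=True returns the
# per-frame log-probabilities instead, so timing can be derived.
logits = model.transcribe(["sample.wav"], logprobs=True)[0]

time_stride = 0.02  # QuartzNet emits one output frame per 20 ms of audio
blank_id = len(model.decoder.vocabulary)  # CTC blank is the last class
pred = np.argmax(logits, axis=-1)

for i, idx in enumerate(pred):
    if idx != blank_id:  # a non-blank frame means a character was emitted here
        print(f"{i * time_stride:6.2f}s  {model.decoder.vocabulary[idx]}")
```

This prints one (possibly repeated) character per non-blank frame; collapsing repeats and grouping into words is what the notebook does on top of this.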

@xiaomao2013


I want to process video files with a long duration, but it seems that the program can only process WAV files up to 15 seconds. I also want to translate into several other languages, which runs into the same limitation. I look forward to you open-sourcing programs for these functions.

Voice translation services often translate some content incorrectly, or deliberately mistranslate it, which leads to deviations in many people's understanding. This is a dark moment; I hope more people can get the truth.

Using offline speech recognition and translation programs is really a last resort. In the face of deliberate misleading and harm, we can only use software and platforms outside their control. Right now I use Gettr.

@xiaomao2013

Thank you very much for providing open-source software that goes all the way from video and audio files to subtitle files.

I installed and used it, but I don't know which language it recognizes. I'm using English video files, but the results seem bad. Can you tell me? Thank you so much.

@xiaomao2013


Or can you tell me how to change the translation module to support other languages? Thanks a lot.

@abhirooptalasila added the enhancement (New feature or request) label on Jan 20, 2022
@abhirooptalasila
Owner

I plan on implementing either Wav2Vec or NeMo, but that will need some time.
DeepSpeech has official models for American English, and there are some community-made models here. If you find a model file you want to use, download the .pbmm and .scorer files and give them as input to AutoSub.

Also, AutoSub can process large video files too. It automatically segments the audio into smaller chunks.
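For reference, a typical invocation with downloaded model files looks like the command used later in this thread; the filenames below are the official English release's, so adjust paths for your model:

```
python3 autosub/main.py --model deepspeech-0.9.3-models.pbmm --scorer deepspeech-0.9.3-models.scorer --file video.mp4
```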

@xiaomao2013

Thank you very much for your guidance, and I hope to see the new program you write soon.


@xiaomao2013


Following your prompt, I went looking for the corresponding model files, and found:
deepspeech-0.9.3-models-zh-CN.pbmm
deepspeech-0.9.3-models-zh-CN.scorer

I downloaded them and tried to run, but the following error occurs.

If it's convenient, please test it to see how you can get the model to work.

Thank you so much


(sub) (base) gettr@gettr:~/AutoSub$ python3 autosub/main.py --model deepspeech-0.9.3-models-zh-CN.pbmm --scorer deepspeech-0.9.3-models-zh-CN.scorer --file ~/3-720.mp4
ARGS: Namespace(dry_run=False, file='/home/gettr/3-720.mp4', format=['srt', 'vtt', 'txt'], model='deepspeech-0.9.3-models-zh-CN.pbmm', scorer='deepspeech-0.9.3-models-zh-CN.scorer', split_duration=5)
Model: /home/gettr/AutoSub/deepspeech-0.9.3-models-zh-CN.pbmm
Scorer: /home/gettr/AutoSub/deepspeech-0.9.3-models-zh-CN.scorer
Input file: /home/gettr/3-720.mp4
Extracted audio to audio/3-720.wav
Splitting on silent parts in audio file

Running inference:
TensorFlow: v2.3.0-6-g23ad988
DeepSpeech: v0.9.3-0-gf2e9c85
  0%|          | 0/17 [00:07<?, ?it/s]
Traceback (most recent call last):
  File "autosub/main.py", line 165, in <module>
    main()
  File "autosub/main.py", line 156, in main
    ds_process_audio(ds, audio_segment_path, output_file_handle_dict, split_duration=args.split_duration)
  File "autosub/main.py", line 66, in ds_process_audio
    write_to_file(output_file_handle_dict, split_inferred_text, line_count, split_limits, cues)
  File "/home/gettr/AutoSub/autosub/writeToFile.py", line 43, in write_to_file
    file_handle.write(inferred_text + "\n\n")
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 0-88: surrogates not allowed
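A note on this error: "surrogates not allowed" means `inferred_text` contains lone surrogate code points, typically from byte-level decoding of the zh-CN model's output, and UTF-8 refuses to encode them. One possible workaround, sketched against the `write_to_file` call in the traceback; the `sanitize` helper is hypothetical and the exact fix may differ:

```python
# Hypothetical workaround for the UnicodeEncodeError above (not an official fix).
# "surrogateescape" maps lone surrogates back to their raw bytes, which may then
# re-decode as the intended UTF-8 Chinese text; if not, fall back to dropping them.
def sanitize(text: str) -> str:
    try:
        return text.encode("utf-8", errors="surrogateescape").decode("utf-8")
    except UnicodeError:
        # Last resort: replace unencodable code points so the write succeeds.
        return text.encode("utf-8", errors="replace").decode("utf-8")

file_handle.write(sanitize(inferred_text) + "\n\n")
```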

@xiaomao2013


I found an instruction that may be related, but I don't know how to apply it:
LINK
