
is it possible to retrain on mistakes? #44

Open · Kreijstal opened this issue Dec 10, 2021 · 13 comments
Labels
enhancement New feature or request

Comments

@Kreijstal

Given an incorrect sub, would it be possible to supply a corrected sub and retrain on it?

@abhirooptalasila
Owner

That sounds like a great idea, although it'll be really difficult: we can't ensure that the audio is split correctly, and the time offsets have to be perfect. It's also not practical to fine-tune on a single sample. Do you have any approaches in mind?

@Kreijstal
Author

Hmm, maybe make it more visible how the audio is split? Also, how does the audio splitting work? Is it also an AI?

@abhirooptalasila
Owner

I segment the audio on its silent parts, adapting some code from this project. It's not an AI. We can fine-tune the params while splitting, but it's not a one-size-fits-all solution.
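For illustration, here is a minimal sketch of silence-based splitting using pydub's `split_on_silence` helper. This is an analogous approach, not AutoSub's actual code; the input filename and parameter values are made up:

```python
# Minimal sketch of silence-based audio segmentation using pydub.
# Illustrative only; not AutoSub's actual implementation.
from pydub import AudioSegment
from pydub.silence import split_on_silence

audio = AudioSegment.from_wav("input.wav")  # hypothetical input file

# These are the kinds of params that need tuning per recording,
# which is why splitting is not a one-size-fits-all solution.
chunks = split_on_silence(
    audio,
    min_silence_len=500,             # silence must last >= 500 ms to split on
    silence_thresh=audio.dBFS - 16,  # 16 dB below average loudness counts as silence
    keep_silence=250,                # keep 250 ms of padding so words aren't clipped
)

for i, chunk in enumerate(chunks):
    chunk.export(f"chunk_{i:04d}.wav", format="wav")
```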

@xiaomao2013

I am very happy to see your work. It clearly took a lot of effort.
I don't know if you're familiar with NVIDIA NeMo. I've found NeMo's recognition performance to be very high. I look forward to you finding the time to make a version that uses NeMo as the recognition core. ^_^

@abhirooptalasila
Owner

Hi
I will check it out. Do you know if the model outputs timing information for the detected speech segments?
Because that's how I build the subtitle files.
Do you know which performs better: HuggingFace Wav2Vec or NeMo?

@xiaomao2013

xiaomao2013 commented Jan 17, 2022

> Hi
> I will check it out. Do you know if the model outputs timing information for the detected speech segments?
> Because that's how I build the subtitle files.
> Do you know which performs better: HuggingFace Wav2Vec or NeMo?

You can test whether the model outputs timing information for the detected speech segments in Google Colab; please use BRANCH = 'v1.0.2' for the test.
Since I really don't know HuggingFace Wav2Vec, I can't say which is better.
But in the NeMo examples I saw that individual spoken words are easily separated, and the code is there too. The specific file locations are:
NeMo/examples/asr/
NeMo/tutorials/asr/01_ASR_with_NeMo.ipynb
Offline_ASR.ipynb
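For context: the Offline_ASR.ipynb notebook derives timing by mapping CTC output frames to time offsets. A minimal sketch of that idea, assuming a NeMo v1.x EncDecCTCModel; the pretrained model name and the 20 ms time stride follow the notebook's defaults and may differ in other versions:

```python
# Sketch of CTC-frame-based timing, in the spirit of NeMo's Offline_ASR.ipynb.
# Assumes NeMo v1.x; model name and time stride are the notebook's defaults.
import numpy as np
import nemo.collections.asr as nemo_asr

model = nemo_asr.models.EncDecCTCModel.from_pretrained("QuartzNet15x5Base-En")

# transcribe() normally returns plain text; logprobs=True returns the
# per-frame log-probabilities instead, so timing can be derived.
logits = model.transcribe(["sample.wav"], logprobs=True)[0]

time_stride = 0.02  # QuartzNet emits one output frame per 20 ms of audio
blank_id = len(model.decoder.vocabulary)  # CTC blank is the last class
pred = np.argmax(logits, axis=-1)

for i, idx in enumerate(pred):
    if idx != blank_id:  # a non-blank frame means a character was emitted here
        print(f"{i * time_stride:6.2f}s  {model.decoder.vocabulary[idx]}")
```

This prints one (possibly repeated) character per non-blank frame; collapsing repeats and grouping into words is what the notebook does on top of this.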

@xiaomao2013


I want to process video files with a long duration, but it seems that the program can only process WAV files up to 15 seconds. I also want to translate into several other languages, which runs into the same limitation. I look forward to you open-sourcing programs for these functions.

Voice translation services often translate some content incorrectly, or deliberately mistranslate it, which leads to deviations in many people's understanding. This is a dark moment; I hope more people can get the truth.

Using offline speech recognition and translation programs is really a last resort. In the face of deliberate misleading and harm, we can only use software and platforms outside their control. Right now I use Gettr.

@xiaomao2013

Thank you very much for providing open-source software that goes all the way from video and audio files to subtitle files.

I installed and used it, but I don't know which language it recognizes. I'm using English video files, but the results seem bad. Can you tell me? Thank you so much.

@xiaomao2013


Or can you tell me how to change the translation module to support other languages? Thanks a lot.

@abhirooptalasila added the enhancement (New feature or request) label on Jan 20, 2022
@abhirooptalasila
Owner

I plan on implementing either Wav2Vec or NeMo, but that will need some time.
DeepSpeech has official models for American English, and there are some community-made models here. If you find a model file you want to use, download the .pbmm and .scorer files and give them as input to AutoSub.

Also, AutoSub can process large video files too. It automatically segments the audio into smaller chunks.
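For reference, a typical invocation with downloaded model files looks like the command used later in this thread; the filenames below are the official English release's, so adjust paths for your model:

```
python3 autosub/main.py --model deepspeech-0.9.3-models.pbmm --scorer deepspeech-0.9.3-models.scorer --file video.mp4
```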

@xiaomao2013

Thank you very much for your guidance, and I hope to see the new program you write soon.


@xiaomao2013


Following your prompt, I went looking for the corresponding model files, and found:
deepspeech-0.9.3-models-zh-CN.pbmm
deepspeech-0.9.3-models-zh-CN.scorer

I downloaded them and tried to run, but the following error occurs.

If it's convenient, please test it to see how you can get the model to work.

Thank you so much


(sub) (base) gettr@gettr:~/AutoSub$ python3 autosub/main.py --model deepspeech-0.9.3-models-zh-CN.pbmm --scorer deepspeech-0.9.3-models-zh-CN.scorer --file ~/3-720.mp4
ARGS: Namespace(dry_run=False, file='/home/gettr/3-720.mp4', format=['srt', 'vtt', 'txt'], model='deepspeech-0.9.3-models-zh-CN.pbmm', scorer='deepspeech-0.9.3-models-zh-CN.scorer', split_duration=5)
Model: /home/gettr/AutoSub/deepspeech-0.9.3-models-zh-CN.pbmm
Scorer: /home/gettr/AutoSub/deepspeech-0.9.3-models-zh-CN.scorer
Input file: /home/gettr/3-720.mp4
Extracted audio to audio/3-720.wav
Splitting on silent parts in audio file

Running inference:
TensorFlow: v2.3.0-6-g23ad988
DeepSpeech: v0.9.3-0-gf2e9c85
  0%|          | 0/17 [00:07<?, ?it/s]
Traceback (most recent call last):
  File "autosub/main.py", line 165, in <module>
    main()
  File "autosub/main.py", line 156, in main
    ds_process_audio(ds, audio_segment_path, output_file_handle_dict, split_duration=args.split_duration)
  File "autosub/main.py", line 66, in ds_process_audio
    write_to_file(output_file_handle_dict, split_inferred_text, line_count, split_limits, cues)
  File "/home/gettr/AutoSub/autosub/writeToFile.py", line 43, in write_to_file
    file_handle.write(inferred_text + "\n\n")
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 0-88: surrogates not allowed
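A note on this error: "surrogates not allowed" means `inferred_text` contains lone surrogate code points, typically from byte-level decoding of the zh-CN model's output, and UTF-8 refuses to encode them. One possible workaround, sketched against the `write_to_file` call in the traceback; the `sanitize` helper is hypothetical and the exact fix may differ:

```python
# Hypothetical workaround for the UnicodeEncodeError above (not an official fix).
# "surrogateescape" maps lone surrogates back to their raw bytes, which may then
# re-decode as the intended UTF-8 Chinese text; if not, fall back to dropping them.
def sanitize(text: str) -> str:
    try:
        return text.encode("utf-8", errors="surrogateescape").decode("utf-8")
    except UnicodeError:
        # Last resort: replace unencodable code points so the write succeeds.
        return text.encode("utf-8", errors="replace").decode("utf-8")

file_handle.write(sanitize(inferred_text) + "\n\n")
```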

@xiaomao2013


I found an instruction that may be related, but I don't know how to apply it:
LINK
