RIVA integrated with Chat UI does not transcribe speech to text correctly and completely #39

dineshtripathi30 · 2024-01-30T07:43:33Z

I am trying to use RIVA ASR with the frontend as given in the example, but it fails to transcribe speech to text. Most of the time it fails to catch my voice correctly.

svenchilton · 2024-01-30T17:20:05Z

@dineshtripathi30, sorry to hear that you're experiencing this issue. Just to check:

1. Did you set up the Riva server yourself, or are you submitting requests to a remote host?
2. If you set up the Riva server yourself, did you make sure that you enabled ASR in your desired language(s) and that you downloaded and deployed the streaming ASR model(s)?
3. Did the ASR Language dropdown menu in the web UI contain the languages you expected? Did you select your desired language from that menu?
4. If you ran the web UI at <host-ip>:8090 rather than localhost:8090, did you navigate to chrome://flags (or the equivalent in another browser), find "Insecure origins treated as secure," enter <host-ip>:8090 into the appropriate box, and click on the "Relaunch" button?
5. Did you grant your browser access to your microphone?
6. Have you verified that another online service (for example, https://dictation.io/speech) can transcribe your speech accurately?

dineshtripathi30 · 2024-01-30T20:25:23Z

I did set up the RIVA server myself, and yes, ASR is enabled in English (US) and it's selected in the menu. For points 4, 5, and 6 in the above comment: yes, I did.

Also, regarding point 6: I just tried it, and transcription works there.

svenchilton · 2024-01-30T23:26:42Z

@dineshtripathi30, OK, good to know that you've covered your bases, and that your microphone is working properly. Let's check your Riva ASR service. Can you do the following?

1. Record yourself in a mono-channel .wav file.
2. In a Jupyter Notebook, run the following:

import riva.client


def run_asr_streaming_inference(audio_file, output_file, uri='localhost:50051'):
    # Connect to the Riva server and create an ASR client.
    auth = riva.client.Auth(uri=uri)
    client = riva.client.ASRService(auth)

    recognition_config = riva.client.RecognitionConfig(
        language_code="en-US",  # Change this as appropriate
        max_alternatives=1,
        enable_automatic_punctuation=True,
    )
    # interim_results=False: request only final transcripts.
    streaming_config = riva.client.StreamingRecognitionConfig(
        config=recognition_config,
        interim_results=False,
    )

    # Stream the file in chunks of 1600 frames, pausing after each chunk
    # for its audio duration so the request resembles live speech.
    with riva.client.AudioChunkFileIterator(
        audio_file,
        1600,
        delay_callback=riva.client.sleep_audio_length,
    ) as audio_chunk_iterator:
        riva.client.print_streaming(
            responses=client.streaming_response_generator(
                audio_chunks=audio_chunk_iterator,
                streaming_config=streaming_config,
            ),
            output_file=output_file,
            additional_info='no',
            file_mode='w',
            word_time_offsets=False,
        )


audio_file = '<Your audio filename>.wav'
output_file = None  # None: print the transcript to the screen
run_asr_streaming_inference(audio_file, output_file)

If your Riva server is working properly, it should print a transcription of your audio file to your screen.
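Before sending the recording to Riva, it may help to confirm that the file really is mono. The following is a minimal sketch using only Python's standard `wave` module; the helper names `describe_wav` and `is_mono` are made up for illustration and are not part of `riva.client`.

```python
import wave


def describe_wav(path):
    """Return basic properties of a PCM .wav file (hypothetical helper)."""
    with wave.open(path, "rb") as wav:
        frame_rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": frame_rate,
            "sample_width_bytes": wav.getsampwidth(),
            "duration_s": wav.getnframes() / frame_rate,
        }


def is_mono(path):
    """True if the file has exactly one audio channel."""
    return describe_wav(path)["channels"] == 1
```

For example, `describe_wav('<Your audio filename>.wav')` returns the channel count, sample rate, sample width, and duration, and `is_mono(...)` should return `True` for a mono-channel recording.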

dineshtripathi30 · 2024-01-31T07:43:36Z

I recorded "Tell me about Lenovo SE450 server" in an audio file.

Here is the test result from the notebook. The way it transcribes "450" is not correct, but it still transcribes the rest correctly:

Tell me about Lenovo Se 04:50 Server.

But here is what I get when I try from the chatbot web UI:
Tell me about Lenovo as a 50 server.

svenchilton · 2024-01-31T17:31:13Z

OK, I just tried asking my chatbot web UI, "Tell me about Lenovo SE450 Server." After several renderings of my query as "Tell me about Lenovo S. Four fifty server," I eventually got "Tell me about Lenovo Se Four Fifty Server." I had thought that saying "SE450" more slowly than the rest of the query would improve the transcription, but it appears I was wrong. I think I need to consult with the Riva ASR engineers about this.

So as to better scope out the issue, can you ask the chatbot web UI about other technical products and services and compare the ground truth queries to the generated transcriptions?
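One way to make that comparison concrete is to compute a word error rate (WER) for each query. The sketch below is a plain word-level edit distance written for illustration, not a Riva utility; the function names and the normalization (lowercasing and stripping punctuation) are assumptions.

```python
import re


def _words(text):
    """Lowercase, drop punctuation, and split into words."""
    return re.sub(r"[^a-z0-9\s]", "", text.lower()).split()


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = _words(reference), _words(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Ground-truth query and web UI transcription quoted earlier in this thread.
print(word_error_rate(
    "Tell me about Lenovo SE450 server",
    "Tell me about Lenovo as a 50 server.",
))
```

Collecting this number for a handful of queries, both from the notebook test and from the web UI, would show whether the errors cluster around product names and numbers or affect ordinary sentences as well.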

anand-nv · 2024-02-02T10:37:01Z

Hi @dineshtripathi30, the issue is with Inverse Text Normalization. You could generate new tokenizer and verbalizer files from https://github.com/NVIDIA/NeMo-text-processing/tree/en_tech and use them in your Riva server build. This should resolve the issue you are having. You can refer to the documentation at https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/nlp/text_normalization/wfst/wfst_text_processing_deployment.html.
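For readers unfamiliar with the term, inverse text normalization is the step that turns spoken-form words such as "four fifty" into written forms such as "450". The toy sketch below only illustrates the idea; it is not the WFST-based tokenizer and verbalizer grammars that Riva and NeMo-text-processing use, and every name in it is hypothetical. It handles only number words from zero to ninety-nine and ignores punctuation attached to words.

```python
SMALL = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10, "eleven": 11,
    "twelve": 12, "thirteen": 13, "fourteen": 14, "fifteen": 15,
    "sixteen": 16, "seventeen": 17, "eighteen": 18, "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}


def toy_inverse_normalize(text):
    """Replace runs of spoken number words with concatenated digits (toy only)."""
    words = text.split()
    result = []
    run = ""  # digits collected from the current run of number words
    i = 0
    while i < len(words):
        key = words[i].lower()
        if key in TENS:
            value = TENS[key]
            # "fifty one" -> 51: fold a following unit (one to nine) into the tens.
            following = words[i + 1].lower() if i + 1 < len(words) else ""
            if 1 <= SMALL.get(following, 0) <= 9:
                value += SMALL[following]
                i += 1
            run += str(value)
        elif key in SMALL:
            run += str(SMALL[key])
        else:
            if run:
                result.append(run)
                run = ""
            result.append(words[i])
        i += 1
    if run:
        result.append(run)
    return " ".join(result)


print(toy_inverse_normalize("Tell me about Lenovo S E four fifty server"))
```

A production grammar also has to decide when digits should be read as a time, a year, or a model number, which is consistent with "450" coming back as "04:50" in the notebook test above.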

dineshtripathi30 · 2024-02-05T11:25:50Z

Anand,
In my case, the problem is not limited to that; other transcriptions are affected as well.

For example, I said "What do you know about Lenovo"

and it transcribed
What about Lenovo?

Next I asked "Tell Me About Meta Lama 13 Billion Parameter Model" and it was transcribed correctly.

Next I asked "What do you think about Generative AI" and it transcribed "Do you think about generative Ai?"

So overall accuracy is the issue in my case.

shubhadeepd added the bug (Something isn't working) label Jan 30, 2024

shubhadeepd assigned svenchilton Jan 30, 2024
