
Real-time speaker diarization doesn't seem to work for me #2360

Closed
TDulka opened this issue Apr 30, 2024 · 2 comments

TDulka commented Apr 30, 2024

[Screenshot: Speech Studio transcribing microphone input in real time, with speaker labels attached to each chunk of text]

When testing in Speech Studio I see exactly the behavior I am after, as in the picture: speech from the microphone is transcribed continuously, and chunks of text are diarized in real time.

I tried to follow the quickstart guide on real-time diarization, but when I run it I only get the "TRANSCRIBED" logs after the whole audio has been processed; I never see intermediate results. If I add a listener on .transcribing, it successfully logs text as it is processed, but without attributing it to a specific speakerId (speakerId is undefined). See the image from my terminal below: all the "Transcribing" events arrive first, and the "Transcribed" events only come at the end of the whole audio.
[Screenshot: terminal output where every "Transcribing" event appears first and the "Transcribed" events only appear at the end]
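For context, the setup I was testing looks roughly like this (a sketch assuming the Node.js file-input variant of the quickstart; subscriptionKey, region, and pathToFile are placeholders):

const fs = require("fs");
const SpeechSDK = require("microsoft-cognitiveservices-speech-sdk");

const speechConfig = SpeechSDK.SpeechConfig.fromSubscription(subscriptionKey, region);
// fromWavFileInput accepts a Buffer in Node.js.
const audioConfig = SpeechSDK.AudioConfig.fromWavFileInput(fs.readFileSync(pathToFile));
const transcriber = new SpeechSDK.ConversationTranscriber(speechConfig, audioConfig);

// Fires with partial hypotheses; speakerId came back undefined here.
transcriber.transcribing = (sender, e) => console.log("Transcribing:", e.result.text, e.result.speakerId);
// These only arrived after the entire file had been processed.
transcriber.transcribed = (sender, e) => console.log("Transcribed:", e.result.text, e.result.speakerId);
transcriber.startTranscribingAsync();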

What I want to see instead is the behavior shown in the announcement of real-time diarization, where transcribing events are interleaved with transcribed events as each utterance is identified.

I have searched the documentation and the samples here but have been unable to find anything on this. I am not sure whether this is a bug or whether I am missing some crucial piece of information; I would greatly appreciate any help! (I am using JavaScript.)


TDulka commented May 10, 2024

Okay, I figured it out. I needed to do the processing in the browser and use ConversationTranscriber with fromDefaultMicrophoneInput(). I also used the token-exchange authorization flow described here: https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/js/browser/README.md#token-exchange-process

Example:

const audioConfig = SpeechSDK.AudioConfig.fromDefaultMicrophoneInput();
const speechConfig = SpeechSDK.SpeechConfig.fromAuthorizationToken(authorizationToken, region);
const transcriber = new SpeechSDK.ConversationTranscriber(speechConfig, audioConfig);
// Partial hypotheses while a speaker is still talking.
transcriber.transcribing = (sender, transcriptionEventArgs) => console.log("transcribing", transcriptionEventArgs.result);
// Final results, including result.speakerId, once an utterance is complete.
transcriber.transcribed = (sender, transcriptionEventArgs) => console.log("transcribed", transcriptionEventArgs.result);
transcriber.startTranscribingAsync();
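For the authorizationToken, the idea of the token-exchange flow from that README is that the browser fetches a short-lived token from your own backend (which holds the subscription key) instead of shipping the key to the client. A rough sketch, where /api/get-speech-token is a hypothetical backend route returning { token, region }:

async function createTranscriber() {
  // Hypothetical endpoint; the backend exchanges the subscription key for a token.
  const res = await fetch("/api/get-speech-token");
  const { token, region } = await res.json();
  const speechConfig = SpeechSDK.SpeechConfig.fromAuthorizationToken(token, region);
  const audioConfig = SpeechSDK.AudioConfig.fromDefaultMicrophoneInput();
  return new SpeechSDK.ConversationTranscriber(speechConfig, audioConfig);
}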

pankopon (Contributor) commented

Closed as resolved based on the latest comments.
