
Real-time speaker diarization doesn't seem to work for me #2360

Closed
TDulka opened this issue Apr 30, 2024 · 2 comments

TDulka commented Apr 30, 2024

[Screenshot: Speech Studio transcribing microphone input in real time, with speaker labels attached to each chunk of text]

When testing in Speech Studio I see exactly the behavior I am after, as in the picture: speech from the microphone is transcribed continuously, and chunks of text are diarized in real time.

I tried to follow the quickstart guide on real-time diarization, but when I run it I only get the "TRANSCRIBED" logs after the whole audio has been processed; I never see intermediate results. If I add a listener on .transcribing, it successfully logs text as it is processed, but without attributing it to a specific speakerId (speakerId is undefined). See the image from my terminal below: all the "Transcribing" events arrive first, and the "Transcribed" events only come at the end of the whole audio.
[Screenshot: terminal output where every "Transcribing" event appears first and the "Transcribed" events only appear at the end]
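For context, the setup I was testing looks roughly like this (a sketch assuming the Node.js file-input variant of the quickstart; subscriptionKey, region, and pathToFile are placeholders):

const fs = require("fs");
const SpeechSDK = require("microsoft-cognitiveservices-speech-sdk");

const speechConfig = SpeechSDK.SpeechConfig.fromSubscription(subscriptionKey, region);
// fromWavFileInput accepts a Buffer in Node.js.
const audioConfig = SpeechSDK.AudioConfig.fromWavFileInput(fs.readFileSync(pathToFile));
const transcriber = new SpeechSDK.ConversationTranscriber(speechConfig, audioConfig);

// Fires with partial hypotheses; speakerId came back undefined here.
transcriber.transcribing = (sender, e) => console.log("Transcribing:", e.result.text, e.result.speakerId);
// These only arrived after the entire file had been processed.
transcriber.transcribed = (sender, e) => console.log("Transcribed:", e.result.text, e.result.speakerId);
transcriber.startTranscribingAsync();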

What I want to see instead is the behavior shown in the announcement of real-time diarization, where transcribing events are interleaved with transcribed events as each utterance is identified.

I have searched the documentation and the samples here but have been unable to find anything on this. I am not sure whether this is a bug or whether I am missing some crucial piece of information; I would greatly appreciate any help! (I am using JavaScript.)


TDulka commented May 10, 2024

Okay, I figured it out. I needed to do the processing in the browser and use ConversationTranscriber with fromDefaultMicrophoneInput(). I also used the token-exchange authorization flow described here: https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/js/browser/README.md#token-exchange-process

Example:

const audioConfig = SpeechSDK.AudioConfig.fromDefaultMicrophoneInput();
const speechConfig = SpeechSDK.SpeechConfig.fromAuthorizationToken(authorizationToken, region);
const transcriber = new SpeechSDK.ConversationTranscriber(speechConfig, audioConfig);
// Partial hypotheses while a speaker is still talking.
transcriber.transcribing = (sender, transcriptionEventArgs) => console.log("transcribing", transcriptionEventArgs.result);
// Final results, including result.speakerId, once an utterance is complete.
transcriber.transcribed = (sender, transcriptionEventArgs) => console.log("transcribed", transcriptionEventArgs.result);
transcriber.startTranscribingAsync();
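For the authorizationToken, the idea of the token-exchange flow from that README is that the browser fetches a short-lived token from your own backend (which holds the subscription key) instead of shipping the key to the client. A rough sketch, where /api/get-speech-token is a hypothetical backend route returning { token, region }:

async function createTranscriber() {
  // Hypothetical endpoint; the backend exchanges the subscription key for a token.
  const res = await fetch("/api/get-speech-token");
  const { token, region } = await res.json();
  const speechConfig = SpeechSDK.SpeechConfig.fromAuthorizationToken(token, region);
  const audioConfig = SpeechSDK.AudioConfig.fromDefaultMicrophoneInput();
  return new SpeechSDK.ConversationTranscriber(speechConfig, audioConfig);
}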

pankopon (Contributor) commented

Closed as resolved based on the latest comments.
