Windows: Calling SpeechSynthesizer.StopSpeakingAsync() does not stop synthesis #2350

Open
bpasero opened this issue Apr 24, 2024 · 4 comments
Labels
accepted Issue moved to product team backlog. Will be closed when addressed. bug Something isn't working

bpasero commented Apr 24, 2024

Describe the bug

A call to SpeechSynthesizer.StopSpeakingAsync() does not stop synthesis for a very long time, up to 30 seconds. The log file is here: speech.log

This issue was previously reported, without action, in #1836 and #2264.

To Reproduce

We are building a Node.js binding for the Speech SDK, and the C++ sources follow the SDK samples. The synthesis is implemented here: https://github.com/microsoft/node-speech/blob/967976ce0f4887a2b5b27f486e5209a51588516f/src/main.cc#L477

The call to StopSpeakingAsync is here: https://github.com/microsoft/node-speech/blob/967976ce0f4887a2b5b27f486e5209a51588516f/src/main.cc#L539

To reproduce from that module:

  • use Node.js 18.x on the system
  • git clone https://github.com/microsoft/node-speech.git
  • open index.ts and append the snippet [1] at the end
  • from a terminal cd into the workspace and run npm i
  • run node index.js

[1]

const t = createSynthesizer({
  modelPath: '<path to TTS model>',
  modelName: 'Microsoft Server Speech Text to Speech Voice (en-US, AriaNeural)',
  modelKey: '<model key>',
}, (error, result) => {
  if (error) {
    console.error(error);
  } else {
    console.log(result);
  }
});
t.synthesize(`
Now more than ever, developers are expected to build voice-enabled applications that can reach a global audience. With the same voice persona across languages, organizations can keep their brand image more consistent. To support the growing need for a single voice to speak multiple languages, particularly in scenarios such as localization and translation, a multi-lingual neural TTS voice is brought out in public preview.



This new Jenny Multilingual voice (preview), with US English as the primary/default language, can speak 13 secondary languages, each at the fluent level: German (Germany), English (Australia), English (Canada), English (Canada), Spanish (Spain), Spanish (Mexico), French (Canada), French (France), Italian (Italy), Japanese (Japan), Korean (Korea), Portuguese (Brazil), Chinese (Mandarin, Simplified).
`);
setTimeout(() => t.stop(), 5000);

Expected behavior

Calling SpeechSynthesizer.StopSpeakingAsync immediately stops synthesis.

Version of the Cognitive Services Speech SDK

1.37.0

Platform, Operating System, and Programming Language

  • OS: Windows 11 (24H2)
  • Hardware: ARM
  • Programming language: C++

Additional context

This issue does not reproduce on macOS or Linux!

@ralph-msft ralph-msft added bug Something isn't working accepted Issue moved to product team backlog. Will be closed when addressed. labels Apr 26, 2024
@ralph-msft

Thanks for using the Speech SDK and filing this issue. We have been able to reproduce the issue you are seeing, and have added fixing this issue to our backlog. We will update here once we have an update.

As a temporary workaround, you may want to consider passing a null value as the AudioConfig to the SpeechSynthesizer constructor. You can then subscribe to the Synthesizing event, which is raised whenever the SDK receives new audio from the service, and pass this audio to your player of choice. That should give you more control over when the audio playback stops. Please note, however, that calling StopSpeakingAsync may still stall for ~10-15 seconds due to the underlying issue.

(B-7172399)
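
A minimal sketch of the player side of this workaround, in TypeScript to match the repro snippet. It does not use the Speech SDK at all: `AudioSink` and `StoppablePlayer` are hypothetical names introduced here for illustration, and the sink stands in for whatever audio library the application uses. The idea is that playback stops as soon as `stop()` is called, and any audio that still arrives while the synthesizer is winding down is discarded.

```typescript
// Hypothetical audio output; a real app would wrap its audio library of choice.
interface AudioSink {
  write(chunk: Uint8Array): void;
  close(): void;
}

// Hypothetical helper: forwards audio chunks to a sink until stop() is called.
class StoppablePlayer {
  private sink: AudioSink;
  private stopped = false;
  private droppedBytes = 0;

  constructor(sink: AudioSink) {
    this.sink = sink;
  }

  // Intended to be called with each audio chunk delivered by the
  // synthesizer's Synthesizing event (wiring not shown; assumption).
  onAudioChunk(chunk: Uint8Array): void {
    if (this.stopped) {
      // Synthesis may keep delivering audio after a stop request; ignore it.
      this.droppedBytes += chunk.byteLength;
      return;
    }
    this.sink.write(chunk);
  }

  // Stops playback immediately, independent of how long the synthesizer
  // itself takes to stop. Safe to call more than once.
  stop(): void {
    if (this.stopped) {
      return;
    }
    this.stopped = true;
    this.sink.close();
  }

  // Number of bytes that arrived after stop() and were discarded.
  get bytesDroppedAfterStop(): number {
    return this.droppedBytes;
  }
}
```

With this shape, the application calls `player.stop()` first and only then asks the synthesizer to stop, so the user hears silence right away even if the synthesizer call stalls.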

bpasero commented Apr 26, 2024

Thanks, good to see it can be reproduced and I am looking forward to the fix 👍

This item has been open without activity for 19 days. Provide a comment on status and remove "update needed" label.

@github-actions github-actions bot added the update needed For items that are in progress but have not been updated label May 16, 2024

wtto00 commented May 16, 2024

Hello, I am using version 1.37.0, and I have encountered a similar issue.

stopSpeaking does not immediately terminate the playback process; it only stops the speaker from playing.

For example, if I generate 14 seconds of audio and call stopSpeaking at the 10-second mark, then let speakResult = synthesizer?.speakSsml(ssml) returns immediately with speakResult?.reason = 9 (SPXResultReason_SynthesizingAudioCompleted) instead of 1 (SPXResultReason_Canceled). Moreover, the callback registered with synthesizer?.addSynthesisCompletedEventHandler is triggered after a 4-second wait, instead of the callback registered with synthesizer?.addSynthesisCanceledEventHandler.

let ssml =
  "<speak version='1.0' xml:lang='en-US' xmlns='http://www.w3.org/2001/10/synthesis' xmlns:mstts='http://www.w3.org/2001/mstts'><voice name='\(identifier)'>\(mstts)</voice></speak>"
let speakResult = try self.synthesizer?.speakSsml(ssml)
print(speakResult?.reason ?? "")
try synthesizer?.stopSpeaking()

Here is a demo repository: https://github.com/wtto00/flutter_azure_speech/tree/main/example

The Swift code is at https://github.com/wtto00/flutter_azure_speech/blob/eb419b89fcc16903cabaa8f9820559d93ed80861/ios/Classes/AzureSpeechPlugin.swift#L294
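
The difference between the expected and the observed outcome described above can be written down as a small check. This is only a sketch in TypeScript, independent of the Speech SDK: `StopObservation` and `stopDeviations` are hypothetical names, the two reason codes are the values quoted in this comment, and the 500 ms tolerance is an arbitrary assumption rather than anything the SDK documents.

```typescript
// Reason codes as quoted in the comment above.
const REASON_CANCELED = 1;
const REASON_SYNTHESIZING_AUDIO_COMPLETED = 9;

// Hypothetical record of what was observed after calling stopSpeaking.
interface StopObservation {
  reason: number;                     // reason reported on the speak result
  handler: "completed" | "canceled";  // which event handler fired
  handlerDelayMs: number;             // time from stop request to handler call
}

// Returns the deviations from the behaviour the reporter expected.
// An empty list means the stop request was honoured as expected.
// maxDelayMs is an assumed tolerance, not an SDK guarantee.
function stopDeviations(o: StopObservation, maxDelayMs: number = 500): string[] {
  const problems: string[] = [];
  if (o.reason !== REASON_CANCELED) {
    problems.push(`reason was ${o.reason}, expected ${REASON_CANCELED}`);
  }
  if (o.handler !== "canceled") {
    problems.push(`"${o.handler}" handler fired, expected "canceled"`);
  }
  if (o.handlerDelayMs > maxDelayMs) {
    problems.push(`handler fired after ${o.handlerDelayMs} ms, limit ${maxDelayMs} ms`);
  }
  return problems;
}
```

Feeding it the values reported here (`reason: REASON_SYNTHESIZING_AUDIO_COMPLETED`, `handler: "completed"`, `handlerDelayMs: 4000`) flags all three checks, while a canceled result with a prompt canceled callback produces an empty list.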

@github-actions github-actions bot removed the update needed For items that are in progress but have not been updated label May 17, 2024

3 participants