To Reproduce
If using conda, run conda env create -f environment.yml, then activate the environment.
Set speech_key and service_region.
Choose the voice model to use in speech_synthesis_word_boundary_event.
Run python speech_synthesis_sample.py and enter your sample text.
NOTE: speak_text_async appears to convert the special characters to HTML entities automatically.
View the results in the console and in the emitted logs (./out/).
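The steps above amount to roughly the following sketch. This is my own reconstruction, not the sample project's exact code; the helper names ticks_to_ms and synthesize are hypothetical, and running synthesize requires the azure-cognitiveservices-speech package plus a real key/region:

```python
def ticks_to_ms(audio_offset_ticks: int) -> float:
    """The SDK reports audio_offset in 100-nanosecond ticks; convert to ms."""
    return audio_offset_ticks / 10_000


def synthesize(speech_key: str, service_region: str,
               text: str = "Testing AT&T to see if it works") -> None:
    """Subscribe to word-boundary events and speak `text`.

    Requires the azure-cognitiveservices-speech package and valid credentials,
    so the import is deferred until the function is actually called.
    """
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription=speech_key,
                                           region=service_region)
    # en-US-AndrewNeural is one of the voices the issue reports as affected.
    speech_config.speech_synthesis_voice_name = "en-US-AndrewNeural"
    synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)

    def on_word_boundary(evt):
        print(f"Word boundary event received: {evt}, "
              f"audio offset in ms: {ticks_to_ms(evt.audio_offset)}ms. "
              f"Text: {evt.text}")

    synthesizer.synthesis_word_boundary.connect(on_word_boundary)
    synthesizer.speak_text_async(text).get()
```

The tick-to-millisecond conversion matches the log lines below (e.g. audio_offset=500000 prints as 50.0ms).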
For example, the sample text Testing AT&T to see if it works emits the following word boundary events:
Word boundary event received: SpeechSynthesisWordBoundaryEventArgs(audio_offset=500000, duration=0:00:00.437500, text_offset=0, word_length=7), audio offset in ms: 50.0ms. Text: Testing
Word boundary event received: SpeechSynthesisWordBoundaryEventArgs(audio_offset=5000000, duration=0:00:00.962500, text_offset=-1, word_length=4), audio offset in ms: 500.0ms. Text: AT&a
Word boundary event received: SpeechSynthesisWordBoundaryEventArgs(audio_offset=14750000, duration=0:00:00.087500, text_offset=-1, word_length=3), audio offset in ms: 1475.0ms. Text: mp;
Word boundary event received: SpeechSynthesisWordBoundaryEventArgs(audio_offset=15750000, duration=0:00:00.200000, text_offset=11, word_length=4), audio offset in ms: 1575.0ms. Text: T to
Word boundary event received: SpeechSynthesisWordBoundaryEventArgs(audio_offset=17875000, duration=0:00:00.112500, text_offset=16, word_length=2), audio offset in ms: 1787.5ms. Text: se
Word boundary event received: SpeechSynthesisWordBoundaryEventArgs(audio_offset=19125000, duration=0:00:00.087500, text_offset=18, word_length=3), audio offset in ms: 1912.5ms. Text: e i
Word boundary event received: SpeechSynthesisWordBoundaryEventArgs(audio_offset=20125000, duration=0:00:00.575000, text_offset=21, word_length=10), audio offset in ms: 2012.5ms. Text: f it works
After the & is encountered, the word boundary events start reporting incorrect word boundaries (AT&a, mp;, T to, etc.). This issue also exists with the other two special characters, < and >.
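For what it's worth, the garbled fragments look like slices of the entity-escaped form of the input rather than of the raw text. This is only an observation from the logs above, but it can be checked with plain Python, no SDK needed:

```python
import html

original = "Testing AT&T to see if it works"
escaped = html.escape(original)  # "Testing AT&amp;T to see if it works"

# Each garbled fragment from the logs appears verbatim in the escaped
# string...
for fragment in ["AT&a", "mp;", "T to"]:
    assert fragment in escaped

# ...but "AT&a" and "mp;" do not occur anywhere in the original text.
assert "AT&a" not in original
assert "mp;" not in original
```

This is consistent with the text_offset=-1 values in the logs: the service seems to be computing boundaries against a string that no longer lines up with the submitted text.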
Attached are some logs from running the input string Testing AT&T to see if it works against the voice models en-US-AndrewNeural and en-US-AriaNeural.
Expected behavior
Word boundaries are reported correctly regardless of whether special characters are present in the text.
Version of the Cognitive Services Speech SDK
Python 1.37.0
JavaScript 1.31.0
Platform, Operating System, and Programming Language
OS: macOS Ventura 13.6.6
Hardware: MacBook Pro (Apple M1 Max, ARM)
Programming Languages: Python, JavaScript
Browser: Chrome (JavaScript SDK used in Electron 27 on the renderer process)
With Edge's Read Aloud, whether or not I'm using the multilingual versions of Andrew and Brian available there (which is a bit confusing, as the ones without "multilingual" in their names still act as such), it skips to the next sentence/passage every time it comes across those characters. This happens with Remy too.
Describe the bug
A subset of the voice models appears to have difficulty processing the three special characters:
<
>
&
even when using the entity format (https://learn.microsoft.com/en-us/azure/ai-services/speech-service/speech-synthesis-markup-structure#special-characters). Once a special character is present in the script, the WordBoundary events begin to report incorrect word boundaries. A non-exhaustive list of voice models exhibiting this behavior:
en-US-AndrewNeural
en-US-BrianNeural
en-US-EmmaNeural
en-US-JennyMultilingualNeural
en-US-RyanMultilingualNeural
I've experienced this issue with both the JavaScript SDK and the Python SDK. Sample code using the Python sample project is here: https://gist.github.com/GJStevenson/ed2b0ca00691109dfd99ad3ef177b1a3
Additional context
en-US-AndrewNeural Logs: speech_synthesis_en-US-AndrewNeural_20240430_163926.log
en-US-AriaNeural Logs: speech_synthesis_en-US-AriaNeural_20240430_165107.log