OpenAI tts with word boundary event result #2303

Closed
ling-k opened this issue Mar 15, 2024 · 9 comments
Labels
enhancement (New feature or request), in-review (In review), text-to-speech (Text-to-Speech), update needed (For items that are in progress but have not been updated)


@ling-k

ling-k commented Mar 15, 2024

Describe the solution you'd like
The OpenAI TTS voices, such as en-US-AlloyMultilingualNeuralHD and en-US-EchoMultilingualNeuralHD, do not return word boundary event results.
Additional context

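word_boundaries = {}  # maps audio offset in ms (as a string) to the spoken word

def word_boundary_handler(evt):
    # evt.audio_offset is in 100-ns ticks; dividing by 10,000 converts it to milliseconds
    print(f"My Word boundary event received: {evt.text}, audio offset: {evt.audio_offset / 10000} ms")
    word_boundaries[str(evt.audio_offset / 10000)] = evt.text

For Azure voices, such as "en-US-EmmaNeural" and "zh-CN-XiaoxiaoNeural", this works fine. For the OpenAI voices, however, the handler never receives any events.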
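A minimal sketch of the handler pattern above that runs without credentials. `FakeWordBoundaryEvent` and `summarize` are hypothetical names introduced for illustration: the stand-in event mimics only the `.text` and `.audio_offset` attributes the handler reads (the real SDK passes a `SpeechSynthesisWordBoundaryEventArgs`). A list of `(offset_ms, word)` pairs is used instead of a string-keyed dict so that repeated words and arrival order are kept. With a voice that emits no word-level timing, the handler is simply never called and the list stays empty.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the SDK's word-boundary event args, so the sketch
# runs offline. Only .text and .audio_offset are modeled.
@dataclass
class FakeWordBoundaryEvent:
    text: str
    audio_offset: int  # 100-nanosecond ticks, as reported by the SDK

TICKS_PER_MS = 10_000  # 1 ms = 10,000 ticks of 100 ns

# (offset_ms, word) pairs in arrival order; a list keeps repeated words
word_boundaries = []

def word_boundary_handler(evt):
    """Record each word boundary with its audio offset in milliseconds."""
    offset_ms = evt.audio_offset / TICKS_PER_MS
    word_boundaries.append((offset_ms, evt.text))

def summarize(boundaries):
    """Report whether any word-level timing arrived for the synthesized text."""
    if not boundaries:
        return "no word boundary events received"
    return f"{len(boundaries)} word boundary events received"

# With the real SDK the handler would be attached before synthesis, e.g.
#   synthesizer.synthesis_word_boundary.connect(word_boundary_handler)
# Here two events are simulated, as a voice with word timing might emit them.
for evt in [FakeWordBoundaryEvent("Hello", 500_000), FakeWordBoundaryEvent("world", 4_375_000)]:
    word_boundary_handler(evt)

print(word_boundaries)
print(summarize(word_boundaries))
print(summarize([]))  # what a voice without word timing would produce
```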
ling-k changed the title from "OpenAI tts with with word boundary event result" to "OpenAI tts with word boundary event result" on Mar 15, 2024
@BrianMouncer
Contributor

The models behind the OpenAI TTS voices do not provide word-level timing information, so it is currently not possible to get word boundary events when using those voices. The OpenAI voices also support only a subset of SSML tags. You can find more information about those limitations here: https://learn.microsoft.com/en-us/azure/ai-services/speech-service/openai-voices#ssml-elements-supported-by-openai-text-to-speech-voices-in-azure-ai-speech

I will also work with our docs team to document the other limitations, such as the lack of word-level timing data.
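Because the OpenAI voices raise no word boundary events, a caller that depends on word timings may want to decide up front whether to expect them. The sketch below is a name-based heuristic only, assuming Azure names these voices `<locale>-<Voice>MultilingualNeuralHD` as in this issue. `expects_word_boundaries` and `OPENAI_VOICE_PATTERN` are hypothetical names, not part of the Speech SDK. Only Alloy and Echo appear in this thread; Fable, Onyx, Nova, and Shimmer are OpenAI's other TTS voices and are assumed to follow the same naming scheme.

```python
import re

# Hypothetical heuristic, not an official way to identify voice families:
# matches names like "en-US-AlloyMultilingualNeuralHD" seen in this issue.
OPENAI_VOICE_PATTERN = re.compile(
    r"^[a-z]{2}-[A-Z]{2}-(Alloy|Echo|Fable|Onyx|Nova|Shimmer)MultilingualNeuralHD$"
)

def expects_word_boundaries(voice_name: str) -> bool:
    """Return False for voices assumed to lack word-level timing (OpenAI voices)."""
    return OPENAI_VOICE_PATTERN.match(voice_name) is None

for voice in ["en-US-EmmaNeural", "zh-CN-XiaoxiaoNeural", "en-US-AlloyMultilingualNeuralHD"]:
    print(voice, expects_word_boundaries(voice))
```

A name check like this can go stale as new voices are added, so an empty result from the word boundary handler after synthesis remains the more reliable signal.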

BrianMouncer added the "pending close" label (Closed soon without new activity) on Mar 20, 2024
@yulin-li
Contributor

Word boundary events are not yet supported for Azure OpenAI (AOAI) voices.

We should update the docs. @Kerry-LinZhang, could you help track the doc refresh?

@BrianMouncer
Contributor

BrianMouncer commented Mar 29, 2024

@ling-k, for future planning: which data center region are you using where you want word-level timing support for the OpenAI voices, and which other regions are most important to you?

https://learn.microsoft.com/en-us/azure/ai-services/speech-service/regions

@ling-k
Author

ling-k commented Mar 29, 2024

Thanks for replying. I mainly use the West US or West US 2 region.

pankopon added the "enhancement" (New feature or request), "in-review" (In review), and "text-to-speech" (Text-to-Speech) labels and removed the "pending close" label (Closed soon without new activity) on Apr 12, 2024
@pankopon
Contributor

@yulin-li @Kerry-LinZhang So this needs at least a documentation update, and possibly also a future work item? Please update the status and close the issue when done.

@Kerry-LinZhang

Hi @ling-k, we will update the related docs this week, and I will let you know once the update has been released.

@Kerry-LinZhang

This item has been open without activity for 19 days. Provide a comment on its status and remove the "update needed" label.

github-actions bot added the "update needed" label (For items that are in progress but have not been updated) on May 20, 2024
@pankopon
Contributor

Closed as resolved.


5 participants