XTTS: add inference_stream_text (slightly friendlier for text-streaming) #3724

czuzu · 2024-05-08T14:20:13Z

Hello,

I'm doing TTS streaming combined with text streaming (the text arrives progressively over a stream), running locally.
I know inference_stream is in theory enough for this case, except for the setup it repeats at the start of every call. Repeating it is not too costly, but it would be nicer to skip it, since it is unnecessary when only the text changes:

language = language.split("-")[0]  # remove the country code
length_scale = 1.0 / max(speed, 0.05)
gpt_cond_latent = gpt_cond_latent.to(self.device) # nicer to be able to skip when doing text-streaming
speaker_embedding = speaker_embedding.to(self.device) # nicer to be able to skip when doing text-streaming
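For reference, the two scalar lines of that preamble can be tried in isolation. This is only a sketch: the helper name `normalize_request` is hypothetical and not part of the library, and the two `.to(self.device)` tensor moves are left out so that it runs without torch.

```python
from typing import Tuple


def normalize_request(language: str, speed: float) -> Tuple[str, float]:
    """Hypothetical helper mirroring the scalar part of the preamble above."""
    # Drop the country code: "en-US" becomes "en".
    language = language.split("-")[0]
    # Clamp speed to at least 0.05 so the division cannot blow up or hit zero.
    length_scale = 1.0 / max(speed, 0.05)
    return language, length_scale


lang, scale = normalize_request("en-US", 2.0)
print(lang, scale)
```

These values depend only on the call arguments, not on the text, which is why computing them once per stream rather than once per sentence is safe.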

So I've added inference_stream_text (maybe not the best name, let me know if you prefer another) particularly for text-streaming, e.g.:

def text_streaming_generator():
    yield "It took me quite a long time to develop a voice and now that I have it I am not going to be silent."
    yield "Having discovered not just one, but many voices, I will champion each."

print("Inference with text streaming...")

text_gen = text_streaming_generator()
inf_gen = model.inference_stream_text(
    # note `text` param not provided as it will be streamed
    "en",
    gpt_cond_latent,
    speaker_embedding
)

wav_chunks = []
for text in text_gen:
    # Add text progressively
    model.inference_add_text(text, enable_text_splitting=True)
    for chunk in inf_gen:  # iterate the generator directly; enumerate() would yield (index, chunk) tuples
        if chunk is None:
            break # all chunks generated for the current text
        print(f"Received chunk {len(wav_chunks)} of audio length {chunk.shape[-1]}")
        wav_chunks.append(chunk)

# Call finalize to discard the inference generator
model.inference_finalize_text()
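To make the calling protocol above easier to follow, here is a toy stand-in that can be run without a model. It is a sketch under assumptions, not the code from this PR: `ToyTextStreamer` and `collect` are hypothetical names, the "audio" chunks are plain lists of character codes, and the only behaviour borrowed from the example is that the generator yields `None` once all text added so far has been consumed.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List, Optional


class ToyTextStreamer:
    """Hypothetical stand-in for the model; NOT the XTTS implementation."""

    def __init__(self) -> None:
        self._pending: Deque[str] = deque()
        self._active = False

    def inference_stream_text(self, language: str) -> Iterator[Optional[List[int]]]:
        # One-time setup (language handling, moving tensors to the device)
        # would happen here, once per stream instead of once per text.
        self._active = True
        return self._generate()

    def _generate(self) -> Iterator[Optional[List[int]]]:
        while self._active:
            while self._pending:
                sentence = self._pending.popleft()
                # Fake audio: one chunk per word, holding its character codes.
                for word in sentence.split():
                    yield [ord(c) for c in word]
            # Sentinel: everything queued so far has been synthesized.
            yield None

    def inference_add_text(self, text: str) -> None:
        self._pending.append(text)

    def inference_finalize_text(self) -> None:
        # The generator stops the next time it is resumed.
        self._active = False


def collect(model: ToyTextStreamer, texts: Iterable[str]) -> List[List[int]]:
    gen = model.inference_stream_text("en")
    chunks: List[List[int]] = []
    for text in texts:
        model.inference_add_text(text)
        for chunk in gen:
            if chunk is None:
                break  # all chunks generated for the current text
            chunks.append(chunk)
    model.inference_finalize_text()
    return chunks


chunks = collect(ToyTextStreamer(), ["hello world", "again"])
print(len(chunks))
```

Breaking out of the inner `for` loop does not close the generator, so the same generator object resumes where it stopped when the next text is added.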

IMO this also makes for a nicer interface when doing text-streaming, I'll leave it to you to decide :)

Cheers! 🍻

CLAassistant · 2024-05-08T14:20:19Z

All committers have signed the CLA.

czuzu · 2024-05-08T18:15:02Z

Is this repo still maintained, or should I move the PR?

czuzu · 2024-05-10T13:11:37Z

Moved here: idiap#21

XTTS: add inference_stream_text (slightly friendlier for text-streaming)

57a47d2

Fix bad indent in inference_stream

6d20240

czuzu mentioned this pull request May 8, 2024

TGUI: add support for XTTSv2 local streaming (including sentences streaming) erew123/alltalk_tts#208

Open

czuzu closed this May 9, 2024
