Combining multiple mp3 files to be returned as a single MediaStreamTrack #1076
Replies: 3 comments 1 reply
-
Any update?
-
Hey, I have done something similar, and I suggest getting your TTS output in a stream-friendly format. I went the easy route of getting all my audio as PCM16 at 16000 Hz, which makes life a lot easier. The downside is that you will probably need to manage some things yourself, such as timestamps and frame chunking. As for your MediaStreamTrack, I think you might be running into an issue with the frame timestamps: it looks like they reset with each new audio file you read, which might confuse the recipient.
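To make the timestamp point concrete, here is a minimal sketch of the chunking idea, assuming mono PCM16 at 16000 Hz and 20 ms frames. `ContinuousFramer` and `PcmFrame` are hypothetical names for illustration, not part of aiortc. The key detail is that the pts counter lives outside any single audio file, so it keeps increasing when the next file's audio is pushed in.

```python
from dataclasses import dataclass
from fractions import Fraction

SAMPLE_RATE = 16000       # assumed: 16 kHz mono PCM16
SAMPLES_PER_FRAME = 320   # 20 ms of audio at 16 kHz
BYTES_PER_SAMPLE = 2      # 16-bit samples


@dataclass
class PcmFrame:
    """Hypothetical container: one fixed-size chunk plus its timestamp."""
    data: bytes
    pts: int              # presentation timestamp, counted in samples
    time_base: Fraction   # seconds per pts unit


class ContinuousFramer:
    """Cuts incoming PCM into 20 ms frames with a pts that never resets."""

    def __init__(self) -> None:
        self._buffer = bytearray()
        self._next_pts = 0

    def push(self, pcm: bytes) -> list[PcmFrame]:
        """Add decoded audio (from any file) and return the complete frames."""
        self._buffer.extend(pcm)
        frame_bytes = SAMPLES_PER_FRAME * BYTES_PER_SAMPLE
        frames = []
        while len(self._buffer) >= frame_bytes:
            chunk = bytes(self._buffer[:frame_bytes])
            del self._buffer[:frame_bytes]
            frames.append(PcmFrame(chunk, self._next_pts, Fraction(1, SAMPLE_RATE)))
            self._next_pts += SAMPLES_PER_FRAME
        # Leftover bytes stay buffered and are joined with the next push.
        return frames


framer = ContinuousFramer()
first = framer.push(b"\x00" * 1000)   # stand-in for audio decoded from a.mp3
second = framer.push(b"\x00" * 1000)  # stand-in for audio decoded from b.mp3
print([f.pts for f in first + second])
```

In a custom track, each chunk would then be copied into an `av.AudioFrame` with the same `pts`, `sample_rate`, and `time_base` before being returned from `recv()`; that part is left out here so the sketch runs without extra dependencies.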
-
Hi @pushkarprasad007
-
I will be using an LLM (like GPT) to generate an answer, which will then be converted to speech and sent to the browser using aiortc. However, since LLMs take time to produce their complete output, instead of waiting for it to finish, we can read partial answers as soon as they appear, generate an mp3 file every few words, and stream those files. So not all the mp3 files will be available immediately; I need to keep adding them as they appear (say every 4-5 words) from the LLM.
I wrote a custom MediaStreamTrack to achieve the same. I have tried this with 2 files, a.mp3 and b.mp3.
I ran across 2 issues:
Clearly, the addition of frames needs to be done better for this to work. I am definitely missing something here, and it would be great if someone could point me in the right direction.
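For reference, the "every few words" step described above could look roughly like the sketch below. `group_words` is a hypothetical helper, and the fragments stand in for partial LLM output, which may split words at arbitrary points. Each yielded phrase would be what gets handed to the text-to-speech step.

```python
from typing import Iterable, Iterator


def group_words(fragments: Iterable[str], words_per_chunk: int = 5) -> Iterator[str]:
    """Yield phrases of `words_per_chunk` words from a stream of text fragments."""
    buffer = ""               # text not yet split into complete words
    pending: list[str] = []   # complete words waiting to form a phrase
    for fragment in fragments:
        buffer += fragment
        parts = buffer.split(" ")
        # The last part may be an unfinished word, so keep it in the buffer.
        buffer = parts.pop()
        for word in parts:
            if word:
                pending.append(word)
            if len(pending) == words_per_chunk:
                yield " ".join(pending)
                pending = []
    # Flush whatever is left once the stream ends.
    if buffer:
        pending.append(buffer)
    if pending:
        yield " ".join(pending)


phrases = list(group_words(["The quick ", "brown fox jum", "ps over the lazy dog"]))
print(phrases)
```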