Enhance Text-to-Speech (TTS) Integration with Audio File Management and Playback #1936

Open
wants to merge 18 commits into
base: staging

johnbenac
Contributor

This update enhances the Text-to-Speech (TTS) feature, focusing on audio file management, storage, and in-message playback. Key changes include a dedicated endpoint for audio uploads, a utility function for file handling, user interface adjustments for audio control within messages, and a partial migration to event-driven automatic audio generation. Here's a breakdown of the modifications:

New Audio Upload Endpoint (/uploadaudio): We've implemented a server-side endpoint to handle the upload and storage of generated TTS audio files. This endpoint accepts base64 encoded audio data, processes it, and saves it to a designated directory on the server.
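
As a rough illustration of the endpoint's validation and storage step, here is a minimal sketch. It is not the code from this PR: the request shape (`audio`, `filename`), the `handleUploadAudio` name, the response fields, and the `AUDIO_DIR` constant in the commented Express wiring are all assumptions made for the example.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical request shape: { audio: <base64 string>, filename: <string> }.
// Returns a plain { status, json } object so the logic is independent of the web framework.
function handleUploadAudio(body, audioDir) {
    if (!body || typeof body.audio !== 'string' || typeof body.filename !== 'string') {
        return { status: 400, json: { error: 'audio and filename are required' } };
    }
    // path.basename drops directory components, so "../clip.wav" becomes "clip.wav".
    const name = path.basename(body.filename);
    if (!name || name === '.' || name === '..') {
        return { status: 400, json: { error: 'invalid filename' } };
    }
    const data = Buffer.from(body.audio, 'base64');
    if (data.length === 0) {
        return { status: 400, json: { error: 'audio decoded to zero bytes' } };
    }
    fs.mkdirSync(audioDir, { recursive: true });
    const filePath = path.join(audioDir, name);
    fs.writeFileSync(filePath, data);
    return { status: 200, json: { path: filePath } };
}

// Possible Express wiring (assumed, not taken from the PR):
// app.post('/uploadaudio', express.json({ limit: '50mb' }), (req, res) => {
//     const result = handleUploadAudio(req.body, AUDIO_DIR);
//     res.status(result.status).json(result.json);
// });
```

Keeping the handler free of `req`/`res` makes the validation easy to exercise without starting a server.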

Utility Function for File Saving (saveAudioAsFile): A new utility function has been added to convert audio data from base64 format to a file, saving it within the server's file system. This facilitates easy management and retrieval of TTS audio files.
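
A base64-to-file helper along these lines could look like the sketch below. The signature, the optional data-URL prefix handling, and the filename sanitizing are assumptions for illustration; the actual `saveAudioAsFile` in this PR may differ.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical signature; the real saveAudioAsFile may take different arguments.
function saveAudioAsFile(base64Audio, directory, filename) {
    // Strip an optional data-URL prefix such as "data:audio/wav;base64,".
    const payload = base64Audio.replace(/^data:audio\/[\w.+-]+;base64,/, '');
    // Keep only the final path component so the file cannot land outside `directory`.
    const safeName = path.basename(filename);
    fs.mkdirSync(directory, { recursive: true });
    const filePath = path.join(directory, safeName);
    // Decode the base64 text into raw bytes and write them to disk.
    fs.writeFileSync(filePath, Buffer.from(payload, 'base64'));
    return filePath;
}
```

Returning the final path lets the caller store it alongside the message for later playback.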

Enhanced appendMediaToMessage Function: Modifications to this function allow for the dynamic insertion of audio elements within chat messages. Users can now directly play, pause, and download TTS-generated audio without leaving the chat interface.
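
To make the play, pause, and download behavior concrete, here is a sketch of the markup such an insertion might produce. `buildAudioMarkup`, `escapeAttr`, and the class names are hypothetical; the PR's `appendMediaToMessage` operates on the live message DOM rather than building a string.

```javascript
// Escape characters that are unsafe inside a double-quoted HTML attribute.
function escapeAttr(value) {
    return String(value)
        .replace(/&/g, '&amp;')
        .replace(/"/g, '&quot;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;');
}

// Hypothetical helper that returns the markup for one audio attachment.
function buildAudioMarkup(audioUrl) {
    const src = escapeAttr(audioUrl);
    // `controls` provides the browser's native play/pause UI;
    // the `download` attribute makes the link save the file instead of navigating to it.
    return `<audio class="mes_audio" controls src="${src}"></audio>` +
        `<a class="mes_audio_download" href="${src}" download>Download</a>`;
}
```

Relying on the native `<audio controls>` element keeps playback logic out of the extension script.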

Frontend Adjustments in index.html: Minor updates to the HTML structure support the display and control of embedded audio files within messages, ensuring a seamless user experience.

Script Updates (tts/index.js): Key script enhancements link the TTS feature with the new audio management capabilities. These changes ensure that TTS audio is correctly generated, saved, and associated with the corresponding message, complete with UI updates for immediate playback.

_Right now, automatic audio generation for user messages is broken. Audio can be generated manually, but not automatically, because the user entering a message does not emit an event, and the logic isn't smart enough to hunt through recent state for unrendered user audio._
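
One way to close that gap would be to scan recent chat state whenever a message event fires. The sketch below shows the idea only: the message shape (`mes`, `extra.audio`), the `'message_sent'` event name, and both function names are assumptions, not SillyTavern's confirmed API or this PR's code.

```javascript
const { EventEmitter } = require('node:events');

// Hypothetical message shape: { is_user: boolean, mes: string, extra?: { audio?: string } }.
// Returns the indices of the last `lookback` messages that have no audio attached yet.
function findMessagesMissingAudio(chat, lookback = 5) {
    const start = Math.max(0, chat.length - lookback);
    const missing = [];
    for (let i = start; i < chat.length; i++) {
        const hasAudio = Boolean(chat[i].extra && chat[i].extra.audio);
        if (!hasAudio) missing.push(i);
    }
    return missing;
}

// 'message_sent' is an assumed event name; generateAudio stands in for the TTS call.
function registerAutoTts(events, chat, generateAudio) {
    events.on('message_sent', () => {
        for (const index of findMessagesMissingAudio(chat)) {
            const message = chat[index];
            // Preserve any existing `extra` fields while attaching the audio reference.
            message.extra = { ...message.extra, audio: generateAudio(message.mes) };
        }
    });
}
```

Because the scan checks state rather than trusting a single event payload, a message that arrived without its own event is still picked up the next time any event fires.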

Together, these changes give users instant audio feedback inside the chat and streamline backend management of TTS audio files. This update lays the groundwork for further enhancements in audio-based communication within SillyTavern.

@johnbenac
Contributor Author

Per our conversation in Discord (https://discord.com/channels/1100685673633153084/1100820587586273343/1218600978119393300):

I've improved the trigger by using a better message, so autogeneration now works on user messages. I've tested it in groups, with streaming on and off, and with user-message narration enabled and disabled, and it all works as it should.

@github-actions github-actions bot added the 🚫 Merge Conflicts [PR] Submitted code needs rebasing label Apr 25, 2024
Labels
🚫 Merge Conflicts [PR] Submitted code needs rebasing 🟨 PR - Medium

3 participants