Enhance Text-to-Speech (TTS) Integration with Audio File Management and Playback #1936
Open: johnbenac wants to merge 18 commits into SillyTavern:staging from johnbenac:staging
Per our conversation on Discord (https://discord.com/channels/1100685673633153084/1100820587586273343/1218600978119393300), I've improved the trigger by basing it on a better message, so automatic generation now works on user messages. I've tested it in groups, with streaming on and off, and with user-message narration enabled and disabled, and everything works as it should.
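The trigger described above is event-driven: audio generation reacts to a message being rendered rather than scanning the chat afterwards. Here is a minimal sketch of that pattern. It uses Node's `EventEmitter` as a stand-in for the app's event bus. The event names (`userMessageRendered`, `characterMessageRendered`), the `narrateUserMessages` setting, and the message fields are all hypothetical and are not SillyTavern's actual API.

```javascript
const { EventEmitter } = require("node:events");

// Stand-in for the app's event bus; the event names used below are hypothetical.
const bus = new EventEmitter();

// Hypothetical setting: whether user messages should be narrated at all.
const settings = { narrateUserMessages: true };

// IDs of messages waiting for TTS generation.
const queue = [];

function onMessageRendered(message) {
  // Skip user messages unless the user has opted in to narrating them.
  if (message.isUser && !settings.narrateUserMessages) return;
  // Skip messages that already have audio or are already queued.
  if (message.audioUrl || queue.includes(message.id)) return;
  queue.push(message.id);
}

// One handler for both message kinds, so user and character messages
// follow the same rules.
bus.on("userMessageRendered", onMessageRendered);
bus.on("characterMessageRendered", onMessageRendered);
```

Routing both events through one handler keeps the deduplication and opt-in checks in a single place. `EventEmitter.emit` calls listeners synchronously, so the queue is updated before `emit` returns.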
github-actions bot added the 🚫 Merge Conflicts label ([PR] Submitted code needs rebasing) on Apr 25, 2024.
This update substantially enhances the Text-to-Speech (TTS) feature, focusing on audio file management, storage, and in-message playback. Key changes include a dedicated endpoint for audio uploads, new utility functions for file handling, user interface changes that add audio controls to messages, and a partial migration to event-driven automatic audio generation. Here's a breakdown of the modifications:
New Audio Upload Endpoint (/uploadaudio): We've implemented a server-side endpoint that handles the upload and storage of generated TTS audio files. It accepts base64-encoded audio data, decodes it, and saves it to a designated directory on the server.
Utility Function for File Saving (saveAudioAsFile): A new utility function converts base64 audio data into a file on the server's file system, which makes TTS audio files easy to manage and retrieve.
Enhanced appendMediaToMessage Function: This function now inserts audio elements into chat messages dynamically, so users can play, pause, and download TTS-generated audio without leaving the chat interface.
Frontend Adjustments in index.html: Minor changes to the HTML structure support displaying and controlling audio files embedded in messages.
Script Updates (tts/index.js): These changes connect the TTS feature to the new audio management capabilities. They ensure that TTS audio is generated, saved, and associated with the corresponding message, and that the UI updates for immediate playback.
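The server-side steps above (accept base64 audio, decode it, write it to a designated directory) can be sketched as follows. This is a hypothetical `saveAudioAsFile`-style helper. The real function's signature, directory layout, file naming, and supported formats in this PR may differ. Accepting either a bare base64 string or a `data:audio/...;base64,` URL is an assumption made for illustration.

```javascript
const fs = require("node:fs");
const path = require("node:path");
const crypto = require("node:crypto");

// Hypothetical mapping from MIME type to file extension.
const EXTENSIONS = { "audio/mpeg": "mp3", "audio/wav": "wav", "audio/ogg": "ogg", "audio/webm": "webm" };

// Hypothetical sketch: decode base64 audio and save it under targetDir.
// Returns the absolute path of the written file.
function saveAudioAsFile(dataUrlOrBase64, targetDir) {
  // Accept a data URL ("data:audio/wav;base64,...") or a bare base64 string.
  const match = /^data:(audio\/[\w.+-]+);base64,(.*)$/s.exec(dataUrlOrBase64);
  const mime = match ? match[1] : "audio/mpeg"; // assumed default for bare base64
  const base64 = match ? match[2] : dataUrlOrBase64;

  const buffer = Buffer.from(base64, "base64");
  if (buffer.length === 0) {
    throw new Error("Empty or invalid audio payload");
  }

  // A random name avoids collisions and keeps client input out of the path.
  const fileName = `${crypto.randomUUID()}.${EXTENSIONS[mime] ?? "bin"}`;
  fs.mkdirSync(targetDir, { recursive: true });
  const filePath = path.join(targetDir, fileName);
  fs.writeFileSync(filePath, buffer);
  return filePath;
}
```

A `/uploadaudio` route could call a helper like this with the request body and respond with a URL relative to the directory the server serves statically. Generating the file name on the server, rather than trusting a client-supplied name, avoids path-traversal issues.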
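On the client side, the changes to appendMediaToMessage and tts/index.js come down to two steps: record the saved audio URL on the message, then render a player for it. Here is a minimal sketch under stated assumptions. Storing the URL in `message.extra.audio`, the `mes_audio` class name, and the helper names `attachAudio` and `buildAudioMarkup` are all hypothetical. It relies on the native `<audio controls>` element for play and pause, and on an `<a download>` link for saving the file.

```javascript
// Escape a value for safe use inside a double-quoted HTML attribute.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical: associate a saved audio URL with a message,
// preserving any other media already stored in message.extra.
function attachAudio(message, audioUrl) {
  message.extra = { ...message.extra, audio: audioUrl };
  return message;
}

// Hypothetical: build the in-message player markup; returns "" when there is no audio.
function buildAudioMarkup(message) {
  if (!message.extra?.audio) return "";
  const src = escapeAttr(message.extra.audio);
  // <audio controls> provides play/pause; the download link lets users save the file.
  return `<div class="mes_audio"><audio controls preload="none" src="${src}"></audio>` +
    `<a href="${src}" download>Download</a></div>`;
}
```

Keeping the URL on the message object means the player can be re-rendered whenever the chat is reloaded, without generating the audio again.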
_Right now, automatic audio generation for user messages is broken. Audio for user messages can be generated manually, but not automatically, because submitting a user message does not emit an event, and the logic cannot yet search recent chat state for user messages whose audio has not been rendered._
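The note above describes logic that could search recent chat state for user messages that have no audio yet. Here is a minimal sketch of such a search. It assumes the chat is an array of message objects with an `is_user` flag and an `extra.audio` URL once audio exists. The function name and the `lookback` window are hypothetical.

```javascript
// Hypothetical: return the indices of recent user messages that still lack audio.
// Only the last `lookback` messages are inspected, to keep the scan cheap.
function findUnvoicedUserMessages(chat, lookback = 5) {
  const start = Math.max(0, chat.length - lookback);
  const pending = [];
  for (let i = start; i < chat.length; i++) {
    const message = chat[i];
    if (message.is_user && !message.extra?.audio) {
      pending.push(i);
    }
  }
  return pending;
}
```

Limiting the scan to a small window avoids unexpectedly voicing old messages when a long chat is loaded.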
Together, these changes improve the TTS feature by giving users immediate in-chat audio playback and by streamlining backend management of TTS audio files. This update lays the groundwork for further enhancements to audio-based communication in SillyTavern.