Enhance Text-to-Speech (TTS) Integration with Audio File Management and Playback #1936

johnbenac · 2024-03-16T04:53:50Z

This update enhances the Text-to-Speech (TTS) feature, focusing on audio file management, storage, and in-message playback. Key changes include a dedicated endpoint for audio uploads, a new utility function for file handling, user interface adjustments for controlling audio within messages, and a partial migration to event-driven automatic audio generation. Here's a breakdown of the modifications:

New Audio Upload Endpoint (/uploadaudio): We've implemented a server-side endpoint to handle the upload and storage of generated TTS audio files. This endpoint accepts base64-encoded audio data, decodes it, and saves it to a designated directory on the server.
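To illustrate the kind of checks such an endpoint might run before storing anything, here is a minimal sketch of request-body validation. It is not the code from this PR: the function name `parseUploadAudioBody` and the field names `audio` and `filename` are assumptions made for the example, and wiring it into an actual route is left out.

```javascript
/**
 * Validate an /uploadaudio request body and decode its audio payload.
 * Hypothetical helper: the field names `audio` and `filename` are assumptions.
 */
function parseUploadAudioBody(body) {
    if (!body || typeof body.audio !== 'string' || typeof body.filename !== 'string') {
        return { status: 400, error: 'Expected string fields "audio" and "filename".' };
    }
    // Accept only the standard base64 alphabet with optional trailing padding.
    if (!/^[A-Za-z0-9+/]+={0,2}$/.test(body.audio)) {
        return { status: 400, error: 'Field "audio" is not valid base64.' };
    }
    return { status: 200, filename: body.filename, data: Buffer.from(body.audio, 'base64') };
}

// Usage: one well-formed body and one with an invalid payload.
const ok = parseUploadAudioBody({ audio: 'AQID', filename: 'clip.mp3' });
const bad = parseUploadAudioBody({ audio: 'not base64!', filename: 'clip.mp3' });
console.log(ok.status, ok.data.length, bad.status);
```

Rejecting malformed input up front matters here because `Buffer.from(..., 'base64')` silently skips characters it cannot decode rather than throwing.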

Utility Function for File Saving (saveAudioAsFile): A new utility function has been added to convert audio data from base64 format to a file, saving it within the server's file system. This facilitates easy management and retrieval of TTS audio files.
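A rough sketch of what a base64-to-file helper can look like in Node.js follows. The signature shown is an assumption for illustration, and the real `saveAudioAsFile` in this PR may take different parameters or write to a different location.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

/**
 * Decode base64 audio and write it into `directory` under a sanitized name.
 * Hypothetical signature: the PR's actual saveAudioAsFile may differ.
 */
function saveAudioAsFile(base64Audio, directory, filename) {
    // Strip an optional data-URI prefix such as "data:audio/mpeg;base64,".
    const payload = base64Audio.replace(/^data:[^,]*,/, '');
    // path.basename drops directory components, which blocks "../" traversal.
    const safeName = path.basename(filename);
    fs.mkdirSync(directory, { recursive: true });
    const target = path.join(directory, safeName);
    fs.writeFileSync(target, Buffer.from(payload, 'base64'));
    return target;
}

// Usage: write three bytes into a fresh temporary directory.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'tts-audio-'));
const savedPath = saveAudioAsFile('data:audio/mpeg;base64,AQID', demoDir, '../clip.mp3');
console.log(savedPath);
```

Sanitizing the filename is worth doing in any helper of this kind, since the name arrives from the client alongside the audio data.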

Enhanced appendMediaToMessage Function: Modifications to this function allow for the dynamic insertion of audio elements within chat messages. Users can now directly play, pause, and download TTS-generated audio without leaving the chat interface.
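As a simplified illustration of the markup involved, the sketch below builds an audio player and a download link as an HTML string. The helper names `escapeAttribute` and `buildAudioMarkup` and the example path are hypothetical; the actual `appendMediaToMessage` changes operate on the message's DOM and are not reproduced here.

```javascript
/** Escape characters that are special inside HTML attribute values. */
function escapeAttribute(value) {
    return String(value)
        .replace(/&/g, '&amp;')
        .replace(/"/g, '&quot;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;');
}

/**
 * Build a player plus download link for one audio file.
 * Hypothetical helper, not part of the PR.
 */
function buildAudioMarkup(audioUrl) {
    const src = escapeAttribute(audioUrl);
    // `controls` gives the browser's built-in play/pause UI;
    // the `download` attribute turns the link into a save action.
    return `<audio controls src="${src}"></audio>` +
        `<a href="${src}" download>Download</a>`;
}

// Usage with an assumed storage path.
const markup = buildAudioMarkup('/user/audio/clip.mp3');
console.log(markup);
```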

Frontend Adjustments in index.html: Minor updates to the HTML structure support the display and control of embedded audio files within messages, ensuring a seamless user experience.

Script Updates (tts/index.js): Key script enhancements link the TTS feature with the new audio management capabilities. These changes ensure that TTS audio is correctly generated, saved, and associated with the corresponding message, complete with UI updates for immediate playback.

_Right now, automatic audio generation for user messages is broken. Audio can be generated manually but not automatically, because a user entering a message does not emit an event, and the logic isn't smart enough to hunt through recent state for unrendered user audio._

Together, these changes significantly improve the TTS feature by not only enriching the user's interactive experience with instant audio feedback but also by streamlining the backend management of TTS audio files. This update lays the groundwork for further enhancements in audio-based communication within SillyTavern.


johnbenac · 2024-03-16T17:24:50Z

Per our conversation in Discord (https://discord.com/channels/1100685673633153084/1100820587586273343/1218600978119393300):

I've improved the trigger by using a better message event, so autogeneration now works on user messages. I've tested it in groups, with streaming on and off, and with user-message narration enabled and disabled, and it all works as it should.
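The general shape of an event-based trigger can be sketched as below. This is an assumption-laden illustration: Node's `EventEmitter` stands in for the application's event bus, and the event names and the `queueNarration` handler are hypothetical rather than the ones used in this PR.

```javascript
const { EventEmitter } = require('node:events');

// Stand-in for the application's event bus; event names are hypothetical.
const eventSource = new EventEmitter();
const narrated = [];

function queueNarration(messageId) {
    // Skip messages that were already queued so repeated events stay idempotent.
    if (!narrated.includes(messageId)) {
        narrated.push(messageId);
    }
}

// Subscribing to both events lets user and character messages share one path.
eventSource.on('character_message_rendered', queueNarration);
eventSource.on('user_message_rendered', queueNarration);

// Usage: the duplicate user event is ignored.
eventSource.emit('character_message_rendered', 0);
eventSource.emit('user_message_rendered', 1);
eventSource.emit('user_message_rendered', 1);
console.log(narrated);
```

Reacting to an explicit event avoids having to scan recent chat state for messages that still lack audio.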

johnbenac added 17 commits March 13, 2024 22:31

Your descriptive commit message

b2d88ba

audio is being added to messages, but not refreshing the message

96093b1

Update .gitignore to exclude certain files and directories

8aabb14

Revert .gitignore to match upstream staging

0c81dac

fixed message passing for immediate DOM update

4743720

manual and auto messages now saved and updated in DOM

9262569

unknown change

df7fcc1

breaking change that may zero out user messages

c304bd0

save method more like stable diffusion

e50e8f2

improved formatting

5597937

Sync with upstream/staging

2ff5d4c

pre bug squash commit

61ac7fa

Reset public/scripts/extensions/tts/index.js to match upstream staging

744687e

Reintroduce pre-bug squash changes to public/scripts/extensions/tts/i…

2ed1b66

…ndex.js

event based autogeneration of char only

4945c6f

Merge remote-tracking branch 'upstream/staging' into staging

f13cf3e

simple event triggers, working user auto generation

bdf6bc6

Merge branch 'staging' into tts_audio

d02240c

deffcolony added the 🟨 PR - Medium label Mar 29, 2024

github-actions bot added the 🚫 Merge Conflicts [PR] Submitted code needs rebasing label Apr 25, 2024
