VoiceTuber Software Design Document

You are on the `main` branch, which means the save format is unstable: any update may break your save, so keep your projects small and experimental.

1. Introduction

This design document outlines the features and functionality of VoiceTuber, a lightweight software application designed for PNGTubers. VoiceTuber streamlines the content creation process by emphasizing the use of only a microphone (and sometimes even no microphone), eliminating the need for additional hardware such as webcams or tracking devices. The software aims to provide an engaging and user-friendly experience for PNGTubers, making it easy for creators to produce dynamic and interactive content with minimal setup and equipment.

2. Features and Functionality

2.1. Core Features

Viseme detection for lip-syncing
Sprite-based animation
Rudimentary physics for realistic movement
Custom hotkeys for triggering expressions and gestures
Wake-word detection using Pocketsphinx
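
The "rudimentary physics" item is not specified further in this document. One common way to get this kind of follow-through motion is a damped spring that pulls an animated value toward a target each frame. The sketch below illustrates that idea only; `Spring`, `step`, and the stiffness and damping constants are hypothetical names and values, not VoiceTuber's actual code or settings.

```cpp
#include <cassert>

// Hypothetical state for one animated value (for example a sprite's rotation).
struct Spring
{
  float pos = 0.0f; // current value
  float vel = 0.0f; // current rate of change
};

// Advance the spring by dt seconds toward `target` using semi-implicit Euler:
// velocity is updated first, then the new velocity moves the position.
// stiffness and damping are illustrative tuning constants (assumptions).
inline void step(Spring &s, float target, float dt,
                 float stiffness = 120.0f, float damping = 14.0f)
{
  assert(dt > 0.0f);
  const float accel = stiffness * (target - s.pos) - damping * s.vel;
  s.vel += accel * dt;
  s.pos += s.vel * dt;
}
```

Calling `step(spring, target, 1.0f / 60.0f)` once per frame makes `spring.pos` overshoot slightly and then settle on `target`, which tends to read as more natural than snapping the sprite into place.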

2.2. Text-to-Speech Integration

Integration with Azure TTS for text-to-speech functionality
- Step-by-step guide for users to obtain and set up an API key
TTS to voice Twitch chat messages
Alternative for VTubers to type instead of talking, using TTS for vocalization
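
Azure's text-to-speech service accepts the text to speak as an SSML document. When that text comes from Twitch chat, it has to be XML-escaped first so that a message containing `<` or `&` cannot break the markup. The sketch below shows one way to build such a request body; `escapeXml` and `buildSsml` are hypothetical helper names, the fixed `en-US` language and the default voice are assumptions, and this is not necessarily how VoiceTuber constructs its requests.

```cpp
#include <string>

// Escape the five XML special characters so arbitrary chat text is safe inside SSML.
inline std::string escapeXml(const std::string &in)
{
  std::string out;
  out.reserve(in.size());
  for (const char ch : in)
  {
    switch (ch)
    {
    case '&': out += "&amp;"; break;
    case '<': out += "&lt;"; break;
    case '>': out += "&gt;"; break;
    case '"': out += "&quot;"; break;
    case '\'': out += "&apos;"; break;
    default: out += ch;
    }
  }
  return out;
}

// Build a minimal SSML document that speaks `text` with the given voice.
// The language attribute and the default voice name are assumptions for illustration.
inline std::string buildSsml(const std::string &text,
                             const std::string &voice = "en-US-JennyNeural")
{
  return "<speak version=\"1.0\" xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"en-US\">"
         "<voice name=\"" + escapeXml(voice) + "\">" + escapeXml(text) + "</voice></speak>";
}
```

The resulting string would be sent as the body of the HTTP request (for example through libcurl, which is already a dependency), together with the API key obtained in the setup guide.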

2.3. Streaming Platform Integration

Twitch chat integration for seamless viewer interaction
Interactive features using Twitch bits or reward points, such as throwing objects at the model
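
Twitch chat is delivered over an IRC-style protocol, where each chat message arrives as a `PRIVMSG` line. As an illustration of what the chat integration has to do before a message can be displayed or voiced, the sketch below extracts the user, channel, and text from one such line. `ChatMessage` and `parsePrivmsg` are hypothetical names, the code assumes C++17, and it is not taken from VoiceTuber's source.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <string>

struct ChatMessage
{
  std::string user;
  std::string channel;
  std::string text;
};

// Parse one IRC line of the form
//   :nick!nick@nick.tmi.twitch.tv PRIVMSG #channel :message text
// An optional leading IRCv3 tag block ("@key=value;... ") is skipped.
// Returns std::nullopt for any other kind of line (PING, JOIN, ...).
inline std::optional<ChatMessage> parsePrivmsg(std::string line)
{
  while (!line.empty() && (line.back() == '\r' || line.back() == '\n'))
    line.pop_back(); // strip the trailing CRLF

  std::size_t pos = 0;
  // Return the next space-delimited token and advance pos past it.
  auto nextToken = [&]() -> std::string {
    if (pos >= line.size())
      return {};
    const std::size_t end = std::min(line.find(' ', pos), line.size());
    std::string tok = line.substr(pos, end - pos);
    pos = std::min(end + 1, line.size());
    return tok;
  };

  std::string prefix = nextToken();
  if (!prefix.empty() && prefix[0] == '@')
    prefix = nextToken(); // skip the tag block
  if (prefix.size() < 2 || prefix[0] != ':')
    return std::nullopt;
  if (nextToken() != "PRIVMSG")
    return std::nullopt;
  const std::string target = nextToken();
  if (target.size() < 2 || target[0] != '#')
    return std::nullopt;
  if (pos >= line.size() || line[pos] != ':')
    return std::nullopt;

  // The user name is the part of the prefix between ':' and '!'.
  const std::size_t bang = prefix.find('!');
  const std::string user =
    bang == std::string::npos ? prefix.substr(1) : prefix.substr(1, bang - 1);
  return ChatMessage{user, target.substr(1), line.substr(pos + 1)};
}
```

Returning `std::nullopt` for unrecognized lines lets the caller ignore keep-alive and membership traffic without treating it as an error.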

2.4. Mouse Tracking for Eye and Body Movements

Avatar's eyes follow the user's cursor for a more lifelike and responsive experience
Post-processing pass and body morphing based on mouse position, creating an effect where the body of the model follows the mouse
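
A simple way to make eyes follow the cursor is to move each pupil toward the cursor while capping how far it may travel from the eye's center. The sketch below shows that calculation under stated assumptions: `Vec2` and `pupilOffset` are hypothetical names (the project itself depends on GLM, but a plain struct keeps the example self-contained), and coordinates are taken to be in pixels.

```cpp
#include <cmath>

struct Vec2
{
  float x;
  float y;
};

// Offset of the pupil from the eye center so that it looks toward the cursor.
// The offset points at the cursor and its length never exceeds maxOffset.
inline Vec2 pupilOffset(Vec2 eyeCenter, Vec2 cursor, float maxOffset)
{
  const float dx = cursor.x - eyeCenter.x;
  const float dy = cursor.y - eyeCenter.y;
  const float dist = std::sqrt(dx * dx + dy * dy);
  if (dist <= maxOffset || dist == 0.0f)
    return {dx, dy}; // cursor is inside the allowed radius: look straight at it
  const float scale = maxOffset / dist;
  return {dx * scale, dy * scale}; // clamp to the edge of the allowed radius
}
```

For example, with the eye at the origin, the cursor at (30, 40), and `maxOffset` set to 5, the cursor is 50 pixels away, so the offset is scaled by 0.1 to approximately (3, 4).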

3. Open-Source and Lightweight Approach

Released under an MIT license
Donation button for financial support through GitHub
Focus on maintaining a lightweight application without compromising features

4. User Interface and Customization

4.1. Avatar Customization

Integrated designer within the application
Blender-style editing for 2D avatar customization
Addition of gadgets and a toolbar for more traditional editing options

5. Dependencies

Dear ImGui - A lightweight and efficient Immediate Mode Graphical User Interface library for creating simple and functional graphical interfaces.
GLM - A C++ mathematics library for graphics software based on the OpenGL Shading Language (GLSL) specification, providing matrix operations and other essential functionality.
Pocketsphinx - A speech recognition library that converts spoken language into text.
SDL2 - A cross-platform development library that provides low-level access to audio, keyboard, mouse, and display functions, as well as window management.
cpptoml - A header-only library for parsing TOML configuration files.
json - JSON parser library
libcurl - HTTP client library
libuv - A cross-platform library that provides support for asynchronous I/O based on event loops.
log - Small logging library to simplify debugging and monitoring of application processes.
sdlpp - A compact C++ wrapper around SDL2, streamlining its integration and usage in C++ applications.
ser - A lightweight and efficient serialization/deserialization library for C++
stb - A collection of single-file public domain libraries, specifically used in this project for decoding and encoding images in various formats.

6. Build Instructions

Windows/Visual Studio:

Ensure you have the Desktop development with C++ workload and the C++ CMake tools for Windows individual component installed.
Ensure you have Conan installed.
Export PocketSphinx to your Conan cache:
- `conan export ./recipes/pocketsphinx --version 5.0.1`
Install the required dependencies:
- `conan install ./ --build missing -s build_type=Release`
Now open the project directory in Visual Studio and it will automatically configure your project.

After installing locally, you will need to copy the dependency DLLs and the assets folder into the installation directory. You can add `-d full_deploy` to the `conan install` command to copy the dependencies out of the Conan cache, and then copy them manually to the installation location.

Links:

CMake: https://cmake.org/download/
Conan: https://conan.io/downloads
Visual Studio: https://visualstudio.microsoft.com/

Linux:

Building with Conan

Ensure you have Conan, CMake and GCC installed.
- Conan might not be available in your distribution repositories, but it's also available through pip.
Clone the repository: `git clone https://github.com/team-pp-studio/VoiceTuber.git; cd VoiceTuber`
Initialize Git submodules: `git submodule update --init --recursive`
Export PocketSphinx to your Conan cache: `conan export ./recipes/pocketsphinx --version 5.0.1`
Install and build the required dependencies: `conan install ./ --build missing -s build_type=Release`
Configure CMake: `cmake --preset conan-release`
Build the binary: `cmake --build --preset conan-release`
Install locally: `cmake --install build/Release --prefix ./install/`

Building with coddle

Install dependencies

sudo apt-get install -y clang pkg-config libsdl2-dev libuv1-dev git cmake

Clone the app

git clone --recurse-submodules https://github.com/team-pp-studio/VoiceTuber.git

Build Pocketsphinx

cd VoiceTuber/3rd-party/pocketsphinx
cmake -S . -B build
cmake --build build
cmake --build build --target install
cd ../../..

You may need to run the last cmake command (the install step) with sudo.

Clone and compile the build tool coddle

git clone https://github.com/coddle-cpp/coddle.git && cd coddle && ./build.sh

Install coddle

sudo ./deploy.sh
cd ..

Build VoiceTuber

cd VoiceTuber/src && coddle

Run the application

../VoiceTuber

7. TODO

Top Priority

fix the crash in Twitch chat when the Internet connection is unstable
implement mouth based on individual images instead of the sprite sheet
implement blink based on individual images

General Priority

remember directory in open/save dialog boxes
search for files in the dialog box
transition to (or add) soft-body physics
(feature) support for transparency in OBS, so users do not need a green/blue background and can use all colors in the PNG model
Twitch extension (triggers for bits)
TikTok companion app
outdoor streaming from the phone with PNGTuber overlay
stream directly from VoiceTuber

Feedback

Completed
