
ventriloquist

ventriloquist is a browser-based VTubing app inspired by AlterEcho

Tian Guo

Jack Sullivan

Introduction

Ventriloquist is a browser-based VTubing app built with Kalidokit. It is inspired by AlterEcho, a loosely coupled avatar-streamer model that supports both one-to-one motion capture and preset animations ("gestures"), which can be triggered either manually or by poses and facial expressions.

Left: Motion capture wave. Right: Custom gesture ("gangnam style" from Mixamo)

Functionality

Loose-Coupling Model

The AlterEcho paper proposes a loose-coupling model between the user and the VTuber model: the avatar follows the user with one-to-one motion capture, but facial expressions and hand positions can trigger it to perform preset animations ("gestures").
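To make the trigger idea concrete, here is a minimal sketch of how a user-defined trigger might be represented. All type and field names are illustrative, not Ventriloquist's actual schema:

```ts
// Hypothetical trigger-to-gesture mapping; names are illustrative only.
type TriggerSource = "manual" | "facialExpression" | "handPose";

interface GestureTrigger {
  source: TriggerSource;
  // e.g. a blendshape name or a named hand pose
  condition: string;
  // Confidence above which the condition counts as "active"
  threshold: number;
  // Name of the Mixamo animation clip to play when the trigger fires
  gestureClip: string;
}

const triggers: GestureTrigger[] = [
  { source: "facialExpression", condition: "mouthSmile", threshold: 0.8, gestureClip: "Waving" },
  { source: "handPose", condition: "thumbsUp", threshold: 0.9, gestureClip: "GangnamStyle" },
];
```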

Control Flow

A single model is loaded into a three.js scene, and its controller switches between one-to-one motion capture and preset Mixamo animations based on user-defined triggers.
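The switch itself can be pictured as a small two-mode state machine. The sketch below assumes a hypothetical AvatarController class; only THREE.AnimationMixer and its "finished" event are real three.js API:

```ts
import * as THREE from "three";

type Mode = "mocap" | "gesture";

class AvatarController {
  private mode: Mode = "mocap";

  constructor(private mixer: THREE.AnimationMixer) {
    // Drop back to motion capture once a gesture clip finishes playing.
    this.mixer.addEventListener("finished", () => (this.mode = "mocap"));
  }

  playGesture(clip: THREE.AnimationClip) {
    const action = this.mixer.clipAction(clip);
    action.reset().setLoop(THREE.LoopOnce, 1).play();
    this.mode = "gesture";
  }

  update(delta: number) {
    if (this.mode === "gesture") {
      this.mixer.update(delta); // advance the preset animation
    }
    // else: apply the latest mocap solve to the rig (omitted here)
  }
}
```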

Figure 1: Ventriloquist's general control flow (high-level diagram)

Packages

Ventriloquist runs mostly in the browser, but it also has a server for user presets and static file hosting. It is built with Next.js, so both the frontend and backend live in this repository.
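For instance, serving user presets from the backend could look like the following hypothetical Next.js API route; the real repo may organize this differently:

```ts
// pages/api/presets.ts — illustrative sketch, not the repo's actual route.
import type { NextApiRequest, NextApiResponse } from "next";

// In-memory store for illustration only; a real server would persist these.
const presets: Record<string, unknown> = {};

export default function handler(req: NextApiRequest, res: NextApiResponse) {
  const user = String(req.query.user ?? "default");
  if (req.method === "POST") {
    presets[user] = req.body; // save the user's trigger/gesture presets
    return res.status(204).end();
  }
  return res.status(200).json(presets[user] ?? {});
}
```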

In the browser, Ventriloquist heavily leverages the following libraries:

  • MediaPipe provides face and pose recognition from the webcam; this project uses the Holistic tracker.
  • Kalidokit sits on top of MediaPipe's landmarks and solves both raw positions and the kinematic rotations used to pose the VTuber model.
  • three-vrm loads the VRM model through three.js's standard GLTFLoader (see the sketch after this list).
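Putting the three libraries together, a hedged sketch of the browser pipeline might look like this. It uses the @pixiv/three-vrm v1 loader API, which may differ from the version the repo targets, and the property holding Holistic's 3D world landmarks is minified and varies between builds, so treat that part as an assumption:

```ts
import { GLTFLoader } from "three/examples/jsm/loaders/GLTFLoader.js";
import { VRMLoaderPlugin, type VRM } from "@pixiv/three-vrm";
import { Holistic, type Results } from "@mediapipe/holistic";
import * as Kalidokit from "kalidokit";

const video = document.querySelector<HTMLVideoElement>("#webcam")!;

// Load the VRM avatar through three.js's GLTFLoader.
const loader = new GLTFLoader();
loader.register((parser) => new VRMLoaderPlugin(parser));
const gltf = await loader.loadAsync("/avatar.vrm");
const vrm: VRM = gltf.userData.vrm;

function onResults(results: Results) {
  if (results.faceLandmarks) {
    // Kalidokit turns raw face landmarks into head/eye/mouth values.
    const faceRig = Kalidokit.Face.solve(results.faceLandmarks, {
      runtime: "mediapipe",
      video,
    });
    // Apply the solved head rotation to the avatar's neck bone.
    const neck = vrm.humanoid.getNormalizedBoneNode("neck");
    if (faceRig && neck) {
      neck.rotation.set(faceRig.head.x, faceRig.head.y, faceRig.head.z);
    }
  }
  // Kalidokit.Pose.solve additionally wants Holistic's 3D world landmarks,
  // which some builds expose only under a minified key (e.g. results.za).
}

const holistic = new Holistic({
  locateFile: (file) => `https://cdn.jsdelivr.net/npm/@mediapipe/holistic/${file}`,
});
holistic.onResults(onResults);
// Webcam frames are then fed per animation frame via holistic.send({ image: video }).
```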

Gestures

Currently, the only supported gestures are .fbx animations from Mixamo. When downloading from Mixamo, make sure to select the "Without Skin" option.
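A minimal sketch of loading such a clip with three.js's FBXLoader is shown below; retargeting Mixamo's skeleton onto the VRM's humanoid bones is project-specific and omitted here:

```ts
import * as THREE from "three";
import { FBXLoader } from "three/examples/jsm/loaders/FBXLoader.js";

async function loadGesture(url: string): Promise<THREE.AnimationClip> {
  const fbx = await new FBXLoader().loadAsync(url);
  // Mixamo FBX files carry their animation as the first clip.
  return fbx.animations[0];
}

// Usage (hypothetical file name), feeding the clip to an AnimationMixer:
// const clip = await loadGesture("/gestures/gangnam-style.fbx");
// controller.playGesture(clip);
```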