modal-labs/quillman

QuiLLMan: Voice Chat with LLMs

A complete chat app that transcribes audio in real-time, streams back a response from a language model, and synthesizes this response as natural-sounding speech.

This repo is meant to serve as a starting point for your own language model-based apps, as well as a playground for experimentation. Contributions are welcome and encouraged!

The language model is Zephyr. OpenAI Whisper handles transcription, and Metavoice Tortoise TTS handles text-to-speech. The entire app, including the frontend, is designed to be deployed serverlessly on Modal.

You can find the demo live here.

[Note: this code is provided for illustration only; please remember to check the license before using any model for commercial purposes.]

File structure

  1. React frontend (src/frontend/)
  2. FastAPI server (src/app.py)
  3. Whisper transcription module (src/transcriber.py)
  4. Tortoise text-to-speech module (src/tts.py)
  5. Zephyr language model module (src/llm_zephyr.py)

Read the accompanying docs for a detailed look at each of these components.
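
To illustrate how the components listed above fit together, here is a minimal, dependency-free sketch of the transcribe, generate, and synthesize flow. It is not the code in this repo: `transcribe`, `generate`, `synthesize`, `sentences`, and `respond` are hypothetical stand-ins, and the sentence-chunking step is an assumption about how streamed text might be handed to a speech model in pieces.

```python
import re
from typing import Iterable, Iterator


def transcribe(audio: bytes) -> str:
    """Hypothetical stand-in for the Whisper step: treats the bytes as text."""
    return audio.decode("utf-8")


def generate(prompt: str) -> Iterator[str]:
    """Hypothetical stand-in for the language model: streams a canned reply."""
    reply = f"You said: {prompt}. That is interesting! Tell me more."
    for word in reply.split(" "):
        yield word + " "


def sentences(tokens: Iterable[str]) -> Iterator[str]:
    """Group streamed tokens into sentences so synthesis can start early."""
    buffer = ""
    for token in tokens:
        buffer += token
        # Emit every complete sentence currently held in the buffer.
        while (match := re.search(r"[.!?]\s", buffer)):
            yield buffer[: match.end()].strip()
            buffer = buffer[match.end():]
    # Flush whatever is left once the stream ends.
    if buffer.strip():
        yield buffer.strip()


def synthesize(sentence: str) -> bytes:
    """Hypothetical stand-in for the text-to-speech step."""
    return sentence.encode("utf-8")


def respond(audio: bytes) -> Iterator[bytes]:
    """Run one turn: audio in, a stream of per-sentence audio chunks out."""
    text = transcribe(audio)
    for sentence in sentences(generate(text)):
        yield synthesize(sentence)


if __name__ == "__main__":
    for chunk in respond(b"hello"):
        print(chunk)
```

Chunking by sentence means the first piece of audio can be produced before the model has finished generating, which is one way to keep a voice chat feeling responsive.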

Developing locally

Requirements

  • modal installed in your current Python virtual environment (pip install modal)
  • A Modal account
  • A Modal token set up in your environment (modal token new)
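
As a quick sanity check for the requirements above, a small script along these lines can report whether the `modal` package is importable and whether a token appears to be configured. This is a sketch under assumptions: it looks for the `MODAL_TOKEN_ID` and `MODAL_TOKEN_SECRET` environment variables and for a `.modal.toml` file in the home directory, which may not match every setup, and `check_modal_setup` is a hypothetical helper, not part of this repo or of Modal.

```python
from __future__ import annotations

import importlib.util
import os
from pathlib import Path


def check_modal_setup(home: Path | None = None) -> dict[str, bool]:
    """Report whether the local environment looks ready for `modal serve`."""
    home = home or Path.home()

    # The package is considered installed if Python can locate it.
    has_package = importlib.util.find_spec("modal") is not None

    # Assumption: a token is either set via environment variables ...
    has_env_token = bool(
        os.environ.get("MODAL_TOKEN_ID") and os.environ.get("MODAL_TOKEN_SECRET")
    )
    # ... or stored in the default config file in the home directory.
    has_config_file = (home / ".modal.toml").is_file()

    return {"package": has_package, "token": has_env_token or has_config_file}


if __name__ == "__main__":
    for name, ok in check_modal_setup().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

The check only confirms that a token seems to be present; it does not verify that the token is valid.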

Develop on Modal

To serve the app on Modal, run this command from the root directory of this repo:

modal serve src.app

The terminal output includes a URL where you can use your app. While the modal serve process is running, changes to any of the project files are applied automatically. Press Ctrl+C to stop the app.

Deploy to Modal

Once you're happy with your changes, deploy your app:

modal deploy src.app

[Note that leaving the app deployed on Modal doesn't cost you anything! Modal apps are serverless and scale to 0 when not in use.]
