Lightweight speech-to-speech web-based chat app combining speech recognition, LLM completion and text-to-speech. Implemented with Python (Flask) and vanilla JavaScript only.

KF-R/turk-chat

turk-chat conversation agent

If you're looking for the older async pygame version: https://github.com/KF-R/turk-chat-pygame


Features:

  • Ultra-lightweight; only a Python Flask server and vanilla JS.
  • Integrated ring buffer, sound activity detection algorithm and real-time animated speech visualization.
  • Automatic speech detection with termination detection; no push-to-talk or activation (listens, responds, listens... )
  • Speech is recorded, then transcribed by either the OpenAI Whisper API or a CTranslate2-based fast Whisper implementation.
  • Transcribed speech, along with the full chat history, is submitted to the OpenAI API for a chat response.
  • The response is filtered for numbers, years, code blocks etc. so that the TTS output sounds more natural.
  • The filtered response is read aloud via the ElevenLabs Text-To-Speech API or a fast local TTS engine (Balacoon; see Note below).
  • Spoken response is visualized by way of a real-time waveform animation.
  • After the spoken response is complete, listening resumes automatically to allow a fluid, ongoing conversation.
  • Integrated web access tools; turk-chat can grab current headlines, read Wikipedia, summarise web pages etc.
  • Toggle between basic and advanced LLM back ends (e.g. GPT-3.5 vs GPT-4).
  • Obligatory Larson scanner using KITT and Cylon modes for a bit of additional visual feedback.
  • Simplified UI mode added (with KITT head-unit visualizer).
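
The automatic speech and termination detection described above can be illustrated with a small energy-based sketch. This is not turk-chat's own code: the class name SpeechSegmenter, the threshold and the frame counts are all hypothetical, chosen only to show how a ring buffer of recent frames and a run of quiet frames can mark the start and end of an utterance.

```python
import math
from collections import deque


def rms(frame):
    """Root-mean-square level of one frame of PCM samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


class SpeechSegmenter:
    """Hypothetical energy-based start/stop detector (illustration only)."""

    def __init__(self, threshold=500.0, pre_roll_frames=5, silence_frames=10):
        self.threshold = threshold            # assumed RMS level that counts as speech
        self.silence_frames = silence_frames  # quiet frames that end an utterance
        self.pre_roll = deque(maxlen=pre_roll_frames)  # ring buffer of recent quiet frames
        self.recording = []
        self.active = False
        self.quiet_run = 0

    def feed(self, frame):
        """Feed one frame; return the finished utterance (list of frames) or None."""
        loud = rms(frame) >= self.threshold
        if not self.active:
            if loud:
                # Speech started: keep the buffered lead-in so the first word is not clipped.
                self.active = True
                self.recording = list(self.pre_roll) + [frame]
                self.pre_roll.clear()
                self.quiet_run = 0
            else:
                self.pre_roll.append(frame)
            return None
        self.recording.append(frame)
        self.quiet_run = 0 if loud else self.quiet_run + 1
        if self.quiet_run >= self.silence_frames:
            # Enough consecutive quiet frames: treat the utterance as finished.
            utterance, self.recording = self.recording, []
            self.active = False
            self.quiet_run = 0
            return utterance
        return None
```

Each call to feed() takes one fixed-size frame of samples; a non-None return value is the complete utterance, ready to be written out and transcribed. A real detector would probably adapt the threshold to background noise rather than hard-code it.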

Usage:

  • git clone https://github.com/KF-R/turk-chat
  • Install requirements
    • sudo apt install portaudio19-dev
    • cd turk-chat
    • pip install -r requirements.txt
  • Set up API keys (See below and/or my_env.py.example)
  • Launch turk_flask.py, which is a Python Flask application.
  • Visit localhost port 5000 in your browser (e.g. https://127.0.0.1:5000/)
  • Approve the ad-hoc SSL certificate to authorise the page.
  • Click the Start Listening button. The first time you do this, you'll be asked to grant permissions to your microphone.
  • Start talking. Be patient with the response.
  • After your chat agent has finished speaking its response, it will automatically resume listening.
  • The Voice drop-down list is populated with the voice names from your ElevenLabs voice library.
  • You can change the responding voice without affecting the ongoing conversation.
  • Use the model switch to toggle between basic (e.g. GPT-3.5) and advanced (e.g. GPT-4) models.
  • To stop listening, click the Stop Listening button or refresh the page.
  • To clear/archive the chat message and engine logs, click the Reset button.
  • Archived conversations will be stored in the archive directory.
  • Code blocks generated by your chat partner will be stored in the sandbox directory.
  • Previously recorded .wav files are kept in the audio_in directory.
  • Previously generated .mp3 files are kept in the audio_out directory.
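
As a rough illustration of what the Reset step does, the sketch below moves a log file into an archive directory under a timestamped name. It is an assumption about the general approach, not the project's actual code: the function archive_log and the file name messages.json are hypothetical; only the archive directory name comes from the list above.

```python
import shutil
import time
from pathlib import Path


def archive_log(log_path, archive_dir="archive"):
    """Move log_path into archive_dir under a timestamped name.

    Returns the new path, or None if there was nothing to archive.
    """
    log_path = Path(log_path)
    if not log_path.exists():
        return None
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = archive_dir / f"{log_path.stem}-{stamp}{log_path.suffix}"
    shutil.move(str(log_path), str(target))
    return target


if __name__ == "__main__":
    # "messages.json" is a hypothetical log file name used for illustration.
    print(archive_log("messages.json"))
```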

API keys:

As written, the application expects a file named 'my_env.py' in your home directory that defines the API keys as follows:

API_KEY_OPENAI = '<insert_your_OpenAI_API_key_here>'
API_KEY_ELEVENLABS = '<insert_your_ElevenLabs_API_key_here>'
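
One way such a file could be read from the home directory is sketched below. This is a minimal illustration and may differ from how turk_flask.py actually imports the keys; the helper name load_api_keys and the returned dictionary layout are assumptions.

```python
import importlib.util
from pathlib import Path


def load_api_keys(env_path=None):
    """Load API keys from my_env.py (defaults to the copy in the home directory)."""
    path = Path(env_path) if env_path else Path.home() / "my_env.py"
    if not path.is_file():
        raise FileNotFoundError(f"API key file not found: {path}")
    # Execute the file as a module so its top-level assignments become attributes.
    spec = importlib.util.spec_from_file_location("my_env", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return {
        "openai": getattr(module, "API_KEY_OPENAI", None),
        "elevenlabs": getattr(module, "API_KEY_ELEVENLABS", None),
    }
```

Because the file is executed as Python, it should be kept private and out of version control, like any other file holding secrets.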

Note:

The local TTS engine is Balacoon (https://balacoon.com/freeware/tts/package), which is x64-only. If you're running on another platform (e.g. ARM), you'll need to disable or replace Balacoon and stick with the ElevenLabs API.
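
A platform check along these lines could choose the engine automatically. It is only a sketch: the function pick_tts_engine and the engine labels are hypothetical, and turk-chat itself may select the engine differently.

```python
import platform

# Machine names commonly reported for x64 CPUs (Linux/macOS and Windows respectively).
X64_NAMES = {"x86_64", "amd64"}


def pick_tts_engine(machine=None):
    """Return 'balacoon' on x64 machines and 'elevenlabs' everywhere else."""
    machine = (machine or platform.machine()).lower()
    return "balacoon" if machine in X64_NAMES else "elevenlabs"


if __name__ == "__main__":
    print(pick_tts_engine())
```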


v0.4.x