Skip to content

Voice-first AI Dungeon Master powered by agent-based LLMs, Whisper STT, pluggable TTS, and Runware scene illustrations for immersive tabletop adventures.

Notifications You must be signed in to change notification settings

BertilBraun/Agentic-LLM-DnD-GM

Folders and files

NameName
Last commit message
Last commit date

Latest commit

ย 

History

32 Commits
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 

Repository files navigation

๐ŸŽฒ Voice-Driven D&D Game Master

An AI-powered Dungeon Master that brings your D&D campaigns to life with voice interaction, dynamic image generation, and intelligent memory management. Speak your actions, listen to the DM's responses, and watch your adventure unfold visually in real-time.

Voice-Driven D&D Screenshot

โœจ Features

๐ŸŽ™๏ธ Voice-First Experience

  • Speech-to-Text: Speak your actions naturally using OpenAI Whisper
  • Dynamic Text-to-Speech: AI-generated voices for the DM and NPCs with personality-matched voice selection
  • Real-time Processing: Seamless voice interaction without typing

๐Ÿ–ผ๏ธ Visual Storytelling

  • Scene Generation: Automatic image creation for every scene using AI image generation
  • Character Portraits: Visual representations of NPCs you encounter
  • Consistent Art Style: Maintains visual coherence throughout your campaign

๐Ÿง  Intelligent Campaign Management

  • Interactive Campaign Planner: Collaborative campaign creation through guided questions
  • Smart Memory System: Automatic compression of long conversation histories at natural story breaks
  • Dynamic NPCs: Rich character interactions with unique voices and personalities
  • Story Continuity: Maintains context across multiple sessions

๐ŸŒ Multi-Language Support

  • Configurable language settings for international players
  • Automatic translation support via Google Translate

๐ŸŽญ Rich NPC Interactions

  • Character-Specific Voices: Each NPC gets a unique voice personality
  • Deep Conversations: Extended dialogue systems with natural conversation flow
  • Visual Portraits: Generated character art for immersive interactions

๐Ÿš€ Quick Start

Prerequisites

  • Python 3.10+
  • Microphone and speakers/headphones
  • API Keys (see Configuration section)

Installation

  1. Clone the repository

    git clone https://github.com/bertilbraun/agentic-llm-dnd-gm.git
    cd agentic-llm-dnd-gm
  2. Install dependencies

    pip install -r requirements.txt
  3. Configure API keys

    cp src/config.example.py src/config.py
    # Edit config.py with your API keys
  4. Run the application

    cd src
    python main.py        # GUI version (recommended)
    python main.py --cli  # Command-line version

โš™๏ธ Configuration

Copy src/config.example.py to src/config.py and fill in your API keys:

# Required API Keys
GEMINI_API_KEY = 'your-gemini-api-key-here' # For the LLM
RUNWARE_API_KEY = 'your-runware-api-key-here' # For the image generation
OPENAI_API_KEY = 'your-openai-api-key-here' # For the text-to-speech

# Model Configuration
MODEL = 'gemini-2.5-flash'  # AI model for DM responses

# Language Settings
LANGUAGE = 'en'  # 'en', 'de', 'fr', 'es', etc.

Required Services

  • Google Gemini: For AI Dungeon Master responses
  • Runware: For scene and character image generation
  • OpenAI: For speech-to-text (Whisper) and text-to-speech

๐ŸŽฎ How to Play

1. Campaign Creation

  • First run creates an interactive campaign planner
  • Answer questions about your preferred theme, tone, and story elements
  • The AI generates a complete campaign structure with acts and visual style

2. Voice Interaction

  • Click "Start Recording" to speak your actions
  • Click "Stop Recording" when finished
  • Listen to the DM's response and watch the scene unfold

3. NPC Conversations

  • When you encounter NPCs, special conversation mode activates
  • Each NPC has a unique voice and personality
  • Conversations flow naturally until reaching a conclusion

4. Visual Experience

  • Every scene generates a new image automatically
  • Images maintain consistent art style throughout the campaign
  • Character portraits appear during NPC interactions

๐Ÿ—๏ธ Architecture

Core Components

  • main.py: Main game loop and campaign management
  • llm.py: AI model integration (Gemini) with translation support
  • stt.py: Speech-to-text using OpenAI Whisper
  • tts.py: Text-to-speech with multiple voice personalities
  • image.py: AI image generation via Runware
  • memory.py: Intelligent conversation history management
  • qt.py: PyQt6 GUI interface

Data Flow

Voice Input โ†’ STT โ†’ AI DM โ†’ TTS + Image Generation โ†’ GUI Update
                โ†“
            Memory System (Auto-compression at story breaks)

๐Ÿงฉ Advanced Features

Memory Management

  • Automatic Compression: Detects natural story breaks and compresses history
  • Context Preservation: Maintains important information while reducing token usage
  • Long-term Continuity: Supports extended campaigns across multiple sessions

Multi-Modal AI

  • Structured Responses: Uses Pydantic models for consistent AI outputs
  • Dynamic Prompting: Context-aware prompts based on campaign state
  • Visual Consistency: Maintains art style across all generated images

Extensible Design

  • Modular TTS: Support for multiple TTS engines (Coqui, OpenAI)
  • Plugin Architecture: Easy to add new AI models or services
  • Configurable Voices: Extensive voice personality system

๐Ÿ› ๏ธ Development

Project Structure

โ””โ”€โ”€ BertilBraun/Agentic-LLM-DnD-GM
    โ”œโ”€โ”€ requirements.txt
    โ””โ”€โ”€ src/
        โ”œโ”€โ”€ config.example.py
        โ”œโ”€โ”€ image.py          # Image generation
        โ”œโ”€โ”€ llm.py           # AI model integration
        โ”œโ”€โ”€ main.py          # Main game loop
        โ”œโ”€โ”€ memory.py        # Memory management
        โ”œโ”€โ”€ qt.py           # GUI interface
        โ”œโ”€โ”€ stt.py          # Speech-to-text
        โ””โ”€โ”€ tts.py          # Text-to-speech

Adding New Features

The codebase is designed for extensibility:

  • New TTS Engines: Implement BaseTTS interface in tts.py
  • Additional AI Models: Extend the client setup in llm.py
  • Custom Memory Systems: Implement new compression strategies in memory.py

๐ŸŽฏ Use Cases

  • Solo D&D Sessions: Perfect for single-player adventures
  • DM Assistance: Helps human DMs with NPC voices and scene visualization
  • Story Prototyping: Test campaign ideas with AI feedback
  • Accessibility: Voice-first interface for players with typing difficulties
  • Language Learning: Practice RPG vocabulary in different languages

๐Ÿ”ง Troubleshooting

Common Issues

Audio Problems

  • Ensure microphone permissions are granted
  • Check default audio device settings
  • Test with --cli mode first

API Key Issues

  • Verify all API keys are correctly configured
  • Check API service status and quotas
  • Ensure config.py exists (not config.example.py)

Performance Issues

  • Close other audio applications
  • Use smaller Whisper model ('base' instead of 'large')
  • Check available GPU memory for image generation

๐Ÿ“‹ Roadmap

Current Features โœ…

  • Voice input/output system
  • AI image generation for scenes
  • Intelligent memory compression
  • Multi-language support
  • Rich NPC interaction system
  • GUI interface

Planned Features ๐Ÿ”ฎ

  • Multiple player support
  • Different input methods per player
  • Campaign sharing and export
  • Custom voice training
  • Integration with D&D Beyond
  • Mobile app version

๐Ÿค Contributing

Contributions are welcome! Please feel free to submit pull requests or open issues for:

  • Bug fixes and performance improvements
  • New TTS/STT engine integrations
  • Additional language support
  • UI/UX enhancements
  • Documentation improvements

๐Ÿ“„ License

This project is open source. Please check the license file for details.

๐Ÿ™ Acknowledgments

  • OpenAI for Whisper STT and GPT-based TTS
  • Google for Gemini AI models
  • Runware for AI image generation
  • Coqui TTS for open-source text-to-speech
  • PyQt6 for the GUI framework

Ready to embark on your AI-powered D&D adventure? Set up your API keys and let the magic begin! ๐Ÿง™โ€โ™‚๏ธโœจ

About

Voice-first AI Dungeon Master powered by agent-based LLMs, Whisper STT, pluggable TTS, and Runware scene illustrations for immersive tabletop adventures.

Resources

Stars

Watchers

Forks

Languages