An AI-powered Dungeon Master that brings your D&D campaigns to life with voice interaction, dynamic image generation, and intelligent memory management. Speak your actions, listen to the DM's responses, and watch your adventure unfold visually in real-time.
- Speech-to-Text: Speak your actions naturally using OpenAI Whisper
- Dynamic Text-to-Speech: AI-generated voices for the DM and NPCs with personality-matched voice selection
- Real-time Processing: Seamless voice interaction without typing
- Scene Generation: Automatic image creation for every scene using AI image generation
- Character Portraits: Visual representations of NPCs you encounter
- Consistent Art Style: Maintains visual coherence throughout your campaign
- Interactive Campaign Planner: Collaborative campaign creation through guided questions
- Smart Memory System: Automatic compression of long conversation histories at natural story breaks
- Dynamic NPCs: Rich character interactions with unique voices and personalities
- Story Continuity: Maintains context across multiple sessions
- Configurable language settings for international players
- Automatic translation support via Google Translate
- Character-Specific Voices: Each NPC gets a unique voice personality
- Deep Conversations: Extended dialogue systems with natural conversation flow
- Visual Portraits: Generated character art for immersive interactions
- Python 3.10+
- Microphone and speakers/headphones
- API Keys (see Configuration section)
- Clone the repository
    git clone https://github.com/bertilbraun/agentic-llm-dnd-gm.git
    cd agentic-llm-dnd-gm
- Install dependencies
    pip install -r requirements.txt
- Configure API keys
    cp src/config.example.py src/config.py
    # Edit config.py with your API keys
- Run the application
    cd src
    python main.py        # GUI version (recommended)
    python main.py --cli  # Command-line version
Copy src/config.example.py to src/config.py and fill in your API keys:
# Required API Keys
GEMINI_API_KEY = 'your-gemini-api-key-here' # For the LLM
RUNWARE_API_KEY = 'your-runware-api-key-here' # For the image generation
OPENAI_API_KEY = 'your-openai-api-key-here' # For speech-to-text (Whisper) and text-to-speech
# Model Configuration
MODEL = 'gemini-2.5-flash' # AI model for DM responses
# Language Settings
LANGUAGE = 'en' # 'en', 'de', 'fr', 'es', etc.
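A forgotten placeholder is an easy mistake to make when filling in the keys above. The following is a minimal sketch of a startup check, assuming the placeholders keep the `your-...-here` pattern from the example config. The helper `find_unset_keys` is hypothetical and is not part of the repository.

```python
# Hypothetical helper for illustration; not part of the project itself.
REQUIRED_KEYS = ("GEMINI_API_KEY", "RUNWARE_API_KEY", "OPENAI_API_KEY")


def find_unset_keys(settings: dict[str, str]) -> list[str]:
    """Return the names of required keys that are missing or still placeholders."""
    unset = []
    for name in REQUIRED_KEYS:
        value = settings.get(name, "")
        # The example config uses values such as 'your-gemini-api-key-here'.
        if not value or (value.startswith("your-") and value.endswith("-here")):
            unset.append(name)
    return unset


if __name__ == "__main__":
    example = {
        "GEMINI_API_KEY": "your-gemini-api-key-here",
        "RUNWARE_API_KEY": "rw-123",
        "OPENAI_API_KEY": "",
    }
    # Lists the keys that still need a real value.
    print(find_unset_keys(example))
```

A check like this could run before the first API call so that a missing key surfaces as a clear message rather than a failed request.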
- Google Gemini: For AI Dungeon Master responses
- Runware: For scene and character image generation
- OpenAI: For speech-to-text (Whisper) and text-to-speech
- On first run, an interactive campaign planner starts
- Answer questions about your preferred theme, tone, and story elements
- The AI generates a complete campaign structure with acts and visual style
- Click "Start Recording" to speak your actions
- Click "Stop Recording" when finished
- Listen to the DM's response and watch the scene unfold
- When you encounter NPCs, special conversation mode activates
- Each NPC has a unique voice and personality
- Conversations flow naturally until reaching a conclusion
- Every scene generates a new image automatically
- Images maintain consistent art style throughout the campaign
- Character portraits appear during NPC interactions
- main.py: Main game loop and campaign management
- llm.py: AI model integration (Gemini) with translation support
- stt.py: Speech-to-text using OpenAI Whisper
- tts.py: Text-to-speech with multiple voice personalities
- image.py: AI image generation via Runware
- memory.py: Intelligent conversation history management
- qt.py: PyQt6 GUI interface
Voice Input → STT → AI DM → TTS + Image Generation → GUI Update
                      ↓
Memory System (Auto-compression at story breaks)
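The data flow above can be sketched as a single turn function. This is an illustration only: the `Pipeline` and `TurnResult` names, the stage signatures, and the history format are assumptions, and the real stages in stt.py, llm.py, tts.py, and image.py may look different.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class TurnResult:
    narration: str
    audio: bytes
    image: bytes


@dataclass
class Pipeline:
    # Each stage is injected as a plain callable so real services can be swapped in.
    transcribe: Callable[[bytes], str]
    respond: Callable[[str, list[str]], str]
    speak: Callable[[str], bytes]
    paint: Callable[[str], bytes]
    history: list[str] = field(default_factory=list)

    def run_turn(self, recording: bytes) -> TurnResult:
        action = self.transcribe(recording)             # Voice Input -> STT
        narration = self.respond(action, self.history)  # STT -> AI DM
        audio = self.speak(narration)                   # AI DM -> TTS
        image = self.paint(narration)                   # AI DM -> Image Generation
        # The memory system records both sides of the exchange.
        self.history.extend([f"Player: {action}", f"DM: {narration}"])
        return TurnResult(narration, audio, image)


if __name__ == "__main__":
    # Stub stages stand in for the real STT, LLM, TTS, and image services.
    demo = Pipeline(
        transcribe=lambda rec: rec.decode("utf-8"),
        respond=lambda action, history: f"You {action}.",
        speak=lambda text: text.encode("utf-8"),
        paint=lambda text: b"<image bytes>",
    )
    result = demo.run_turn(b"open the door")
    print(result.narration)
    print(demo.history)
```

Passing the stages in as callables keeps the turn logic testable without a microphone or API keys.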
- Automatic Compression: Detects natural story breaks and compresses history
- Context Preservation: Maintains important information while reducing token usage
- Long-term Continuity: Supports extended campaigns across multiple sessions
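One way to picture compression at story breaks is the sketch below. It assumes the history is a flat list of strings, that a story-break flag is supplied by the caller, and that a `summarize` callable (presumably an LLM call in practice) produces the summary. The function name, thresholds, and summary format are all hypothetical, not the actual logic in memory.py.

```python
from typing import Callable


def compress_history(
    history: list[str],
    is_story_break: bool,
    summarize: Callable[[list[str]], str],
    max_messages: int = 20,
    keep_recent: int = 4,
) -> list[str]:
    """Fold older messages into one summary entry when a story break is reached."""
    # Only compress at a story break, and only once the history is long enough.
    if not is_story_break or len(history) <= max_messages:
        return history
    split = len(history) - keep_recent
    older, recent = history[:split], history[split:]
    # The most recent messages stay verbatim so the next turn has full context.
    return [f"Summary: {summarize(older)}"] + recent


if __name__ == "__main__":
    history = [f"message {i}" for i in range(25)]
    compressed = compress_history(
        history,
        is_story_break=True,
        summarize=lambda msgs: f"{len(msgs)} earlier messages condensed",
    )
    print(compressed)
```

Waiting for a story break means a scene is never summarized halfway through, which is the property the list above describes.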
- Structured Responses: Uses Pydantic models for consistent AI outputs
- Dynamic Prompting: Context-aware prompts based on campaign state
- Visual Consistency: Maintains art style across all generated images
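To illustrate what a structured response buys you, here is a small sketch that parses and validates a JSON reply. It deliberately uses a standard-library dataclass as a stand-in for the Pydantic models the project uses, so it runs without extra dependencies. The `DMResponse` fields and the `parse_dm_response` helper are hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass
class DMResponse:
    # Hypothetical fields; the project's real Pydantic models may differ.
    narration: str
    image_prompt: str
    is_story_break: bool = False


def parse_dm_response(raw: str) -> DMResponse:
    """Parse a JSON reply and reject it if required fields are missing or mistyped."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("response must be a JSON object")
    for name in ("narration", "image_prompt"):
        if not isinstance(data.get(name), str):
            raise ValueError(f"field {name!r} must be a string")
    is_break = data.get("is_story_break", False)
    if not isinstance(is_break, bool):
        raise ValueError("field 'is_story_break' must be a boolean")
    return DMResponse(data["narration"], data["image_prompt"], is_break)


if __name__ == "__main__":
    reply = '{"narration": "The door creaks open.", "image_prompt": "a dusty crypt"}'
    print(parse_dm_response(reply))
```

With Pydantic, the manual type checks would be replaced by the model's own validation. The point is the same either way: malformed model output fails early instead of reaching the TTS or image stages.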
- Modular TTS: Support for multiple TTS engines (Coqui, OpenAI)
- Plugin Architecture: Easy to add new AI models or services
- Configurable Voices: Extensive voice personality system
└── BertilBraun/Agentic-LLM-DnD-GM
    ├── requirements.txt
    └── src/
        ├── config.example.py
        ├── image.py # Image generation
        ├── llm.py # AI model integration
        ├── main.py # Main game loop
        ├── memory.py # Memory management
        ├── qt.py # GUI interface
        ├── stt.py # Speech-to-text
        └── tts.py # Text-to-speech
The codebase is designed for extensibility:
- New TTS Engines: Implement the BaseTTS interface in tts.py
- Additional AI Models: Extend the client setup in llm.py
- Custom Memory Systems: Implement new compression strategies in memory.py
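As a rough idea of what adding a TTS engine could involve, the sketch below defines an abstract base class and a toy engine. Only the name `BaseTTS` comes from this README. The `synthesize` method, its signature, and the `SilentTTS` class are assumptions made for illustration, so check tts.py for the actual interface before implementing anything.

```python
from abc import ABC, abstractmethod


class BaseTTS(ABC):
    """Assumed shape of the interface; the real one in tts.py may differ."""

    @abstractmethod
    def synthesize(self, text: str, voice: str) -> bytes:
        """Return audio data for the given text and voice name."""


class SilentTTS(BaseTTS):
    """Toy engine that returns silence, useful for testing without an API key."""

    def __init__(self, sample_rate: int = 16000) -> None:
        self.sample_rate = sample_rate

    def synthesize(self, text: str, voice: str) -> bytes:
        # About 50 ms of 16-bit mono silence per character of input.
        samples = len(text) * self.sample_rate // 20
        return b"\x00\x00" * samples


if __name__ == "__main__":
    engine: BaseTTS = SilentTTS()
    audio = engine.synthesize("Welcome, adventurer.", voice="narrator")
    print(len(audio), "bytes of audio")
```

Because callers depend only on the base class, a new engine can be swapped in without touching the game loop.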
- Solo D&D Sessions: Perfect for single-player adventures
- DM Assistance: Helps human DMs with NPC voices and scene visualization
- Story Prototyping: Test campaign ideas with AI feedback
- Accessibility: Voice-first interface for players with typing difficulties
- Language Learning: Practice RPG vocabulary in different languages
Audio Problems
- Ensure microphone permissions are granted
- Check default audio device settings
- Test with --cli mode first
API Key Issues
- Verify all API keys are correctly configured
- Check API service status and quotas
- Ensure config.py exists (not config.example.py)
Performance Issues
- Close other audio applications
- Use smaller Whisper model ('base' instead of 'large')
- Check available GPU memory for image generation
- Voice input/output system
- AI image generation for scenes
- Intelligent memory compression
- Multi-language support
- Rich NPC interaction system
- GUI interface
- Multiple player support
- Different input methods per player
- Campaign sharing and export
- Custom voice training
- Integration with D&D Beyond
- Mobile app version
Contributions are welcome! Please feel free to submit pull requests or open issues for:
- Bug fixes and performance improvements
- New TTS/STT engine integrations
- Additional language support
- UI/UX enhancements
- Documentation improvements
This project is open source. Please check the license file for details.
- OpenAI for Whisper STT and GPT-based TTS
- Google for Gemini AI models
- Runware for AI image generation
- Coqui TTS for open-source text-to-speech
- PyQt6 for the GUI framework
Ready to embark on your AI-powered D&D adventure? Set up your API keys and let the magic begin! 🧙‍♂️✨