A comprehensive Retrieval-Augmented Generation (RAG) system specifically designed for Vietnamese medical content, featuring multimodal capabilities with text and image understanding, advanced reranking, and benchmarking tools.
- Vietnamese Medical RAG: Specialized for Vietnamese healthcare content with medical terminology understanding
- Multimodal Support: Text and image retrieval with enhanced caption embeddings
- Advanced Reranking: BGE-M3 reranker for improved context relevance
- Real-time Image Generation: AI-powered medical image generation based on answers
- Comprehensive Benchmarking: Top-K accuracy evaluation for both text and image retrieval
- Streamlit Web Interface: User-friendly chat interface with Vietnamese support
- Production-Ready Serving: Modal-based microservices architecture for scalable deployment
- Pipeline Engine (`pipeline.py`): LangGraph-based processing workflow
- Vector Stores (`vectorstore.py`): Qdrant-based document and image storage
- Embedding Models:
  - BGE-M3-v3 for text content embedding
  - BGE-M3-image for image caption embedding
- Reranker (`reranker.py`): BGE-M3 reranker for document ranking
- Serving Layer (`serving.py`): Modal-based API endpoints
- Web Interface (`app.py`): Streamlit chat application
```
User Question → Context Embedding → Vector Search → Reranking →
Context Selection → Answer Generation → Image Search →
Image Generation (optional) → Final Response
```
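The flow above can be sketched as plain functions threading a shared state dict. This is a simplified stand-in for the LangGraph workflow in `pipeline.py`; the node names and placeholder values are illustrative, not the real internals:

```python
# Simplified stand-in for the LangGraph workflow: every node receives the
# shared state dict and returns it with new keys added. Node names and the
# placeholder values are illustrative, not the real pipeline.py internals.
def embed_question(state):
    state["query_vector"] = [0.1, 0.2]  # placeholder embedding
    return state

def vector_search(state):
    state["candidates"] = ["doc_a", "doc_b", "doc_c"]  # top-k hits
    return state

def rerank(state):
    state["contexts"] = state["candidates"][:2]  # keep best-ranked docs
    return state

def generate_answer(state):
    state["final_answer"] = "Answer grounded in: " + ", ".join(state["contexts"])
    return state

def run_pipeline(question):
    state = {"question": question}
    for node in (embed_question, vector_search, rerank, generate_answer):
        state = node(state)
    return state
```

Each real LangGraph node follows the same shape (state in, updated state out); the actual graph adds the image search and optional image generation branches.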
- Python 3.11+
- CUDA-capable GPU (recommended for embeddings and reranking)
- Qdrant vector database
- Azure OpenAI API access
- Modal account (for cloud deployment)
- Clone the repository:

```bash
git clone <repository-url>
cd VDT-UniversalRAG
```

- Install dependencies:

```bash
pip install -r requirements.txt
```
- Environment Configuration: create a `.env` file with the following variables:
```bash
# Qdrant Configuration
QDRANT_URL=your_qdrant_url
QDRANT_API_KEY=your_qdrant_api_key

# Azure OpenAI Configuration
AZURE_ENDPOINT=your_azure_endpoint
AZURE_API_KEY=your_azure_api_key
AZURE_VERSION=your_api_version

# Azure Image Generation
AZURE_IMAGE_API_KEY=your_image_api_key
AZURE_IMAGE_VERSION=your_image_api_version
AZURE_IMAGE_ENDPOINT=your_image_endpoint

# Optional: Weights & Biases for experiment tracking
WANDB_API_KEY=your_wandb_key
```
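Before starting the app it helps to fail fast when a required variable is missing. `check_env` below is a hypothetical helper, not part of the repository:

```python
import os

# Variables the pipeline cannot run without (see the .env template above).
REQUIRED_VARS = [
    "QDRANT_URL", "QDRANT_API_KEY",
    "AZURE_ENDPOINT", "AZURE_API_KEY", "AZURE_VERSION",
]

def check_env(env=None):
    """Return the required variables that are missing or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Call `check_env()` at startup and abort with a clear message if the returned list is non-empty.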
The system works with several dataset formats:
- Location: `datasets/context_corpus_embedded_enhanced.jsonl`
- Format: pre-computed embeddings with metadata
```json
{
  "chunk_id": "unique_id",
  "content": "Vietnamese medical content",
  "embedding": [float_vector],
  "metadata": {
    "title": "Document title",
    "keyword": "Medical topic",
    "source": "youmed.vn"
  }
}
```
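Because each line of the corpus is a standalone JSON object, it can be streamed record by record. A minimal sketch (the sample embedding is truncated for illustration):

```python
import json

def iter_chunks(lines):
    """Yield one corpus record per JSONL line, skipping blank lines."""
    for line in lines:
        if line.strip():
            yield json.loads(line)

# One line in the format shown above (embedding truncated for illustration):
sample = ('{"chunk_id": "c1", "content": "Nội dung y tế", '
          '"embedding": [0.1, 0.2], '
          '"metadata": {"title": "Paracetamol", "keyword": "thuốc", '
          '"source": "youmed.vn"}}')

records = list(iter_chunks([sample, ""]))  # blank lines are ignored
```

For the real corpus, pass an open file handle, e.g. `iter_chunks(open("datasets/context_corpus_embedded_enhanced.jsonl", encoding="utf-8"))`.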
- Training: `datasets/q_a_train_filtered.jsonl`
- Testing: `datasets/q_a_test_filtered.jsonl`
- Validation: `datasets/q_a_validation_filtered.jsonl`
- Image-Question Mappings: `datasets/image_question_mappings_*.jsonl`
- Caption Embeddings: `datasets/caption_embeddings.jsonl`
```bash
streamlit run app.py
```

Access the interface at http://localhost:8501 for an interactive Vietnamese medical Q&A experience.
```python
from pipeline import create_app_rag_graph

# Initialize the RAG graph
graph = create_app_rag_graph()

# Process a question
state = {"question": "Thuốc Paracetamol có tác dụng gì?"}

# Run the pipeline
result = graph.invoke(state)
print(result["final_answer"])
```
```python
from vectorstore import VectorStore

# Initialize the vector store
vector_store = VectorStore(
    collection_name="universal-rag-precomputed-enhanced"
)

# Load pre-computed embeddings
vector_store.load_documents_from_jsonl(
    "datasets/context_corpus_embedded_enhanced.jsonl"
)
```
```bash
python benchmark.py
```

Evaluates Top-K accuracy for text retrieval using various K values (1, 3, 5, 10, 20).

```bash
python image_benchmark.py
```

Evaluates multimodal retrieval performance for image-question pairs.

```bash
python benchmark_reranker.py
```

Compares performance with and without reranking.
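Top-K accuracy counts a question as answered correctly when its gold chunk appears among the first K retrieved results. A stdlib sketch of the metric (not the exact `benchmark.py` implementation):

```python
def top_k_accuracy(results, gold, k):
    """results: question id -> ranked list of chunk ids;
    gold: question id -> the single correct chunk id."""
    hits = sum(1 for q, ranked in results.items() if gold[q] in ranked[:k])
    return hits / len(results)

# Tiny worked example: q1's gold chunk is ranked 2nd, q2's is never retrieved.
ranked = {"q1": ["a", "b", "c"], "q2": ["x", "y", "z"]}
gold = {"q1": "b", "q2": "w"}
```

Here Top-1 accuracy is 0.0 and Top-3 accuracy is 0.5, which is why accuracy is non-decreasing in K.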
Deploy the serving layer to Modal:

```bash
modal deploy serving.py
```

This provides scalable API endpoints:

- `/embed-context`: context embedding
- `/embed-image-caption`: image caption embedding
- `/rerank-documents`: document reranking

For local development:

```bash
python serving.py
```
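A client call to the reranking endpoint might look like the following; the payload field names and base URL are assumptions rather than a documented contract, so verify them against `serving.py`:

```python
import json
import urllib.request

def build_rerank_request(question, documents):
    """Assemble the JSON body for /rerank-documents (field names assumed)."""
    return {"query": question, "documents": documents}

def post_json(url, payload):
    """POST a JSON payload and decode the JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

payload = build_rerank_request("Thuốc Paracetamol có tác dụng gì?",
                               ["đoạn văn 1", "đoạn văn 2"])
# post_json("https://<modal-app-url>/rerank-documents", payload)  # live call
```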
```bash
modal run embeddings.py::finetune_embeddings --num-train-epochs=4
modal run reranker.py::train_reranker_with_prebuilt_data --num-train-epochs=2
modal run multimodal_embeddings.py::run_training --max-train-samples=5000
```
- Chunk Size: 512 tokens (optimized for BGE-M3)
- Overlap: 128 tokens
- Context Prefix: Vietnamese medical context
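A 512-token window with 128-token overlap amounts to a sliding window that advances 384 tokens at a time. A sketch over an already-tokenized list (the real pipeline would use the BGE-M3 tokenizer):

```python
def chunk_tokens(tokens, size=512, overlap=128):
    """Split a token list into windows of `size`, advancing size - overlap."""
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):  # last window reached the end
            break
    return chunks
```

The overlap means the last 128 tokens of each chunk reappear at the start of the next one, so sentences near a boundary stay retrievable from at least one chunk.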
- Initial Retrieval: Top-20 documents
- Reranking: Top-5 documents
- Final Selection: Top-3 contexts
- Model: Azure OpenAI GPT-4
- Temperature: 0.1 (deterministic responses)
- Max Tokens: 1000
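With those settings, generation reduces to packing the top-3 selected contexts into a prompt. The sketch below only assembles the message list; the system prompt wording is an assumption, and the live Azure OpenAI call is left commented out:

```python
# Generation settings from the configuration above.
GEN_CONFIG = {"temperature": 0.1, "max_tokens": 1000}

def build_messages(question, contexts):
    """Pack the top-3 retrieved contexts and the question into chat messages."""
    context_block = "\n\n".join(contexts[:3])  # final selection: top-3
    # System prompt (placeholder wording): "You are a medical assistant.
    # Answer only from the following context."
    system = ("Bạn là trợ lý y tế. Chỉ trả lời dựa trên ngữ cảnh sau:\n"
              + context_block)
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]

messages = build_messages("Thuốc Paracetamol có tác dụng gì?",
                          ["ctx1", "ctx2", "ctx3", "ctx4"])
# client.chat.completions.create(model="gpt-4", messages=messages, **GEN_CONFIG)
```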
Recent benchmark results:
- Text Retrieval Top-5 Accuracy: 85%+
- Image Retrieval Top-5 Accuracy: 78%+
- Reranker Improvement: +12% over baseline
- Average Response Time: <3 seconds
- Qdrant Connection Error
  - Verify `QDRANT_URL` and `QDRANT_API_KEY` in `.env`
  - Check network connectivity
- Azure OpenAI Rate Limits
  - Implement retry logic
  - Consider using multiple API keys
- CUDA Memory Issues
  - Reduce batch sizes in training
  - Use gradient checkpointing
- Modal Deployment Issues
  - Ensure the Modal CLI is installed: `pip install modal`
  - Authenticate: `modal token set`
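For the rate-limit item above, a minimal exponential-backoff wrapper (stdlib only; which exception type to catch depends on the SDK in use):

```python
import time

def with_retries(fn, attempts=4, base_delay=1.0, retry_on=(Exception,)):
    """Call fn(), retrying with exponential backoff on the given exceptions."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * (2 ** attempt))

# Demo: a function that fails twice before succeeding.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("rate limited")
    return "ok"

result = with_retries(flaky, base_delay=0)
```

In production, narrow `retry_on` to the SDK's rate-limit exception so genuine errors fail immediately.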
- Use SSD storage for vector databases
- Enable GPU acceleration for embeddings
- Implement request batching for high throughput
- Use CDN for static assets (images)
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Update documentation
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
- BGE-M3: Multilingual embedding model
- LangGraph: Workflow orchestration
- Qdrant: Vector database
- Modal: Cloud compute platform
For questions and support:
- Create an issue in this repository
- Review the troubleshooting section
- Check the benchmark results in the `results/` directory
Note: This system is designed specifically for Vietnamese medical content and may require adaptation for other languages or domains.