A LangGraph-based research assistant that helps users find, analyze, and summarize scientific papers using the CORE API.
This project implements a research assistant agent that can:
- Search for scientific papers
- Download and analyze PDFs
- Provide structured summaries
- Answer research-related questions
Original Implementation:
- The original implementation used OpenAI's GPT models
- Required API keys and had usage costs
- Had specific message formatting requirements
Solution:
- Migrated to locally-hosted Mistral using Ollama
- Created a custom wrapper for Mistral to handle message formatting
- Implemented structured output parsing for consistent responses
- Eliminated API costs and privacy concerns
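A minimal sketch of such a wrapper is shown below: it converts LangChain message objects into the role/content dictionaries the Ollama Python client expects. The class and method names are illustrative, not the project's actual `mistral_wrapper.py`.

```python
# Illustrative sketch only; the project's mistral_wrapper.py may differ.
import ollama  # Ollama Python client
from langchain_core.messages import AIMessage, BaseMessage, SystemMessage


class MistralWrapper:
    """Translate LangChain messages into Ollama's chat format and back."""

    def __init__(self, model: str = "mistral"):
        self.model = model

    @staticmethod
    def _to_ollama(message: BaseMessage) -> dict:
        # Map LangChain message classes onto Ollama chat roles.
        if isinstance(message, SystemMessage):
            role = "system"
        elif isinstance(message, AIMessage):
            role = "assistant"
        else:  # HumanMessage and anything else is treated as user input
            role = "user"
        return {"role": role, "content": message.content}

    def invoke(self, messages: list[BaseMessage]) -> AIMessage:
        response = ollama.chat(
            model=self.model,
            messages=[self._to_ollama(m) for m in messages],
        )
        return AIMessage(content=response["message"]["content"])
```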
Workflow Management:
Challenges:
- Complex state management between different agent nodes
- Need for consistent message formatting
- Tool integration and execution flow
Solution:
- Used LangGraph for structured workflow
- Implemented state management through AgentState
- Created modular tools system for paper search and download
- Built clear node transitions and decision making
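In outline, the workflow looks something like the sketch below: a shared `AgentState` holds the message history, an agent node calls the model, a tools node executes paper search/download, and a conditional edge decides when to stop. Node bodies are stubbed here; the real `workflow.py` wires in the Mistral wrapper and the tools.

```python
# Minimal LangGraph sketch: a shared AgentState, an agent node, a tools node,
# and a conditional edge that decides whether to call tools or finish.
from typing import Annotated, TypedDict

from langchain_core.messages import AIMessage, BaseMessage, HumanMessage
from langgraph.graph import END, StateGraph
from langgraph.graph.message import add_messages


class AgentState(TypedDict):
    # Conversation history; add_messages appends new messages instead of overwriting.
    messages: Annotated[list[BaseMessage], add_messages]


def agent_node(state: AgentState) -> dict:
    # In the real project this calls the Mistral wrapper; stubbed here.
    return {"messages": [AIMessage(content="(model reply)")]}


def tools_node(state: AgentState) -> dict:
    # In the real project this runs the requested search/download tool; stubbed here.
    return {"messages": [AIMessage(content="(tool result)")]}


def should_continue(state: AgentState) -> str:
    # Route to the tools node if the last model message requested a tool call.
    last = state["messages"][-1]
    return "tools" if getattr(last, "tool_calls", None) else END


graph = StateGraph(AgentState)
graph.add_node("agent", agent_node)
graph.add_node("tools", tools_node)
graph.set_entry_point("agent")
graph.add_conditional_edges("agent", should_continue)
graph.add_edge("tools", "agent")
app = graph.compile()

result = app.invoke({"messages": [HumanMessage(content="Find papers about large language models")]})
```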
Web Interface:
Challenges:
- Real-time communication with the agent
- Handling async operations
- Displaying structured research outputs
Solution:
- Created a Flask-based API endpoint
- Implemented async processing with proper error handling
- Built a responsive frontend for real-time interaction
- Structured output formatting for better readability
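A stripped-down version of such an endpoint is sketched below; the route name and the `run_agent` helper are hypothetical, and the real `views.py` invokes the compiled LangGraph workflow with fuller error handling.

```python
# Hypothetical Flask endpoint sketch; async processing and streaming omitted for brevity.
from flask import Flask, jsonify, request

app = Flask(__name__)


def run_agent(question: str) -> str:
    # Placeholder: the real app invokes the compiled LangGraph workflow here.
    return f"(structured summary for: {question})"


@app.post("/api/ask")
def ask():
    payload = request.get_json(silent=True) or {}
    question = str(payload.get("question", "")).strip()
    if not question:
        return jsonify({"error": "Missing 'question' field"}), 400
    try:
        return jsonify({"answer": run_agent(question)})
    except Exception as exc:
        # Surface agent/tool failures as a structured error for the frontend.
        return jsonify({"error": str(exc)}), 500


if __name__ == "__main__":
    app.run(debug=True)
```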
Tech Stack
- Backend Framework: Flask
- LLM: Mistral (via Ollama)
- Workflow Management: LangGraph
- API Integration: CORE API
- PDF Processing: pdfplumber
- Frontend: HTML, CSS, JavaScript
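For the PDF analysis step, pdfplumber extraction looks roughly like this (a minimal sketch, not the project's exact code):

```python
# Minimal pdfplumber usage: concatenate the extracted text of every page.
import pdfplumber


def extract_text(pdf_path: str) -> str:
    with pdfplumber.open(pdf_path) as pdf:
        return "\n".join(page.extract_text() or "" for page in pdf.pages)
```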
Setup
- Install dependencies:
```bash
pip install -r requirements.txt
```
- Install Ollama and Mistral:
```bash
# Install Ollama
curl https://ollama.ai/install.sh | sh

# Pull the Mistral model
ollama pull mistral
```
- Set up environment variables (loaded at startup, as sketched after these steps):
```bash
# Create a .env file
CORE_API_KEY=your_key_here
```
- Run the application:
```bash
python run.py
```
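The key has to be available to the application at startup. A minimal sketch using python-dotenv is below; the project may load configuration differently, for example in `flaskApp/config.py`.

```python
# Read CORE_API_KEY from the .env file into the process environment.
import os

from dotenv import load_dotenv

load_dotenv()
CORE_API_KEY = os.environ["CORE_API_KEY"]  # raises KeyError if the key is missing
```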
Project Structure
```
.
├── run.py               # Flask app entry point
├── requirements.txt     # Project dependencies
├── flaskApp/            # Flask application
│   ├── __init__.py
│   ├── config.py
│   ├── views.py
│   ├── static/
│   └── templates/
└── agent/               # Agent implementation
    ├── __init__.py
    ├── models.py            # Pydantic models
    ├── core_wrapper.py      # CORE API wrapper
    ├── mistral_wrapper.py   # LLM wrapper
    ├── prompts.py           # System prompts
    ├── tools.py             # Agent tools
    ├── utils.py             # Helper functions
    └── workflow.py          # LangGraph workflow
```
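For a rough idea of what `core_wrapper.py` does, a paper search against CORE can be sketched as follows. The endpoint, parameters, and response fields shown are assumptions based on CORE's v3 API and may differ from the project's actual wrapper.

```python
# Illustrative CORE API wrapper sketch; check the CORE v3 documentation for
# the exact endpoint, parameters, and response schema before relying on this.
import os

import requests

CORE_SEARCH_URL = "https://api.core.ac.uk/v3/search/works"  # assumed v3 search endpoint


def search_papers(query: str, limit: int = 5) -> list[dict]:
    """Return raw work records matching the query."""
    response = requests.get(
        CORE_SEARCH_URL,
        headers={"Authorization": f"Bearer {os.environ['CORE_API_KEY']}"},
        params={"q": query, "limit": limit},
        timeout=30,
    )
    response.raise_for_status()
    return response.json().get("results", [])
```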
- Contributions are welcome! Please read our contributing guidelines and submit pull requests to our repository.
- This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
This implementation is based on the scientific research agent from NirDiamant/GenAI_Agents, specifically their implementation of the Scientific Papers Researcher.
- Modifications to the original implementation:
  - Replaced OpenAI with locally-hosted Mistral via Ollama
  - Added a Flask web interface
  - Restructured the code for modularity
- Uses the CORE API for academic paper access
- Built with LangGraph and Mistral
The original implementation and research was done by:
- Repository: NirDiamant/GenAI_Agents
- Authors: Nir Diamant and contributors
- Implementation: Scientific Papers Researcher Jupyter notebook
This project maintains the core functionality and workflow of the original implementation while adapting it for web deployment and local LLM usage.
Limitations
While the tool and API integration is functional, the locally hosted Mistral model shows some limitations:
- Hallucinations:
  - Generated paper titles and authors that don't exist
  - Produced plausible but fake content
  - Inconsistent date ranges in results
- Comparison with OpenAI Models:
  - Less accurate paper summaries
  - Lower quality of research analysis
  - More prone to fabricating details
  - Less reliable tool usage
Future Improvements
- Model Enhancement:
  - Experiment with different local models depending on the deployment server's specs
  - Fine-tune for research paper analysis
  - Optimize for tool usage
  - Improve response formatting
- Integration Refinement:
  - Better state management
  - Cleaner response handling
  - Improved error handling
  - Debug output filtering
- Alternative Approaches:
  - Enhance validation checks
  - Add fact-checking mechanisms
Note: For production use cases requiring high accuracy and reliability, consider using OpenAI's models or other cloud-based solutions.