ai-agents-2030/awesome-deep-research-agent

Awesome Deep Research Agent

We maintain a curated collection of papers exploring the path towards Deep Research (DR) Agents, focusing on formulating core concepts and mapping the research landscape.

⌛️ Coming soon – Version 2! We’re continuously compiling and updating cutting‑edge insights. Feel free to suggest any related work you find valuable!

Build a digital assistant on your screen. Generated by DALL-E-3.

Contributions Welcome!

🔥 This project is actively maintained, and we welcome your contributions. If you have any suggestions, such as missing papers or information, please feel free to open an issue or submit a pull request.

Our Works Towards DR Agents

✨✨✨ Deep Research Agents: A Systematic Examination And Roadmap

Figure: Structural overview of a DR agent.

Awesome Papers

Table of Contents

  1. Search Engine Integration
  2. Tool Use
  3. Architecture & Workflow
  4. Tuning Methods
  5. Industrial Applications
  6. Benchmarks for DR Agents

Search Engine Integration

📊 Search Engine · API vs Browser Comparison
Legend ✔️ Primary focus 🟫 Secondary/minor focus — Not present
DR Agent API Browser GAIA HLE QA Base Model
Avatar 🟫 Stark Claude‑3‑Opus, GPT‑4
CoSearchAgent ✔️ GPT‑3.5‑turbo
MMAC-Copilot ✔️ ✔️ GPT‑3.5, GPT‑4
Storm 🟫 FreshWiki GPT‑3.5‑turbo
OpenResearcher ✔️ Private QA DeepSeek‑V2‑Chat
The AI Scientist ✔️ MLE‑Bench GPT‑4o, o1‑mini, o1‑preview
Gemini DR ✔️ ✔️ ✔️ GPQA Gemini‑2.0‑Flash
Agent Laboratory ✔️ MLE‑Bench GPT‑4o, o1‑preview
Search‑o1 ✔️ GPQA·NQ·TriviaQA QwQ‑32B‑preview
Agentic Reasoning ✔️ GPQA DeepSeek‑R1, Qwen2.5
AutoAgent ✔️ ✔️ Claude‑Sonnet‑3.5
Grok DeepSearch ✔️ ✔️ GPQA Grok 3
OpenAI DR ✔️ ✔️ ✔️ ✔️ GPT‑o3
Perplexity DR ✔️ 🟫 ✔️ SimpleQA Flexible
AgentRxiv ✔️ GPQA·MedQA GPT‑4o‑mini
Agent‑R1 ✔️ HotpotQA Qwen2.5‑1.5B‑Inst
AutoGLM Rumination ✔️ GPQA GLM‑Z1‑Air
Copilot Researcher ✔️ o3‑mini
H2O.ai DR ✔️ ✔️ ✔️ h2ogpt‑oasst1‑512‑12b
Manus ✔️ ✔️ Claude3.5, GPT‑4o
OpenManus ✔️ ✔️ Claude3.5, GPT‑4o
OWL ✔️ ✔️ ✔️ DeepSeek‑R1, Gemini‑2.5‑Pro, GPT‑4o
R1‑Searcher 🟫 2WikiMultiHopQA, HotpotQA Llama3.1‑8B‑Inst, Qwen2.5‑7B
ReSearch 🟫 2WikiMultiHopQA, HotpotQA Qwen2.5‑7B, Qwen2.5‑7B‑Inst
Search‑R1 🟫 2WikiMultiHopQA, HotpotQA, NQ, TriviaQA Llama3.2‑3B, Qwen2.5‑3B/7B
DeepResearcher ✔️ HotpotQA, NQ, TriviaQA Qwen2.5‑7B‑Inst
Genspark Super Agent ✔️ ✔️ ✔️ Mixture of 9 LLMs
WebThinker ✔️ ✔️ ✔️ GPQA, WebWalkerQA QwQ‑32B
SWIRL ✔️ HotpotQA, BeerQA Gemma 2‑27B
SimpleDeepSearcher ✔️ ✔️ 2WikiMultiHopQA Qwen2.5‑7B/32B‑In, DeepSeek‑R1‑Distill‑Qwen‑32B, QwQ‑32B
Suna AI ✔️ ✔️ GPT‑4o, Claude
AgenticSeek ✔️ GPT‑4o, DeepSeek‑R1, Claude
Alita ✔️ ✔️ ✔️ PathVQA GPT‑4o, Claude‑Sonnet‑4
DeerFlow ✔️ Doubao‑1.5‑Pro‑32k, DeepSeek‑R1, GPT‑4o, Qwen
PANGU DEEPDIVER ✔️ C-SimpleQA, HotpotQA, ProxyQA Pangu‑7B‑Reasoner
WebDancer ✔️ ✔️ ✔️ GAIA, WebWalkerQA WebDancer‑QwQ-32B

Tool Use 

📊 Tool Use Capabilities Comparison
Legend ✔️ Involved 🟫 Not disclosed — Not present
DR Agent Code Interp. Data Analytics Multimodal MCP
CoSearchAgent ✔️
Storm ✔️
The AI Scientist ✔️
Agent Laboratory ✔️
Agentic Reasoning ✔️
AutoAgent ✔️ ✔️
Genspark DR ✔️ ✔️ ✔️ ✔️
Grok DeepSearch ✔️ ✔️ ✔️ 🟫
OpenAI DR ✔️ ✔️ ✔️ ✔️
Perplexity DR ✔️ ✔️ ✔️ 🟫
Agent‑R1 ✔️
AutoGLM Rumination ✔️ ✔️ 🟫
Copilot Researcher ✔️ ✔️ 🟫 🟫
Manus ✔️ ✔️ ✔️ 🟫
OpenManus ✔️ ✔️ ✔️
OWL ✔️ ✔️ ✔️ ✔️
H2O.ai DR ✔️ ✔️ ✔️ 🟫
Genspark Super Agent ✔️ ✔️ ✔️ ✔️
WebThinker ✔️
Suna AI ✔️ ✔️
AgenticSeek ✔️ ✔️
Alita ✔️ 🟫 🟫
DeerFlow ✔️ ✔️

Architecture & Workflow

Static Workflow

Dynamic Single‑Agent Workflow

Dynamic Multi‑Agent Workflow
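
The three workflow categories above can be contrasted with a toy sketch. This is only an illustration under our own assumptions, not the architecture of any listed agent: the function names (`static_workflow`, `dynamic_single_agent`, `dynamic_multi_agent`) and the string-based "tools" are hypothetical stand-ins for real LLM and tool calls.

```python
from typing import Callable, Dict, List

# A "step" is any function that transforms the working state (hypothetical).
Step = Callable[[str], str]


def static_workflow(query: str, stages: List[Step]) -> str:
    """Static workflow: a fixed, predefined sequence of stages."""
    state = query
    for stage in stages:
        state = stage(state)
    return state


def dynamic_single_agent(
    query: str,
    choose: Callable[[str], str],
    tools: Dict[str, Step],
    max_steps: int = 5,
) -> str:
    """Dynamic single-agent workflow: one agent picks the next tool at each
    step, or decides to stop by returning "finish"."""
    state = query
    for _ in range(max_steps):
        action = choose(state)
        if action == "finish":
            break
        state = tools[action](state)
    return state


def dynamic_multi_agent(
    query: str,
    plan: Callable[[str], List[str]],
    workers: Dict[str, Step],
) -> str:
    """Dynamic multi-agent workflow: a planner decides which specialised
    worker agents to involve, then their outputs are merged."""
    results = [workers[role](query) for role in plan(query)]
    return " | ".join(results)


# Toy stand-ins for real search and summarisation components.
def search(state: str) -> str:
    return state + " -> searched"


def summarize(state: str) -> str:
    return state + " -> summarized"


def simple_policy(state: str) -> str:
    """Hypothetical decision rule standing in for an LLM's next-action choice."""
    if "searched" not in state:
        return "search"
    if "summarized" not in state:
        return "summarize"
    return "finish"
```

In the static case the order of stages is fixed by the developer; in the dynamic cases it is decided at run time, either by a single agent or by a planner that delegates to workers.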

Tuning Methods

📊 Tuning Methods Comparison
Legend ✔️ Implemented 🟫 Details Unknown — Not present
DR Agent SFT RL Base Model Data Reward Design
Gemini DR 🟫 🟫 Gemini‑2.0‑Flash 🟫
Grok DeepSearch 🟫 Grok 3 🟫
OpenAI DR 🟫 GPT‑o3 🟫
Agent‑R1 PPO · Reinforce++ · GRPO Qwen2.5‑1.5B‑In HotpotQA Rule‑Outcome
AutoGLM Rumination 🟫 🟫 GLM‑Z1‑Air 🟫
H2O.ai DR ✔️ 🟫 h2ogpt‑oasst1‑512‑12b 🟫
Copilot Researcher 🟫 🟫 o3‑mini
ReSearch GRPO Qwen2.5‑7B‑In / 32B‑In 2WikiMultiHopQA Rule‑Outcome
R1‑Searcher ✔️ Reinforce++ · GRPO Qwen2.5‑7B‑In / Llama3.1‑8B‑In 2WikiMultiHopQA · HotpotQA Rule‑Outcome
Search‑R1 ✔️ PPO · GRPO Qwen2.5‑3B/7B / Llama3.2‑3B‑In NQ · HotpotQA Rule‑Outcome
DeepResearcher GRPO Qwen2.5‑7B‑In NQ · HotpotQA Rule‑Outcome
Genspark Super Agent Mixture of Agents
WebThinker ✔️ Iterative Online DPO QwQ‑32B Expert Dataset Rule‑Outcome
SWIRL Offline‑RL Gemma 2‑27B HotpotQA
SimpleDeepSearcher ✔️ DPO · REINFORCE++ Qwen2.5‑7B/32B‑In · DeepSeek‑Distilled‑Qwen‑32B · QwQ‑32B NQ · HotpotQA · 2WikiMultiHopQA · Musique · SimpleQA · MultiHop‑RAG Process‑based reward
PANGU DEEPDIVER ✔️ GRPO Pangu‑7B‑Reasoner WebPuzzle Rule‑Outcome
WebDancer ✔️ GRPO QwQ-32B CrawlQA· E2HQA Rule‑Outcome
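
To make the "Rule‑Outcome" and "GRPO" entries above more concrete, here is a minimal sketch of a rule-based outcome reward combined with group-relative advantage normalisation. It is an assumption-laden illustration, not the reward used by any agent in the table: the `<answer>` tag format, the token-level F1 score, and all function names are our own choices.

```python
import re
import statistics
from collections import Counter
from typing import List


def token_f1(prediction: str, gold: str) -> float:
    """Token-level F1 between a predicted and a gold answer string."""
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    if not pred_tokens or not gold_tokens:
        return 0.0
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def rule_outcome_reward(response: str, gold: str) -> float:
    """Score only the final outcome with fixed rules (no learned reward model).

    Assumed rules: the answer must appear inside <answer>...</answer>;
    otherwise the reward is 0. A well-formed answer is scored by token F1.
    """
    match = re.search(r"<answer>(.*?)</answer>", response, re.DOTALL)
    if match is None:
        return 0.0  # format rule violated
    return token_f1(match.group(1).strip(), gold)


def group_relative_advantages(rewards: List[float]) -> List[float]:
    """GRPO-style advantages: normalise each reward against the mean and
    standard deviation of its own group of sampled responses."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        return [0.0] * len(rewards)  # all responses scored the same
    return [(r - mean) / std for r in rewards]
```

For one question, several responses would be sampled, each scored with `rule_outcome_reward`, and the resulting list passed to `group_relative_advantages` before the policy update.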

Benchmarks for DR Agents

📊 QA Benchmarks (Hotpot / 2Wiki / NQ / TQ / GPQA)
DR Agent Base Model Hotpot 2Wiki NQ TQ GPQA
Search‑o1 QwQ‑32B‑preview 57.3 71.4 49.7 74.1 57.9
Agentic Reasoning DeepSeek‑R1, Qwen2.5 67.0
Grok DeepSearch Grok 3 84.6
AgentRxiv GPT‑4o‑mini 41.0
R1‑Searcher Qwen2.5‑7B‑Base 71.9 63.8
ReSearch Qwen2.5‑7B‑Base 30.0 29.7
ReSearch Qwen2.5‑7B‑In 63.6 54.2
ReSearch Qwen2.5‑32B‑Base 64.3 45.6
ReSearch Qwen2.5‑32B‑In 67.7 50.0
Search‑R1 Llama3.2‑3B‑Base 30.0 29.7 43.1 61.2
Search‑R1 Llama3.2‑3B‑In 31.4 23.3 35.7 57.8
Search‑R1 Qwen2.5‑7B‑Base 28.3 27.3 39.6 58.2
Search‑R1 Qwen2.5‑7B‑In 34.5 36.9 40.9 55.2
DeepResearcher Qwen2.5‑7B‑In 64.3 66.6 61.9 85.0
Genspark Super Agent Mixture of Agents
WebThinker QwQ‑32B 68.7
SimpleDeepSearcher Qwen2.5‑7B‑In 68.1
SimpleDeepSearcher Qwen2.5‑32B‑In 70.5
SimpleDeepSearcher DeepSeek‑R1‑Distill‑Qwen‑32B 68.1
SimpleDeepSearcher QwQ‑32B 73.5
SWIRL Gemma 2‑27B 72.0
📊 GAIA (Test and Val) & HLE Benchmarks
DR Agent Base Model GAIA L-1 L-2 L-3 Ave. HLE Split
MMAC-Copilot GPT‑3.5, GPT‑4 45.16 20.75 6.12 25.91 Test
H2O.ai DR Claude‑3.7‑Sonnet 89.25 79.87 61.22 79.73 Test
Alita Claude‑Sonnet‑4, GPT‑4o 92.47 71.70 55.10 75.42 Test
AutoAgent Claude‑Sonnet‑3.5 71.70 53.50 26.90 55.20 Dev
OpenAI DR GPT‑o3‑custom 78.70 73.20 58.00 67.40 26.6 Dev
Perplexity DR Flexible 21.1 Dev
Manus Claude 3.5, GPT‑4o 86.50 70.10 57.70 71.4 Dev
OWL Claude‑3.7‑Sonnet 84.90 68.60 42.30 69.70 Dev
H2O.ai DR h2ogpt‑oasst1‑512‑12b 67.92 67.44 42.31 63.64 Dev
Genspark Super Agent Claude 3 Opus 87.8 72.7 58.8 73.1 Dev
WebThinker QwQ‑32B 53.8 44.2 16.7 44.7 13.0 Dev
WebDancer QwQ‑32B 61.5 50.0 25.0 51.5 - Dev
SimpleDeepSearcher QwQ‑32B 50.5 45.8 13.8 43.9 Dev
Alita Claude‑Sonnet‑4, GPT‑4o 75.15 87.27 Dev
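
For readers comparing the "Ave." column with the per-level scores: in several rows above the average appears to be a question-weighted mean over the three GAIA levels rather than a plain mean. The sketch below shows that calculation. The level sizes are an assumption taken from the public GAIA splits (53/86/26 questions for the validation set, listed as "Dev" here, and 93/159/49 for the test set); they are not stated in this table, and not every row follows this formula.

```python
from typing import Dict, Sequence, Tuple

# Assumed number of questions per level (L-1, L-2, L-3) in each GAIA split.
GAIA_LEVEL_SIZES: Dict[str, Tuple[int, int, int]] = {
    "dev": (53, 86, 26),
    "test": (93, 159, 49),
}


def gaia_average(level_scores: Sequence[float], split: str) -> float:
    """Question-weighted average accuracy (in percent) across GAIA levels."""
    sizes = GAIA_LEVEL_SIZES[split]
    total = sum(sizes)
    return sum(score * n for score, n in zip(level_scores, sizes)) / total


# Per-level scores copied from two rows of the table above.
h2o_dev = gaia_average((67.92, 67.44, 42.31), "dev")
mmac_test = gaia_average((45.16, 20.75, 6.12), "test")
```

Because the per-level percentages are themselves rounded, the recomputed averages match the reported 63.64 and 25.91 only to within rounding error.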

📄 Citation

If you find this work helpful, please cite our paper:

@misc{huang2025deepresearchagentssystematic,
      title={Deep Research Agents: A Systematic Examination And Roadmap}, 
      author={Yuxuan Huang and Yihang Chen and Haozheng Zhang and Kang Li and Meng Fang and Linyi Yang and Xiaoguang Li and Lifeng Shang and Songcen Xu and Jianye Hao and Kun Shao and Jun Wang},
      year={2025},
      eprint={2506.18096},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2506.18096}, 
}
