Awesome Deep Research Agent

We maintain a curated collection of papers exploring the path towards Deep Research (DR) Agents, focusing on formulating core concepts and mapping the research landscape.

⌛️ Coming soon – Version 2! We’re continuously compiling and updating cutting‑edge insights. Feel free to suggest any related work you find valuable!

Build a digital assistant on your screen. Generated by DALL-E-3.

Contributions Welcome!

🔥 This project is actively maintained, and we welcome your contributions. If you have any suggestions, such as missing papers or information, please feel free to open an issue or submit a pull request.

Our Works Towards DR Agents

✨✨✨ Deep Research Agents: A Systematic Examination And Roadmap

Awesome Papers

Table of Contents

Search Engine Integration
Tool Use
Architecture & Workflow
Tuning Methods
Industrial Applications
Benchmarks for DR Agents

Search Engine Integration

📊 Search Engine · API vs Browser Comparison

Legend	✔️ Primary focus	🟫 Secondary/minor focus	— Not present

DR Agent	API	Browser	GAIA	HLE	QA	Base Model
Avatar	🟫	—	—	—	Stark	Claude‑3‑Opus, GPT‑4
CoSearch‑Agent	✔️	—	—	—	—	GPT‑3.5‑turbo
MMAC-Copilot	✔️	—	✔️	—	—	GPT‑3.5, GPT‑4
Storm	🟫	—	—	—	FreshWiki	GPT‑3.5‑turbo
OpenResearcher	✔️	—	—	—	Private QA	DeepSeek‑V2‑Chat
The AI Scientist	✔️	—	—	—	MLE‑Bench	GPT‑4o, o1‑mini, o1‑preview
Gemini DR	✔️	✔️	—	✔️	GPQA	Gemini‑2.0‑Flash
Agent Laboratory	✔️	—	—	—	MLE‑Bench	GPT‑4o, o1‑preview
Search‑o1	✔️	—	—	—	GPQA·NQ·TriviaQA	QwQ‑32B‑preview
Agentic Reasoning	✔️	—	—	—	GPQA	DeepSeek‑R1, Qwen2.5
AutoAgent	—	✔️	✔️	—	—	Claude‑Sonnet‑3.5
Grok DeepSearch	✔️	✔️	—	—	GPQA	Grok 3
OpenAI DR	—	✔️	✔️	✔️	✔️	GPT‑o3
Perplexity DR	✔️	🟫	—	✔️	SimpleQA	Flexible
AgentRxiv	✔️	—	—	—	GPQA·MedQA	GPT‑4o‑mini
Agent‑R1	✔️	—	—	—	HotpotQA	Qwen2.5‑1.5B‑Inst
AutoGLM Rumination	—	✔️	—	—	GPQA	GLM‑Z1‑Air
Copilot Researcher	—	✔️	—	—	—	o3‑mini
H2O.ai DR	✔️	✔️	✔️	—	—	h2ogpt‑oasst1‑512‑12b
Manus	✔️	✔️	—	—	—	Claude3.5, GPT‑4o
OpenManus	✔️	✔️	—	—	—	Claude3.5, GPT‑4o
OWL	✔️	✔️	✔️	—	—	DeepSeek‑R1, Gemini‑2.5‑Pro, GPT‑4o
R1‑Searcher	🟫	—	—	—	2WikiMultiHopQA, HotpotQA	Llama3.1‑8B‑Inst, Qwen2.5‑7B
ReSearch	🟫	—	—	—	2WikiMultiHopQA, HotpotQA	Qwen2.5‑7B, Qwen2.5‑7B‑Inst
Search‑R1	🟫	—	—	—	2WikiMultiHopQA, HotpotQA, NQ, TriviaQA	Llama3.2‑3B, Qwen2.5‑3B/7B
DeepResearcher	—	✔️	—	—	HotpotQA, NQ, TriviaQA	Qwen2.5‑7B‑Inst
Genspark Super Agent	✔️	✔️	✔️	—	—	Mixture of 9 LLMs
WebThinker	✔️	—	✔️	✔️	GPQA, WebWalkerQA	QwQ‑32B
SWIRL	—	✔️	—	—	HotpotQA, BeerQA	Gemma 2‑27B
SimpleDeepSearcher	—	✔️	✔️	—	2WikiMultiHopQA	Qwen2.5‑7B/32B‑Inst, DeepSeek‑R1‑Distill‑Qwen‑32B, QwQ‑32B
Suna AI	✔️	✔️	—	—	—	GPT‑4o, Claude
AgenticSeek	—	✔️	—	—	—	GPT‑4o, DeepSeek‑R1, Claude
Alita	✔️	✔️	✔️	—	PathVQA	GPT‑4o, Claude‑Sonnet‑4
DeerFlow	✔️	—	—	—	—	Doubao‑1.5‑Pro‑32k, DeepSeek‑R1, GPT‑4o, Qwen
PANGU DEEPDIVER	✔️	—	—	—	C-SimpleQA, HotpotQA, ProxyQA	Pangu‑7B‑Reasoner
WebDancer	✔️	✔️	✔️	—	GAIA, WebWalkerQA	WebDancer‑QwQ-32B

Tool Use

📊 Tool Use Capabilities Comparison

Legend	✔️ Involved	🟫 Not disclosed	— Not present

DR Agent	Code Interp.	Data Analytics	Multimodal	MCP
CoSearch‑Agent	—	✔️	—	—
Storm	✔️	—	—	—
The AI Scientist	✔️	—	—	—
Agent Laboratory	✔️	—	—	—
Agentic Reasoning	✔️	—	—	—
AutoAgent	✔️	—	✔️	—
Genspark DR	✔️	✔️	✔️	✔️
Grok DeepSearch	✔️	✔️	✔️	🟫
OpenAI DR	✔️	✔️	✔️	✔️
Perplexity DR	✔️	✔️	✔️	🟫
Agent‑R1	✔️	—	—	—
AutoGLM Rumination	✔️	—	✔️	🟫
Copilot Researcher	✔️	✔️	🟫	🟫
Manus	✔️	✔️	✔️	🟫
OpenManus	✔️	✔️	—	✔️
OWL	✔️	✔️	✔️	✔️
H2O.ai DR	✔️	✔️	✔️	🟫
Genspark Super Agent	✔️	✔️	✔️	✔️
WebThinker	✔️	—	—	—
Suna AI	✔️	✔️	—	—
AgenticSeek	✔️	✔️	—	—
Alita	✔️	🟫	🟫	—
DeerFlow	✔️	✔️	—	—

Architecture & Workflow

Static Workflow

Dynamic Single‑Agent Workflow

Dynamic Multi‑Agent Workflow

Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models
AutoAgent: A Fully-Automated and Zero-Code Framework for LLM Agents
OpenManus
Manus
OWL
PANGU DEEPDIVER: ADAPTIVE SEARCH INTENSITY SCALING VIA OPEN-WEB REINFORCEMENT LEARNING
H2O.ai Deep Research
Copilot Researcher
Genspark Super Agent
DeerFlow
MMAC-Copilot

Tuning Methods

📊 Tuning Methods Comparison

Legend	✔️ Implemented	🟫 Details Unknown	— Not present

DR Agent	SFT	RL	Base Model	Data	Reward Design
Gemini DR	🟫	🟫	Gemini‑2.0‑Flash	—	🟫
Grok DeepSearch	—	🟫	Grok 3	—	🟫
OpenAI DR	—	🟫	GPT‑o3	—	🟫
Agent‑R1	—	PPO · Reinforce++ · GRPO	Qwen2.5‑1.5B‑In	HotpotQA	Rule‑Outcome
AutoGLM Rumination	🟫	🟫	GLM‑Z1‑Air	—	🟫
H2O.ai DR	✔️	🟫	h2ogpt‑oasst1‑512‑12b	—	🟫
Copilot Researcher	🟫	🟫	o3‑mini	—	—
ReSearch	—	GRPO	Qwen2.5‑7B‑In / 32B‑In	2WikiMultiHopQA	Rule‑Outcome
R1‑Searcher	✔️	Reinforce++ · GRPO	Qwen2.5‑7B‑In / Llama3.1‑8B‑In	2WikiMultiHopQA · HotpotQA	Rule‑Outcome
Search‑R1	✔️	PPO · GRPO	Qwen2.5‑3B/7B / Llama3.2‑3B‑In	NQ · HotpotQA	Rule‑Outcome
DeepResearcher	—	GRPO	Qwen2.5‑7B‑In	NQ · HotpotQA	Rule‑Outcome
Genspark Super Agent	—	—	Mixture of Agents	—	—
WebThinker	✔️	Iterative Online DPO	QwQ‑32B	Expert Dataset	Rule‑Outcome
SWIRL	—	Offline‑RL	Gemma 2‑27B	HotpotQA	—
SimpleDeepSearcher	✔️	DPO · Reinforce++	Qwen2.5‑7B/32B‑In · DeepSeek‑R1‑Distill‑Qwen‑32B · QwQ‑32B	NQ · HotpotQA · 2WikiMultiHopQA · MuSiQue · SimpleQA · MultiHop‑RAG	Process‑based reward
PANGU DEEPDIVER	✔️	GRPO	Pangu‑7B‑Reasoner	WebPuzzle	Rule‑Outcome
WebDancer	✔️	GRPO	QwQ‑32B	CrawlQA · E2HQA	Rule‑Outcome

Benchmarks for DR Agents

📊 QA Benchmarks (Hotpot / 2Wiki / NQ / TQ / GPQA)

DR Agent	Base Model	Hotpot	2Wiki	NQ	TQ	GPQA
Search‑o1	QwQ‑32B‑preview	57.3	71.4	49.7	74.1	57.9
Agentic Reasoning	DeepSeek‑R1, Qwen2.5	—	—	—	—	67.0
Grok DeepSearch	Grok 3	—	—	—	—	84.6
AgentRxiv	GPT‑4o‑mini	—	—	—	—	41.0
R1‑Searcher	Qwen2.5‑7B‑Base	71.9	63.8	—	—	—
ReSearch	Qwen2.5‑7B‑Base	30.0	29.7	—	—	—
ReSearch	Qwen2.5‑7B‑In	63.6	54.2	—	—	—
ReSearch	Qwen2.5‑32B‑Base	64.3	45.6	—	—	—
ReSearch	Qwen2.5‑32B‑In	67.7	50.0	—	—	—
Search‑R1	Llama3.2‑3B‑Base	30.0	29.7	43.1	61.2	—
Search‑R1	Llama3.2‑3B‑In	31.4	23.3	35.7	57.8	—
Search‑R1	Qwen2.5‑7B‑Base	28.3	27.3	39.6	58.2	—
Search‑R1	Qwen2.5‑7B‑In	34.5	36.9	40.9	55.2	—
DeepResearcher	Qwen2.5‑7B‑In	64.3	66.6	61.9	85.0	—
Genspark Super Agent	Mixture of Agents	—	—	—	—	—
WebThinker	QwQ‑32B	—	—	—	—	68.7
SimpleDeepSearcher	Qwen2.5‑7B‑In	—	68.1	—	—	—
SimpleDeepSearcher	Qwen2.5‑32B‑In	70.5	—	—	—	—
SimpleDeepSearcher	DeepSeek‑R1‑Distill‑Qwen‑32B	68.1	—	—	—	—
SimpleDeepSearcher	QwQ‑32B	73.5	—	—	—	—
SWIRL	Gemma 2‑27B	72.0	—	—	—	—

📊 GAIA (Test and Val) & HLE Benchmarks

DR Agent	Base Model	GAIA L-1	L-2	L-3	Ave.	HLE	Split
MMAC-Copilot	GPT‑3.5, GPT‑4	45.16	20.75	6.12	25.91	—	Test
H2O.ai DR	Claude‑3.7‑Sonnet	89.25	79.87	61.22	79.73	—	Test
Alita	Claude‑Sonnet‑4, GPT‑4o	92.47	71.70	55.10	75.42	—	Test
AutoAgent	Claude‑Sonnet‑3.5	71.70	53.50	26.90	55.20	—	Dev
OpenAI DR	GPT‑o3‑custom	78.70	73.20	58.00	67.40	26.6	Dev
Perplexity DR	Flexible	—	—	—	—	21.1	Dev
Manus	Claude 3.5, GPT‑4o	86.50	70.10	57.70	71.4	—	Dev
OWL	Claude‑3.7‑Sonnet	84.90	68.60	42.30	69.70	—	Dev
H2O.ai DR	h2ogpt‑oasst1‑512‑12b	67.92	67.44	42.31	63.64	—	Dev
Genspark Super Agent	Claude 3 Opus	87.8	72.7	58.8	73.1	—	Dev
WebThinker	QwQ‑32B	53.8	44.2	16.7	44.7	13.0	Dev
WebDancer	QwQ‑32B	61.5	50.0	25.0	51.5	—	Dev
SimpleDeepSearcher	QwQ‑32B	50.5	45.8	13.8	43.9	—	Dev
Alita	Claude‑Sonnet‑4, GPT‑4o	75.15	—	87.27	—	—	Dev
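
For readers checking rows in the GAIA table, the average column appears to be a question-weighted mean of the three per-level scores rather than a plain mean. The Python sketch below illustrates that arithmetic. The per-level question counts (53/86/26 for the validation split, 93/159/49 for the test split) are an assumption taken from the public GAIA splits and are not stated in this README; the function name `gaia_average` is hypothetical.

```python
# Question counts per GAIA level for each split.
# ASSUMPTION: taken from the public GAIA splits, not from this README.
LEVEL_COUNTS = {
    "Dev": (53, 86, 26),    # validation split, 165 questions in total
    "Test": (93, 159, 49),  # test split, 301 questions in total
}


def gaia_average(l1: float, l2: float, l3: float, split: str) -> float:
    """Return the question-weighted mean of per-level accuracies (percent)."""
    counts = LEVEL_COUNTS[split]
    weighted_sum = sum(score * n for score, n in zip((l1, l2, l3), counts))
    return weighted_sum / sum(counts)


if __name__ == "__main__":
    # Per-level scores copied from the MMAC-Copilot (Test) and AutoAgent (Dev) rows.
    print(round(gaia_average(45.16, 20.75, 6.12, "Test"), 2))
    print(round(gaia_average(71.70, 53.50, 26.90, "Dev"), 2))
```

Under these assumed counts, both example rows land within rounding of the averages listed in the table. Some rows may not satisfy the relation exactly, for instance when the per-level scores and the average come from different evaluation settings.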

📄 Citation

If you find this work helpful, please cite our paper:

@misc{huang2025deepresearchagentssystematic,
      title={Deep Research Agents: A Systematic Examination And Roadmap}, 
      author={Yuxuan Huang and Yihang Chen and Haozheng Zhang and Kang Li and Meng Fang and Linyi Yang and Xiaoguang Li and Lifeng Shang and Songcen Xu and Jianye Hao and Kun Shao and Jun Wang},
      year={2025},
      eprint={2506.18096},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2506.18096}, 
}
