REDSearcher Logo

REDSearcher: A Scalable and Cost-Efficient
Framework for Long-Horizon Search Agents

1Harbin Institute of Technology, 2Xiaohongshu, 3Shanghai Jiao Tong University
*Equal Contribution · Project Leader · Corresponding Author

🛠 Scalable task synthesis via graph-structured reasoning with topological complexity control

🚀 Cost-efficient training via mid-training of core search-agent subskills

🏆 SOTA performance across both text-only and multimodal benchmarks

REDSearcher Results

Our Pipeline

REDSearcher Pipeline

Figure 2. QA Generation: We construct directed acyclic graphs (DAGs) from Knowledge-Graph entities and Web-Walk hyperlinks, enabling explicit difficulty control via topological complexity. Each node is enriched with multi-source evidence and then sampled into reasoning paths, with Query Fuzzing (entity/attribute anonymization) applied to increase the search challenge. Verifier Pipeline: A cascaded filter progressively validates quality through an LLM difficulty check, QA-graph alignment, Google retrieval verification, hallucination detection, agent rollout confirmation, and answer uniqueness validation, producing 20K+ verified trajectories at >85% accuracy.
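The sketch below illustrates how the stages in Figure 2 could fit together: a DAG whose longest path serves as a difficulty proxy, path sampling over that DAG, query fuzzing by entity anonymization, and a cascaded verifier that only keeps samples passing every stage. All class and function names here are hypothetical placeholders, not the project's actual code; the real prompts, thresholds, and checkers are described only at the level of the caption above.

```python
# Hypothetical sketch of the QA-generation and verifier stages from Figure 2.
# Names (Node, QAGraph, fuzz_query, cascade_verify, ...) are illustrative assumptions.
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    entity: str                                          # entity from a knowledge graph or web walk
    evidence: list[str] = field(default_factory=list)    # multi-source supporting snippets


@dataclass
class QAGraph:
    nodes: dict[str, Node]
    edges: list[tuple[str, str]]                         # directed edges (u -> v), acyclic by construction

    def topological_complexity(self) -> int:
        """A simple difficulty proxy: length of the longest path in the DAG."""
        memo: dict[str, int] = {}

        def depth(u: str) -> int:
            if u in memo:
                return memo[u]
            succs = [v for (s, v) in self.edges if s == u]
            memo[u] = 1 + max((depth(v) for v in succs), default=0)
            return memo[u]

        return max(depth(u) for u in self.nodes)


def sample_reasoning_path(graph: QAGraph, min_hops: int) -> list[str]:
    """Walk the DAG from a random source until the path reaches the target hop count."""
    path = [random.choice(list(graph.nodes))]
    while len(path) < min_hops:
        succs = [v for (u, v) in graph.edges if u == path[-1]]
        if not succs:
            break
        path.append(random.choice(succs))
    return path


def fuzz_query(question: str, entities: list[str]) -> str:
    """Query Fuzzing: anonymize named entities/attributes so the answer
    cannot be found with a single direct lookup."""
    for i, ent in enumerate(entities):
        question = question.replace(ent, f"[ENTITY-{i}]")
    return question


def cascade_verify(sample, stages) -> bool:
    """Cascaded verifier: a candidate QA pair survives only if every stage accepts it."""
    return all(stage(sample) for stage in stages)


# Conceptually, `stages` would hold the six checks named in the caption, e.g.:
# stages = [llm_difficulty_check, qa_graph_alignment, google_retrieval_check,
#           hallucination_check, agent_rollout_confirm, answer_uniqueness_check]
```

Ordering the cheap checks (difficulty, graph alignment) before the expensive ones (agent rollout) is what makes a cascade of this kind cost-efficient, since most rejected samples never reach the later stages.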

Performance

Comparison between REDSearcher and closed-source / open-source agentic models

| Model | BrowseComp | BrowseComp-zh | GAIA | HLE | Overall |
|---|---|---|---|---|---|
| **Proprietary Deep Research Agents** | | | | | |
| Seed-1.8 | 67.6 | 81.3 | 87.4 | 40.9 | 69.3 |
| Gemini-2.5-pro-DR | 7.6 | 27.3 | - | - | - |
| Gemini-3-Pro | 37.8 | 51.6 | 74.8 | 45.8 | 52.5 |
| Claude-4.5-sonnet | 24.1 | 42.4 | 66.0 | 32.0 | 41.1 |
| OpenAI-o3 | 49.7 | 58.1 | 70.5 | 20.2 | 49.6 |
| GPT-5-Thinking-high | 54.9 | 63.0 | 76.7 | 41.7 | 59.1 |
| GPT-5.2-Thinking-xhigh | 65.8 | 76.1 | - | - | - |
| **Open-source Deep Research Agents** | | | | | |
| Kimi-K2.5-Agent | 60.6 / 74.9* | - | - | 50.2 | - |
| GLM-4.7 | 52.0 / 66.6* | - / 67.5* | - | 42.8 | - |
| DeepSeek-V3.2 | 51.4 / 67.6* | - / 65.0* | - | 40.8 | - |
| LongCat-Flash-Thinking | 56.6 / 73.1* | 69.0 / 77.7* | - | - | - |
| **Open-source 30B-A3B Agents** | | | | | |
| WebResearcher-30B | 37.3 | 45.2 | - | 28.8 | - |
| WebSailorV2-30B | 35.3 | 44.1 | 74.1 | 30.6 | 46.0 |
| Tongyi DeepResearch-30B | 43.4 | 46.7 | 70.9 | 32.9 | 48.5 |
| GLM-4.7-Flash | 42.8 | - | - | - | - |
| REDSearcher | 42.1 / 57.4* | 49.8 / 58.2* | 80.1 | 33.3 | 51.3 |

* Results with Context Management (CM). Best results in bold.

Main results on multimodal search benchmarks

| Model | MM-BC | BC-VL | MMS+ | MMS | LiveVQA | HLE-T | HLE-VL | BC | BC-ZH |
|---|---|---|---|---|---|---|---|---|---|
| **Proprietary Deep Research Agents** | | | | | | | | | |
| Gemini-2.5-Flash | 5.6 | 44.6 | 19.9 | 64.0 | 73.0 | - | - | - | - |
| Gemini-2.5-Pro | 7.1 | 49.9 | 22.2 | 69.0 | 76.0 | - | - | 7.6 | 27.3 |
| Seed-1.8 | 46.3 | - | - | - | - | 40.9 | 31.5 | 67.6 | 81.3 |
| Seed-1.8 | 21.4 | 54.1 | 11.0 | 69.7 | 62.4 | - | - | - | - |
| GPT-5 | - | 46.1 | 17.2 | 63.7 | 73.3 | 41.7 | - | 54.9 | 63.0 |
| Gemini-3-Pro | 28.5 | 56.4 | 38.1 | 73.0 | 79.9 | 45.8* | 36.0* | 37.8* | 51.6* |
| **Multimodal Agent Flow** | | | | | | | | | |
| Qwen2.5-VL | 1.8 | 10.2 | - | 29.2 | 35.7 | - | 4.9 | - | - |
| Qwen3-VL (30B) | 10.7 | 37.1 | 11.0 | 59.7 | 64.8 | 8.8 | 8.7 | 0.2 | 7.2 |
| Qwen3-VL (235B) | 12.1 | 43.1 | 17.4 | 63.3 | 70.2 | 14.5 | 14.1 | 0.3 | 18.6 |
| **Multimodal DeepResearch Agent** | | | | | | | | | |
| MMSearch-R1 | - | - | - | 53.8 | 48.4 | - | - | - | - |
| WebWatcher | - | 27.0 | - | 55.3 | 58.7 | - | 13.6 | - | - |
| DeepEyesV2 | - | - | - | 63.7 | - | - | - | - | - |
| Vision-DeepResearch | - | 53.7 | 28.5 | 69.6 | 77.6 | - | - | - | - |
| REDSearcher-MM-SFT | 25.3 | 55.3 | 20.2 | 70.3 | 78.5 | 24.4 | 24.2 | 30.1 | 43.1 |
| REDSearcher-MM-RL | 23.5 | 57.2 | 26.6 | 72.9 | 79.3 | 25.3 | 25.6 | 31.2 | 44.5 |

† denotes results evaluated using the same evaluation tools as ours, and * denotes results taken from the original papers.

BibTeX

@article{redsearcher2026,
  title={REDSearcher: A Scalable and Cost-Efficient Framework for Long-Horizon Search Agents},
  author={Zheng Chu and Xiao Wang and Jack Hong and Huiming Fan and Yuqi Huang and Yue Yang and Guohai Xu and Shengchao Hu and Dongdong Kuang and Chenxiao Zhao and Cheng Xiang and Ming Liu and Bing Qin and Xing Yu},
  journal={arXiv preprint arXiv:2602.14234},
  url={https://arxiv.org/pdf/2602.14234},
  year={2026}
}