Rethinking Literature Search Evaluation: Deep Research Helps, and Human Citation Lists Are Not a Ground Truth

arXiv:2605.29234v1. Abstract: We study large-scale literature search from two complementary angles: improving the retrieval pipeline, and stress-testing the human reference list as an evaluation target. First, we implement a Deep Research pipeline that processes the full query paper and expands the retrieved results breadth-first along their bibliographies, and show that it substantially outperforms vanilla API-only search, raising recall on RollingEval-Jun25 (a 250-paper literature-search benchmark) from below 20% to above 80%. Second, we use a neutral LLM-as-a-judge to dete
Advances in large language models and AI research tools make deeper, automated literature search practical, which calls traditional evaluation methods into question.
Better literature search could make scientific discovery and technological innovation more efficient by improving how researchers find and synthesize existing knowledge.
Evaluating literature search shifts from relying solely on human-curated citation lists to also incorporating AI-driven deep research pipelines, which reach broader recall.
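The breadth-first bibliography expansion and the recall metric mentioned in the abstract can be illustrated with a minimal sketch. The abstract gives no implementation details, so everything here is an assumption: the function names `expand_breadth_first` and `recall`, the `max_depth` cutoff, and the toy in-memory citation graph are hypothetical, and the toy numbers are not the paper's results.

```python
from collections import deque


def expand_breadth_first(seeds, bibliography, max_depth=2):
    """Collect every paper reachable from `seeds` within `max_depth` citation hops.

    `bibliography` maps a paper ID to the IDs it cites (a hypothetical,
    in-memory stand-in for whatever index a real pipeline would query).
    """
    seen = set(seeds)
    queue = deque((paper, 0) for paper in seeds)
    while queue:
        paper, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not follow citations beyond the depth limit
        for cited in bibliography.get(paper, []):
            if cited not in seen:
                seen.add(cited)
                queue.append((cited, depth + 1))
    return seen


def recall(retrieved, reference_list):
    """Fraction of the reference list that appears in the retrieved set."""
    reference = set(reference_list)
    if not reference:
        return 0.0
    return len(reference & set(retrieved)) / len(reference)


# Toy citation graph (illustrative only).
bibliography = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "E": ["F"],
}
seeds = ["A"]                                # stand-in for initial API search hits
reference_list = ["B", "D", "E", "F", "G"]   # stand-in for a human citation list

api_only = set(seeds)
expanded = expand_breadth_first(seeds, bibliography, max_depth=2)

print(f"recall without expansion: {recall(api_only, reference_list):.2f}")
print(f"recall with expansion:    {recall(expanded, reference_list):.2f}")
```

In this toy graph, "G" is never reachable by following citations, so recall against the reference list stays below 1.0 however deep the expansion goes. The metric only measures overlap with the human list, which is the evaluation target the paper stress-tests.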
- AI-powered research platforms
- Academics and researchers
- Scientific discovery
- Information retrieval developers
- Traditional keyword-based search engines
- Manual literature review processes
- Evaluations solely based on human citations
Researchers gain access to a much wider and more relevant body of literature, accelerating their work.
The pace of innovation and scientific breakthroughs could increase across many fields due to improved knowledge synthesis.
This could lead to a restructuring of academic publication and evaluation standards, moving towards AI-assisted validation of research novelty.
Read at arXiv cs.AI