
arXiv:2605.28897v1. Abstract: LLM-generated reviews for scientific papers are gaining considerable traction and are even being officially piloted by major conferences. We have to assume that not only reviewers are using LLM-assistance, but also that authors use LLMs to revise their papers before submitting. In this work, we perform empirical experiments on papers from the 2025 ACL Rolling Review (ARR) to evaluate LLM reviews from both the author and the reviewer perspective. First, we identify a limited alignment of LLM reviews with human ones. In the best-case scenario, the
Major conferences are piloting LLM-generated reviews while authors use LLMs to revise their papers, which makes evaluating the quality of LLM reviews timely.
The research offers early empirical evidence on the alignment of LLM-generated reviews with human ones and on their 'gameability', both of which bear directly on the integrity and efficiency of academic peer review.
Because LLM reviews align only to a limited degree with human reviews and may be open to adversarial gaming, integrating AI into academic workflows calls for new strategies.
- Researchers developing AI alignment techniques
- Platforms providing human oversight for AI-generated content
- Academic journals seeking improved peer review processes
- Developers of unaligned LLM review systems
- Conferences deploying unchecked AI review systems
- Authors relying solely on LLMs for paper polishing without human oversight
Scientific conferences and journals will accelerate efforts to develop robust methods for LLM integration into peer review, focusing on alignment and robustness against manipulation.
There will be a rise in specialized AI tools designed to detect LLM-generated text attempting to 'game' review systems, fostering an 'AI vs. AI' dynamic in academic publishing.
The perceived integrity of AI-assisted academic publishing could wane if 'gameability' issues are not addressed, potentially leading to a backlash against broad LLM adoption in critical evaluation processes.
Read at arXiv cs.AI