When the Tool Decides: LLM Agents Defer Blindly to Graph Neural Network Tools, and Stronger Backbones Defer More

arXiv:2606.14476v1. Abstract: A growing line of work equips large language model (LLM) agents with graph neural networks (GNNs) as callable tools, assuming the agent exercises judgment over when and how much to rely on such a tool. We test this directly. We expose a frozen GNN to a ReAct-style LLM agent as an explicit tool and measure, on node classification over a text-attributed graph (ogbn-arxiv, replicated on WikiCS), whether the agent uses the tool or merely obeys it. We find the agent does not exercise judgment: its predictions agree with the raw GNN's 97.6-99.2% of the
As LLM agents are increasingly paired with specialized tools such as GNNs, it matters how much independent judgment the agent actually exercises and how reliable the combined system is.
The research points to a weakness in current LLM agent designs: blind deference to a tool bypasses the agent's own judgment, which could leave autonomous systems unreliable or easy to manipulate.
The finding directly challenges the assumption that LLM agents 'exercise judgment' over tool use, and suggests that agent architectures and safety protocols for integrating specialized AI tools need re-evaluation.
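The agreement measurement the abstract describes can be illustrated with a minimal sketch. It assumes the agent's final labels and the raw GNN's labels are available as equal-length lists of class ids; the function names, the override breakdown, and the toy data are hypothetical and are not taken from the paper.

```python
from typing import Sequence, Tuple


def agreement_rate(agent_preds: Sequence[int], tool_preds: Sequence[int]) -> float:
    """Fraction of nodes where the agent's final label equals the tool's label."""
    if len(agent_preds) != len(tool_preds):
        raise ValueError("prediction lists must have the same length")
    if len(agent_preds) == 0:
        raise ValueError("need at least one prediction")
    matches = sum(a == t for a, t in zip(agent_preds, tool_preds))
    return matches / len(agent_preds)


def override_counts(
    agent_preds: Sequence[int],
    tool_preds: Sequence[int],
    labels: Sequence[int],
) -> Tuple[int, int]:
    """Count overrides (agent != tool) that fixed or broke a prediction.

    Returns (helpful, harmful):
      helpful = agent is right where the tool was wrong
      harmful = agent is wrong where the tool was right
    Overrides where both are wrong are counted in neither bucket.
    """
    helpful = harmful = 0
    for a, t, y in zip(agent_preds, tool_preds, labels):
        if a == t:
            continue  # the agent deferred to the tool on this node
        if a == y:
            helpful += 1
        elif t == y:
            harmful += 1
    return helpful, harmful


# Toy data (hypothetical): five nodes, three classes.
agent = [0, 1, 2, 2, 1]
tool = [0, 1, 2, 0, 1]
truth = [0, 1, 2, 2, 0]

rate = agreement_rate(agent, tool)
helpful, harmful = override_counts(agent, tool, truth)
print(f"agreement={rate:.1%} helpful={helpful} harmful={harmful}")
```

An agreement rate close to 100% with almost no overrides is the pattern the abstract calls obeying rather than using the tool; the helpful/harmful split is one assumed way to check whether the few overrides that do occur add value.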
- AI Safety Researchers
- Developers of robust LLM agent architectures
- Organizations prioritizing autonomous system reliability
- Developers of un-audited LLM agent integrations
- Applications reliant on unverified agent judgment
- Organizations deploying agents without critical oversight
A likely near-term priority is developing mechanisms that let LLM agents critically evaluate tool outputs rather than accept them blindly.
This finding could spur the creation of 'meta-agents' or oversight modules designed to monitor and challenge the decisions of primary LLM agents interacting with tools.
It might lead to new paradigms in human-AI collaboration where human operators intervene not just to provide instructions but to validate agentic tool-use decisions.
Read at arXiv cs.AI