Toward Secure and Reliable PDDL Formalization of Large Language Models with Planner-in-the-Loop Feedback

arXiv:2606.29700v1 Announce Type: new Abstract: Planning often requires symbolic specifications that are both executable and verifiable. For large language models deployed in autonomous or decision-support systems, failures in such formalization may lead to unverifiable decisions, execution failures, or unsafe downstream behavior. We present NL-PDDL-Bench, a multi-domain benchmark for natural-language-to-PDDL specification construction with planner-verified executability and controlled difficulty scaling by object count. We further propose a planner-in-the-loop framework that uses validator an
The increasing deployment of large language models in autonomous systems necessitates robust, verifiable formalizations to ensure safety and reliability.
This work addresses the challenge of getting LLMs to produce planning specifications that are both executable and verifiable, which matters when such models are deployed in autonomous or decision-support systems.
The NL-PDDL-Bench benchmark and the planner-in-the-loop framework give developers tools for making LLM-generated PDDL specifications more secure and reliable.
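As a rough illustration of what a planner-in-the-loop setup can look like, the sketch below regenerates a candidate PDDL specification until a checker accepts it, feeding each error message back to the generator. It is a minimal sketch under stated assumptions and not the paper's method: `formalize`, `toy_generator`, and `check_parens` are hypothetical names, the generator stands in for an LLM call, and the checker only tests parenthesis balance where a real system would call a PDDL validator or planner.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical signatures for illustration; neither is taken from the paper.
Generator = Callable[[str, List[str]], str]  # (task text, feedback so far) -> PDDL text
Validator = Callable[[str], Optional[str]]   # PDDL text -> error message, or None if accepted


def check_parens(pddl: str) -> Optional[str]:
    """Toy stand-in for a real validator or planner: checks parenthesis balance only."""
    depth = 0
    for ch in pddl:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return "unexpected ')'"
    return None if depth == 0 else f"{depth} unclosed '('"


def formalize(task: str, generate: Generator, validate: Validator,
              max_rounds: int = 3) -> Tuple[Optional[str], List[str]]:
    """Regenerate until the validator accepts or the round budget runs out."""
    feedback: List[str] = []
    for _ in range(max_rounds):
        candidate = generate(task, feedback)
        error = validate(candidate)
        if error is None:
            return candidate, feedback
        feedback.append(error)  # the next attempt sees every error so far
    return None, feedback


def toy_generator(task: str, feedback: List[str]) -> str:
    """Stand-in for an LLM call: the first draft is broken, the retry is repaired."""
    draft = "(define (problem stack-two) (:domain blocks) (:objects a b)"
    return draft + ")" if feedback else draft


spec, log = formalize("Stack block a on block b.", toy_generator, check_parens)
print(spec)
print(log)
```

Passing the generator and validator as plain callables keeps the loop independent of any particular model or planner, so either stub can be swapped for a real component without changing `formalize`.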
- AI Safety Researchers
- Autonomous System Developers
- High-Reliability AI Sectors
- Developers of Unverifiable AI Systems
- AI Systems Prone to Unpredictable Failures
Improved methods for generating and verifying LLM-produced formal specifications, leading to safer AI applications.
Accelerated adoption of LLMs in critical infrastructure and high-stakes decision-making environments.
New regulatory frameworks and certification processes built around verifiable AI formalizations.
Read at arXiv cs.AI