IHDec: Divergence-Steered Contrastive Decoding for Securing Multi-Turn Instruction Hierarchies

arXiv:2606.29960v1. Abstract: Large Language Models (LLMs) often fail to maintain instruction hierarchies (IH) when processing multi-source inputs with varying role-level priorities, paradoxically adhering to lower-priority directives during conflicts. While existing defenses mitigate this issue, they are largely restricted to single-turn scenarios and require expensive fine-tuning. In this paper, we formalize this failure mode in multi-turn contexts via a Jensen-Shannon Divergence (JSD) framework, uncovering a pervasive role-influence inversion phenomenon where subordinate i
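For readers unfamiliar with the measure named in the abstract, the sketch below computes the standard Jensen-Shannon Divergence between two discrete probability distributions. It illustrates the textbook definition only; how the paper applies JSD to role influence is not shown in this excerpt, and the two example distributions and their variable names are hypothetical.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats. Terms where p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(p, q):
    """Textbook JSD: mean KL divergence of p and q from their midpoint m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical next-token distributions over a 3-token vocabulary,
# e.g. a model's output with and without one instruction in context.
# These numbers are invented for illustration, not taken from the paper.
with_instruction = [0.7, 0.2, 0.1]
without_instruction = [0.1, 0.2, 0.7]

identical = jensen_shannon_divergence(with_instruction, with_instruction)
shifted = jensen_shannon_divergence(with_instruction, without_instruction)
```

JSD is symmetric and, with natural logarithms, bounded between 0 and ln 2, so `identical` is zero and `shifted` falls strictly inside that range. Those two properties make it a convenient way to compare how far different inputs move an output distribution.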
The rapid deployment and growing complexity of LLMs in multi-turn interactive scenarios make it necessary to address reliability and control issues that only become apparent at scale.
This research highlights a critical vulnerability in LLMs: they do not reliably follow instruction priorities in complex multi-turn interactions, which directly affects their trustworthiness and their suitability for critical applications.
Securing instruction hierarchies in multi-turn interactions becomes a new frontier for LLM development, moving beyond single-turn defenses and opening a path to more reliable AI agents.
- AI developers focused on robust agentic systems
- Organizations deploying LLMs in critical, multi-step workflows
- Researchers specializing in interpretability and safety for LLMs
- LLM providers with insecure instruction processing architectures
- Applications relying on simple prompt engineering for complex hierarchical tasks
- Users experiencing unpredictable AI behavior due to 'role-influence inversion'
Improved methods for securing LLM instruction adherence will lead to more reliable AI-driven automation.
Enhanced trustworthiness in AI systems could accelerate the adoption of autonomous AI agents in sensitive industries.
The formalization of 'role-influence inversion' might inspire new regulatory frameworks or testing standards for AI system safety and control.
Read at arXiv cs.CL