Deterministic Inference across Tensor Parallel Sizes That Eliminates Training-Inference Mismatch

arXiv:2511.17826v2. Abstract: Deterministic inference is increasingly critical for large language model (LLM) applications such as LLM-as-a-judge evaluation, multi-agent systems, and Reinforcement Learning (RL). However, existing LLM serving frameworks exhibit non-deterministic behavior: identical inputs can yield different outputs when system configurations (e.g., tensor parallel (TP) size, batch size) vary, even under greedy decoding. This arises from the non-associativity of floating-point arithmetic and inconsistent reduction orders across GPUs. While prior work has a
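The root cause named in the abstract, floating-point non-associativity combined with reduction orders that change with the GPU layout, can be illustrated with a small Python sketch. It is an illustration only and not the paper's method: the function names and toy values are assumptions, shards are simulated as list slices rather than GPUs, and the fixed-tree reduction is just one generic way to make a sum independent of shard count when the lengths are powers of two.

```python
# Illustrative sketch only: simulates tensor-parallel shards as list slices.
# All names here are hypothetical and do not come from the paper.

def sequential_sum(values):
    """Left-to-right sum with an explicit loop (no compensated summation)."""
    total = 0.0
    for v in values:
        total += v
    return total

def sharded_sum(values, tp_size):
    """Each simulated rank sums its shard, then the partials are summed.
    The grouping of additions changes with tp_size."""
    shard_len = len(values) // tp_size
    partials = [
        sequential_sum(values[r * shard_len:(r + 1) * shard_len])
        for r in range(tp_size)
    ]
    return sequential_sum(partials)

def tree_sum(values):
    """Pairwise reduction over a fixed balanced binary tree.
    Assumes len(values) is a power of two."""
    level = list(values)
    while len(level) > 1:
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
    return level[0]

def sharded_tree_sum(values, tp_size):
    """Each simulated rank reduces its aligned subtree, then the partials
    are reduced with the same tree. Assumes tp_size is a power of two that
    divides len(values), so the pairing is identical for every tp_size."""
    shard_len = len(values) // tp_size
    partials = [
        tree_sum(values[r * shard_len:(r + 1) * shard_len])
        for r in range(tp_size)
    ]
    return tree_sum(partials)

# Non-associativity in one line: the grouping changes the float64 result.
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))

# Toy values chosen so that rounding depends on the grouping.
values = [1e16, 1.0, -1e16, 1.0]

for tp in (1, 2, 4):
    print(tp, sharded_sum(values, tp), sharded_tree_sum(values, tp))
```

With these toy values the order-dependent `sharded_sum` disagrees between `tp_size=1` and `tp_size=2`, while `sharded_tree_sum` returns the same value for every shard count. The sketch also shows that a fixed order guarantees reproducibility, not accuracy: the invariant result here is still affected by rounding.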
As large language model (LLM) applications grow more complex and mission-critical, they demand higher reliability, which makes deterministic inference a pressing concern.
Non-deterministic LLM behavior undermines the reliability and trustworthiness that sensitive applications, such as multi-agent systems and safety-critical evaluations, depend on.
Deterministic inference would make LLM operations more consistent and auditable, supporting more robust development and deployment of advanced AI applications.
- LLM developers
- AI safety researchers
- Multi-agent system providers
- Cloud infrastructure providers
- Platforms with non-deterministic LLM serving frameworks
Increased trust and adoption of LLM-powered applications in critical sectors due to enhanced reliability.
Faster development and debugging cycles for complex AI systems as behavior becomes predictable and reproducible.
The potential for new regulatory standards and certifications for deterministic AI systems, impacting market entry and compliance.
Read at arXiv cs.LG