SIGNAL · AI · Jul 2, 2026, 4:00 AM · Signal 75 · Short term

Comparing Large Language Models on Scrum Certification-Style Questions: Accuracy, Stability, and Error Patterns

arXiv:2607.00048v1 Announce Type: cross Abstract: Large Language Models (LLMs) are increasingly used in exam- and certification-style question answering tasks, where their ability to retrieve, interpret, and apply domain-specific knowledge can be systematically assessed. In Software Engineering, such settings are particularly relevant when questions depend on strict adherence to normative definitions, roles, artifacts, and rules. This paper evaluates the performance of three contemporary LLMs, GPT-5 mini, Gemini 3 Flash, and DeepSeek Chat 3.2, in answering 993 Scrum

Why this matters

Why now

The proliferation of advanced LLMs necessitates systematic evaluation of their domain-specific capabilities, particularly for certification and regulated fields.

Why it’s important

How well LLMs answer precise, normative domain questions indicates their readiness for professional applications and, in turn, their likely effect on white-collar workflows.

What changes

The explicit benchmarking of LLMs against rigorous professional certification standards provides clearer guidance on their deployment in fields requiring strict adherence to defined practices.

Winners

· AI developers
· Certification bodies adopting AI tools
· Software engineering consultancies

Losers

· Traditional human-only certification processes
· Educational institutions focused solely on rote learning
· Outdated assessment methodologies

Second-order effects

Direct

LLMs demonstrate increasingly sophisticated understanding of complex domain-specific knowledge required for professional certifications.

Second

This performance drives faster adoption of AI tools within professional services and education, leading to efficiency gains and workforce displacement.

Third

The definition of 'certified professional' evolves to include AI-assisted roles, potentially lowering barriers to entry in some fields but creating new demands for AI literacy.

Editorial confidence: 95 / 100 · Structural impact: 60 / 100

Original report

This signal links to a primary source. Continuum Brief monitors and indexes it as part of the live intelligence stream — we do not republish source content.

Read at arXiv cs.AI

#cs.SE #cs.AI

Tracked by The Continuum Brief · live intelligence network