
arXiv:2607.00049v1 Announce Type: cross Abstract: Large Language Models (LLMs) are increasingly used in Agile Software Development for documentation, coaching, and training. As practitioners adopt these tools to prepare for certifications such as Professional Scrum Master (PSM), a key question is whether LLMs can reliably reason about Scrum, a framework with normative, well-defined rules described in the Scrum Guide (2020). This paper examines how different prompt techniques affect the factual accuracy of LLM responses to Scrum certification-style questions. A dataset of 993 validated PSM-alig
As advanced LLMs such as GPT-5 are integrated into professional workflows, their reliability in specialized, normative domains needs to be established empirically.
This study provides critical data on the factual accuracy of LLMs in specific knowledge applications, directly influencing their adoption in professional training, certification, and automated decision-making.
Empirical results of this kind will refine how far LLMs are trusted and used for specialized knowledge tasks, particularly in fields that require precise adherence to established rules and standards.
- LLM developers who can demonstrate high factual accuracy
- Agile coaches and trainers leveraging LLMs effectively
- Professionals seeking efficient certification preparation
- LLM developers with inaccurate models
- Certification bodies unprepared for LLM-assisted testing
Increased scrutiny and demand for factual accuracy metrics in enterprise LLM applications, especially in regulated industries.
Development of specialized LLMs or fine-tuning techniques optimized for specific professional knowledge domains to achieve certification-level accuracy.
Potential for LLMs to become integral, yet critically evaluated, components of professional education and continuous learning, shifting traditional pedagogical approaches.