Automated grading of Linux/bash examinations using large language models: a four-level cognitive taxonomy approach

arXiv:2607.02432v1 (announce type: cross)
Abstract: Scalable and reliable grading of command-line examinations remains a challenge in computing education, where rising enrolments make manual marking difficult and rule-based autograders cannot handle partial credit, equivalent solutions, or syntactic variation. This paper evaluates whether four frontier Large Language Models (GPT, Claude Opus, Gemini, and GLM) can approximate expert judgment when grading short Linux/bash command responses. The study adopts a four-level cognitive taxonomy that combines cognitive complexity and operational impact,
The proliferation of advanced large language models (LLMs) and the increasing enrollment in computing education programs create a demand for scalable and reliable automated grading solutions.
If LLMs can approximate expert judgment on short command-line answers, grading work that previously required a human expert could be partly automated, which would help courses scale with enrollment.
LLM-based grading may augment or replace manual and rule-based grading of technical assignments, especially where partial credit, equivalent solutions, and syntactic variation are involved.
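The limitation of rule-based autograders can be illustrated with a small sketch. It is not taken from the paper: the function names (`exact_match`, `normalize`, `normalized_match`) and the sample commands are hypothetical, chosen only to show how an exact string comparison rejects a syntactic variant, and how a hand-written normalization rule recovers that one case while still missing a different command that might serve the same purpose.

```python
import shlex


def exact_match(expected: str, answer: str) -> bool:
    """Naive rule: the answer must equal the reference string exactly."""
    return expected == answer


def normalize(command: str) -> tuple:
    """Reduce a command to (program, sorted short flags, operands).

    Assumption for illustration only: every single-dash argument is a
    bundle of one-letter flags. That is wrong for options that take
    values and for tools such as `find`, which is part of the point:
    each such rule needs its own exceptions.
    """
    tokens = shlex.split(command)
    if not tokens:
        return ()
    program, args = tokens[0], tokens[1:]
    flags = set()
    operands = []
    for arg in args:
        if arg.startswith("-") and not arg.startswith("--") and len(arg) > 1:
            flags.update(arg[1:])  # "-la" and "-a -l" give the same set
        else:
            operands.append(arg)
    return (program, tuple(sorted(flags)), tuple(operands))


def normalized_match(expected: str, answer: str) -> bool:
    """Rule-based comparison after flag and whitespace normalization."""
    return normalize(expected) == normalize(answer)


reference = "ls -la /tmp"
variant = "ls -a -l  /tmp"            # same command, different syntax
alternative = "find /tmp -maxdepth 1"  # a different tool entirely

print(exact_match(reference, variant))           # False
print(normalized_match(reference, variant))      # True
print(normalized_match(reference, alternative))  # False
```

Both rules return only pass or fail, so neither can award partial credit, and whether the `find` answer deserves any marks is a judgment call that the rules cannot express. That gap is what the brief suggests LLM graders might fill.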
- Educational institutions
- AI developers (LLM providers)
- Students (faster feedback)
- Traditional autograding software (rule-based)
- Human graders for routine tasks
Automated grading becomes more accurate and flexible, handling complex assignments with human-like judgment.
The cost and time required for technical education grading decrease, potentially leading to increased course offerings and enrollments.
AI grading systems could evolve to provide personalized real-time feedback and tutoring, fundamentally altering pedagogical approaches in technical fields.
Read at arXiv cs.CL