Knowledge Distillation from Large Reasoning Models to Compact Student Models: A Case Study on the John O'Bryan Mathematics Competition

arXiv:2606.31048v1. Abstract: This paper investigates knowledge distillation from a large reasoning model (DeepSeek-R1) to a compact student model (Qwen2.5-7B). Using historical problems from the John O'Bryan Mathematics Competition at Northern Kentucky University (2011-2025), we build a Chain-of-Thought (CoT) training corpus through a dual-agent framework. The dataset is used to fine-tune the student model with Low-Rank Adaptation (LoRA) on Apple Silicon hardware using the MLX framework. The base Qwen2.5-7B model achieves 64.67% accuracy on competition problems, while the De
The paper demonstrates a practical pipeline for distilling reasoning from a large, expensive model (DeepSeek-R1) into a compact student (Qwen2.5-7B), using LoRA fine-tuning on Apple Silicon. This matters because foundation models keep growing while more workloads need to run at the edge.
If the approach generalizes beyond this case study, smaller organizations and specialized applications could use strong reasoning models with far less computational overhead. That directly addresses the cost and resource intensity of large AI models.
Effective distillation of complex reasoning lets capable models run on more accessible hardware, which lowers the barrier to building and using sophisticated AI applications.
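To make the hardware argument concrete, here is a minimal sketch of the Low-Rank Adaptation (LoRA) idea named in the abstract: the pretrained weight W stays frozen and only a low-rank correction B·A is trained. This is a generic illustration, not the paper's MLX code. The function names, the toy matrices, and the 4096 x 4096 layer size with rank 8 are assumptions chosen for illustration only.

```python
# Generic LoRA illustration in plain Python (no MLX); all names and sizes
# below are hypothetical and are not taken from the paper.

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha, rank):
    """Compute W x + (alpha / rank) * B (A x).

    W is the frozen d_out x d_in weight; A (rank x d_in) and
    B (d_out x rank) are the only trainable matrices.
    """
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]


def lora_trainable_params(d_in, d_out, rank):
    """Number of trainable parameters LoRA adds for one weight matrix."""
    return rank * (d_in + d_out)


# Toy layer: 2 x 2 frozen weight with a rank-1 adapter.
W = [[1.0, 2.0], [3.0, 4.0]]
A = [[1.0, 0.0]]
x = [1.0, 1.0]

# B starts at zero, so the adapted layer initially matches the base layer.
print(lora_forward(W, A, [[0.0], [0.0]], x, alpha=2, rank=1))

# After training moves B away from zero, the output shifts by the scaled update.
print(lora_forward(W, A, [[1.0], [2.0]], x, alpha=2, rank=1))

# Assumed 4096 x 4096 projection with rank 8: trainable fraction of the layer.
full = 4096 * 4096
print(lora_trainable_params(4096, 4096, 8) / full)
```

Under these assumed sizes the adapter trains well under one percent of the layer's weights, which is one reason LoRA fine-tuning of a 7B model is plausible on a single consumer machine.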
- AI hardware manufacturers (e.g., Apple Silicon)
- Developers of specialized AI applications
- Smaller AI research labs
- Educational institutions adopting AI tools
- Companies solely reliant on massive, monolithic AI models
- Cloud providers if compute shifts to edge appliances
- Entities with limited access to enterprise-scale compute
More efficient and specialized AI models become widely accessible for specific tasks.
This accelerates the development and deployment of agentic systems capable of complex reasoning on consumer-grade hardware.
Widespread compact, capable models could enable personal AI assistants and embedded intelligence that operate with greater autonomy and less reliance on centralized compute infrastructure. By distributing intelligent capabilities in this way, the trend feeds the 'ai-agents' narrative.
Read at arXiv cs.LG