GeoDial: A Multimodal Conversational Tutoring Dataset for Geometry Problem-Solving with Visual Tutor Turns

arXiv:2606.12419v1 (announce type: cross)

Abstract: Several educational domains rely heavily on diagrams and visual cues, yet most existing tutoring datasets are limited to text-only interactions. This limits the development of AI tutors that can teach in visually grounded ways used by human instructors. Thus, we introduce GeoDial, a multimodal tutoring dataset of over 1.3K teacher-student dialogs in the domain of geometry collected from experienced math teachers, where instructional turns are explicitly grounded in diagram highlights. We propose a scalable annotation protocol that integrates di
Two trends motivate the dataset: multimodal AI models are becoming more capable, and AI tutors need to teach in visually grounded ways.
This dataset provides a crucial foundation for developing AI tutors that can interact visually, bridging a significant gap between current text-only systems and human-like instruction in complex visual domains.
AI tutors trained on such data could become better at understanding and responding to visual cues, particularly in subjects like geometry, which in turn could make personalized learning more effective.
- AI education platforms
- Multimodal AI developers
- Students in STEM
- Tutoring services
- Traditional text-only AI tutoring systems
AI tutors could become more effective and personalized by integrating visual context into their teaching.
This improved tutoring capability could lead to higher educational attainment and engagement in visual subjects.
The methodology could be adapted to other visual learning domains, accelerating AI's role in diverse educational fields beyond geometry.
Read at arXiv cs.AI