Transcription and Recognition of Italian Parliamentary Speeches Using Vision-Language Models

arXiv:2603.28103v2. Abstract: Parliamentary proceedings represent a rich yet challenging resource for computational analysis, particularly when preserved only as scanned historical documents. Existing efforts to transcribe Italian parliamentary speeches have relied on traditional Optical Character Recognition pipelines, resulting in transcription errors and limited semantic annotation. In this paper, we propose a pipeline based on Vision-Language Models for the automatic transcription, semantic segmentation, and entity linking of Italian parliamentary speeches. The
Advances in Vision-Language Models now make automated analysis of complex historical datasets feasible and efficient.
This enables richer computational analysis of historical parliamentary records, opening new opportunities for insight into governance, policy, and societal evolution, and may influence future AI applications in public administration.
Transcription, segmentation, and entity linking of historical parliamentary speeches shift from manual or traditional OCR methods to more robust, AI-driven processes, improving data quality and accessibility.
- Historians
- Political scientists
- AI researchers in NLP/VLM
- Government archives
- Traditional OCR providers
More accurate and semantically rich digital archives of historical parliamentary speeches become available for research.
New computational methods emerge for analyzing political discourse, rhetoric, and policy evolution over long periods.
The application of VLMs to governmental data processing could expand, driving demand for sovereign AI solutions for sensitive national data.
Read at arXiv cs.AI