A Benchmark of State-Space Models vs. Transformers and BiLSTM-based Models for Historical Newspaper OCR

arXiv:2604.00725v2. Abstract: End-to-end OCR for historical newspapers remains challenging, as models must handle long text sequences, degraded print quality, and complex layouts. While Transformer-based recognizers dominate current research, their quadratic complexity limits efficient paragraph-level transcription and large-scale deployment. We investigate linear-time State-Space Models (SSMs), specifically Mamba, as a scalable alternative to Transformer-based sequence modeling for OCR. We present, to our knowledge, the first OCR architecture based on SSMs, combinin
The proliferation of digital archives and the limitations of current Transformer-based OCR models for historical documents are creating a demand for more efficient and scalable solutions.
This development could significantly improve the accessibility and analysis of vast amounts of historical data, impacting fields from humanities research to AI training data.
The adoption of State-Space Models (SSMs) like Mamba introduces a new paradigm for sequence modeling in OCR: their cost grows linearly with sequence length, so for certain applications they could replace Transformers, whose cost grows quadratically.
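The scaling difference between the two approaches can be sketched in a few lines of Python. This is a minimal illustration, not the paper's architecture: the function names `ssm_scan` and `attention_pair_count` and the constants `a`, `b`, `c` are assumptions made for the example, and the recurrence uses fixed scalar parameters, whereas Mamba makes its parameters depend on the input.

```python
def ssm_scan(inputs, a=0.9, b=0.1, c=1.0):
    """Run a scalar linear state-space recurrence over a sequence.

    h_t = a * h_{t-1} + b * x_t,   y_t = c * h_t

    a, b, c are illustrative constants (assumptions), not values from the paper.
    Returns the outputs and the number of update steps performed.
    """
    h = 0.0
    outputs = []
    steps = 0
    for x in inputs:
        h = a * h + b * x      # one constant-cost state update per token
        outputs.append(c * h)
        steps += 1
    return outputs, steps


def attention_pair_count(length):
    """Query-key pairs that full self-attention scores for one sequence."""
    return length * length


if __name__ == "__main__":
    # Compare how the work grows as the sequence gets longer.
    for length in (100, 1000, 10000):
        _, steps = ssm_scan([1.0] * length)
        print(length, steps, attention_pair_count(length))
```

The recurrent scan performs one update per token, while full self-attention compares every token with every other token. At paragraph-level sequence lengths the gap between the two counts becomes large.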
- AI researchers (SSMs)
- Archivists & Historians
- Digital Humanities
- Data Infrastructure Providers
- Legacy OCR software vendors
- Transformer-centric AI research (for specific tasks)
State-Space Models (SSMs) gain traction as an efficient alternative to Transformers for long sequence processing in AI.
Improved OCR for historical documents leads to new insights and applications across various domains, accelerating data digitization efforts.
The reduced computational cost of SSMs contributes to more distributed and energy-efficient AI models, impacting the compute supply chain and energy footprint of AI.
Read at arXiv cs.LG