
arXiv:2607.00970v1 Abstract: This paper introduces Svarna, a free, open-source, web-based corpus workbench for Modern Greek. Svarna integrates five databases covering a range of registers (institutional, literary, dialectal, social media, and historical), providing a total of more than 507 million words and around 29 million sentences. This platform addresses the chronic gaps in Greek language technology. Although various corpus resources exist, they are scattered across different platforms, and in many cases, institutional access is restricted or they are no longer available o
The proliferation of AI models is driving urgent demand for high-quality, open-source linguistic data tailored to specific languages, making projects like Svarna critical for advancing fair and inclusive AI development.
This development allows Greece to improve its domestic AI capabilities without reliance on foreign data providers, fostering linguistic sovereignty and potentially stimulating a local AI industry.
Greece now has a consolidated, open-access, and robust corpus workbench for Modern Greek, addressing long-standing gaps in its language technology infrastructure and enabling more sophisticated AI applications.
- Greece's AI research community
- Modern Greek language learners
- Greek technology companies
- Linguistic diversity advocates
- Proprietary Greek language data providers
Improved performance of AI models trained on Modern Greek data, leading to better understanding and generation of the language.
Increased investment and innovation in Greek-specific AI applications, from education to public services, as data barriers are lowered.
The development of a sovereign AI strategy for Greece, leveraging its unique linguistic assets to build domestic technological independence and reduce reliance on foreign-developed AI stacks.
Read at arXiv cs.CL