The Digital Afterlife of Empires: Four Language Models Converge on the Same Imperial Cartography of Writing

arXiv:2606.28325v1 Announce Type: cross Abstract: Large language models process the world's writing systems with radical inequality. We constructed the Digital Script Representation Index (DSRI), a seven-axis measure of digital support, and applied it to the 300 writing systems of the Global Script Database (Fukui, 2026). Only 29 scripts (9.7%) are fully supported by contemporary digital infrastructure; among 158 living scripts, 60 (38.0%) lack complete support. Tokenizer efficiency varies by a factor of 31.7 across 45 scripts measured with parallel text. A serial mediation model -- imperial i
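The headline shares in the abstract can be checked with simple arithmetic. A spread such as the 31.7-fold tokenizer figure is most naturally read as a ratio of the largest to the smallest per-script measurement. The sketch below makes an assumption the abstract does not confirm: "tokenizer efficiency" means the number of tokens emitted for the same parallel passage in each script. The function names `share` and `efficiency_spread` are hypothetical, and the toy token counts are illustrative placeholders, not data from the paper.

```python
from typing import Mapping


def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


def efficiency_spread(tokens_per_text: Mapping[str, int]) -> float:
    """Ratio of the most to the fewest tokens needed for one parallel passage.

    Assumption (not from the paper): efficiency is measured as the token count
    a tokenizer produces for the same parallel text written in each script.
    """
    counts = list(tokens_per_text.values())
    if not counts or min(counts) <= 0:
        raise ValueError("need at least one positive token count")
    return max(counts) / min(counts)


# Counts reported in the abstract.
print(share(29, 300))  # fully supported scripts among all 300 -> 9.7
print(share(60, 158))  # living scripts lacking complete support -> 38.0

# Hypothetical token counts for one parallel passage (illustration only).
toy_counts = {"script_a": 40, "script_b": 52, "script_c": 400}
print(efficiency_spread(toy_counts))  # -> 10.0
```

A max-to-min ratio is sensitive to a single outlier script, so a study reporting it would ideally also report the full per-script distribution.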
The proliferation of advanced large language models is exposing hidden biases and inequalities in digital representation that have long been embedded in technological development.
This research argues that infrastructural biases in AI are not merely technical issues: they reflect and amplify existing imperial cartographies of writing, affecting global access and equity in the digital sphere.
The findings suggest that current language models reinforce existing power dynamics through uneven digital representation of the world's writing systems, which calls for deliberate design and policy interventions.
- Researchers in linguistic diversity
- Developers focused on underrepresented languages
- Advocacy groups for digital equity
- Monolingual AI development approaches
- Digital infrastructure reliant on dominant scripts
- Users and communities whose languages are digitally marginalized
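The study quantifies how unequally large language models process the world's writing systems: only 29 of 300 scripts are fully supported by contemporary digital infrastructure, and tokenizer efficiency differs by a factor of 31.7 across the 45 scripts measured with parallel text.
This disparity could produce a digital divide in which the benefits of AI accrue overwhelmingly to users of dominant languages, reinforcing global power structures.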
Increased awareness may drive initiatives for more inclusive AI development, potentially leading to new models and infrastructure that support a wider array of global scripts and languages.
This signal links to a primary source. Continuum Brief monitors and indexes it as part of the live intelligence stream; we do not republish source content.
Read at arXiv cs.CL