VectraYX-Nano: A 42M-Parameter Spanish Cybersecurity Language Model with Curriculum Learning and Native Tool Use

arXiv:2605.13989v3 (Announce Type: replace)

Abstract: We present VectraYX-Nano, a 41.95M-parameter decoder-only language model trained from scratch in Spanish for cybersecurity, with a Latin-American regional focus and native tool invocation via the Model Context Protocol (MCP). The work makes four contributions. (i) Corpus: VectraYX-Sec-ES, a 170M-token Spanish corpus assembled by an eight-VM distributed pipeline for about US$25 of cloud compute and split into three curriculum phases (conversational 42M, cybersecurity 118M, offensive tooling 10M). (ii) Architecture: a 42M Transformer decoder with GQ
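The abstract splits the 170M-token corpus into three curriculum phases (42M + 118M + 10M = 170M). The Python sketch below shows one way such a token budget could map to phases during training. It assumes the phases are consumed sequentially in the order listed, with no mixing or repetition; the paper's actual schedule is not described in this summary, and the names `PHASES`, `phase_boundaries`, and `phase_for_token` are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate

# Phase sizes in tokens as reported in the abstract.
# Assumption: phases run in this order, one pass each, no interleaving.
PHASES = [
    ("conversational", 42_000_000),
    ("cybersecurity", 118_000_000),
    ("offensive_tooling", 10_000_000),
]


def phase_boundaries(phases):
    """Cumulative token counts at which each phase ends."""
    return list(accumulate(size for _, size in phases))


def phase_for_token(tokens_seen, phases=PHASES):
    """Name of the phase the next token comes from after `tokens_seen` tokens."""
    bounds = phase_boundaries(phases)
    if not 0 <= tokens_seen < bounds[-1]:
        raise ValueError("tokens_seen is outside the corpus")
    # bisect_right finds the first boundary strictly greater than tokens_seen.
    return phases[bisect_right(bounds, tokens_seen)][0]


total = phase_boundaries(PHASES)[-1]
print(total)  # 170000000
for name, size in PHASES:
    # Prints 24.7%, 69.4% and 5.9% for the three phases.
    print(f"{name}: {size / total:.1%}")
```

Under this assumption, roughly the first quarter of training is conversational Spanish, and the small offensive-tooling phase comes last.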
The development of smaller language models specialized by region and domain is a natural progression in AI, driven by the need for more efficient and culturally relevant applications.
This model is a significant step toward enabling specific regions and sectors, such as Spanish-speaking cybersecurity, to develop and control their own AI infrastructure and applications, reducing reliance on general-purpose models.
A specialized Spanish-language cybersecurity model, built with a Latin-American regional focus and native tool invocation, changes what cyber defense teams in these regions can build and deploy locally.
Likely beneficiaries:
- Latin American cybersecurity firms
- Spanish-speaking AI developers
- National security agencies in Latin America
- Organizations requiring culturally relevant cybersecurity tools
Likely disadvantaged:
- General-purpose AI model providers without regional specializations
- Cyber adversaries targeting Spanish-speaking regions with generic methods
Potential implications:
- Increased effectiveness of cybersecurity operations in Spanish-speaking regions, enabled by a domain-specialized model.
- Accelerated development of other domain-specific and regionally tailored AI models, fostering a more diverse global AI ecosystem.
- Enhanced digital sovereignty for nations and regions that can use or build similar specialized models, potentially shifting global power dynamics in AI and cybersecurity.
This signal links to a primary source. Continuum Brief monitors and indexes it as part of the live intelligence stream; we do not republish source content.
Source: arXiv, cs.CL (Computation and Language)