
arXiv:2604.03472v3. Abstract: Co-evolutionary self-play, where one language model generates problems and another solves them, promises autonomous curriculum learning without human supervision. In practice, the proposer quickly converges to a narrow distribution of problems that satisfy the reward function. This diversity collapse renders the curriculum uninformative for the solver, stalling the co-evolutionary loop. We introduce vocabulary dropout, a random mask applied to the proposer's output logits during both policy training and curriculum generation, as a lightweight
This research addresses a known limitation of co-evolutionary self-play in LLMs: the proposer collapses onto a narrow set of problems, which stalls training loops that are meant to run without human supervision.
More diverse problem generation keeps the curriculum informative for the solver, which could make autonomous training more efficient and effective.
The introduction of vocabulary dropout offers a lightweight method to prevent diversity collapse in LLM self-play, potentially leading to more robust and versatile AI models developed with less human oversight.
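As a rough illustration of the idea of masking output logits at random, here is a minimal NumPy sketch. It is not the paper's implementation: the abstract is truncated and does not give the drop rate, how often the mask is resampled, or which tokens are exempt, so the function name `vocabulary_dropout`, the `drop_prob` and `keep_ids` parameters, and the guard that always leaves one token available are all assumptions made for this example.

```python
import numpy as np

def vocabulary_dropout(logits, drop_prob, rng, keep_ids=()):
    """Return a copy of `logits` with a random subset of vocabulary entries set to -inf.

    Hypothetical helper: drop_prob and keep_ids are illustrative assumptions,
    not parameters taken from the paper.
    """
    logits = np.asarray(logits, dtype=float)
    vocab_size = logits.shape[-1]
    # Sample an independent Bernoulli(drop_prob) mask over the vocabulary.
    drop = rng.random(vocab_size) < drop_prob
    # Assumption: some tokens (e.g. end-of-sequence) are never dropped.
    drop[np.asarray(keep_ids, dtype=int)] = False
    # Guard: keep at least one token so the distribution stays well defined.
    if drop.all():
        drop[rng.integers(vocab_size)] = False
    return np.where(drop, -np.inf, logits)

def softmax(x):
    """Numerically stable softmax; entries at -inf receive probability 0."""
    z = np.exp(x - np.max(x))
    return z / z.sum()

# Usage: mask a toy 10-token vocabulary, then sample the next token.
rng = np.random.default_rng(0)
toy_logits = np.zeros(10)
masked = vocabulary_dropout(toy_logits, drop_prob=0.5, rng=rng, keep_ids=[0])
probs = softmax(masked)
next_token = rng.choice(len(probs), p=probs)
```

Masked tokens get zero probability, so the proposer has to phrase its problem with whatever vocabulary remains. Whether a mask is drawn once per generated problem or at every decoding step is a design choice the abstract does not settle; this sketch draws one mask per call.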
- AI research institutions
- LLM developers
- AI agent designers
- AI models constrained by narrow training data
- Teams reliant solely on supervised learning approaches
LLMs may achieve more diverse and effective autonomous learning curricula using this technique.
This could lead to faster development cycles for advanced AI capabilities and agentic systems.
It might reduce dependency on vast, manually curated datasets and human-in-the-loop supervision for AI training.
Read at arXiv cs.CL