
arXiv:2604.24021v3. Abstract: We present QED, an open-source multi-agent system that turns human-provided research questions into complete mathematical proofs without further human guidance. Its pipeline is designed to overcome common failures of single-query proof generation by separating planning, proving, and verification: a decomposition agent structures the proof search, prover agents generate candidate arguments, and verifier agents check correctness. In collaboration with domain experts, we evaluated QED on 18 research-level projects of varying difficulty.
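The separation of planning, proving, and verification described in the abstract can be illustrated with a minimal sketch. This is not QED's actual code: the names `run_pipeline`, `Step`, `decompose`, `prove`, `verify`, and `max_attempts` are hypothetical, the retry loop is an assumption, and the toy functions stand in for what would be model-backed agents.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Step:
    goal: str
    argument: Optional[str] = None  # accepted candidate argument, if any


def run_pipeline(
    question: str,
    decompose: Callable[[str], List[str]],   # hypothetical decomposition agent
    prove: Callable[[str, int], str],        # hypothetical prover agent
    verify: Callable[[str, str], bool],      # hypothetical verifier agent
    max_attempts: int = 3,                   # assumed retry budget per subgoal
) -> Optional[List[Step]]:
    """Return one verified step per subgoal, or None if any subgoal fails."""
    steps: List[Step] = []
    for goal in decompose(question):             # planning
        step = Step(goal)
        for attempt in range(max_attempts):      # proving, with retries
            candidate = prove(goal, attempt)
            if verify(goal, candidate):          # verification gates acceptance
                step.argument = candidate
                break
        if step.argument is None:
            return None                          # no verified argument found
        steps.append(step)
    return steps


# Toy stand-ins so the sketch runs without any model calls.
def toy_decompose(question: str) -> List[str]:
    return [f"{question}: base case", f"{question}: inductive step"]


def toy_prove(goal: str, attempt: int) -> str:
    # The first attempt is a deliberate placeholder, so the retry path is exercised.
    return "TODO" if attempt == 0 else f"argument for {goal}"


def toy_verify(goal: str, candidate: str) -> bool:
    return candidate == f"argument for {goal}"


result = run_pipeline("claim P(n)", toy_decompose, toy_prove, toy_verify)
```

In this sketch a candidate is accepted only after the verifier approves it, which is the property the abstract contrasts with single-query proof generation.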
Large language models and multi-agent architectures have matured to the point where complex, abstract problem-solving is becoming viable.
This marks a significant step toward autonomous AI systems that generate novel, verifiable intellectual output and that could automate some core intellectual labor.
Generating complete mathematical proofs without human guidance moves AI beyond assistive tooling and into independent knowledge creation in a foundational scientific domain.
- AI research and development (academia and industry)
- Mathematics and theoretical sciences
- Proof automation software developers
- Open-source AI communities
- Tasks requiring manual proof generation
- Specific segments of academic research that rely on human-only proof discovery
In the near term, QED demonstrates a new level of AI capability in abstract reasoning and problem-solving.
This could lead to accelerated discovery in mathematics and other formal sciences by automating the generation and verification of complex proofs.
The success of multi-agent systems for knowledge creation may drive a broader re-evaluation of 'human-only' intellectual domains and accelerate the development of autonomous agents across various professional fields.
Source: arXiv cs.AI