The Signal-Coverage Matrix: Stratifying Type and Semantic Errors in Statement Autoformalization

arXiv:2606.28013v1. Abstract: Headline type-correctness (TC%) of LLM autoformalization has climbed from ~53% to ~76% in two years, yet this scalar conceals which errors each method resolves. We propose a signal-coverage matrix that crosses the Lean elaborator (pass/fail) with a semantic-equivalence judgment (equivalent/not), sorting every output into one of four cells: true success (TS), type-only (TO), semantic-only (SO), or both fail (BF). On ProofNet# and MiniF2F-test with DeepSeek V4-Pro across Vanilla, Lean-Retry, Sample-Filter, and Stratified Autoformaliza
The rapid advancement in large language models has accelerated the push towards autoformalization, making the precise evaluation of resulting errors a critical next step for practical deployment.
Improved methods for evaluating autoformalization performance will accelerate the development of reliable AI agents, enabling them to handle complex, formal tasks with greater accuracy.
The signal-coverage matrix gives a finer-grained view of autoformalization failures: instead of a single type-correctness score, it separates outputs that fail the Lean elaborator from outputs that fail the semantic-equivalence judgment.
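The four-cell sorting described in the abstract can be sketched in a few lines of Python. This is an illustrative sketch, not the paper's implementation: the names `Cell`, `classify`, `coverage_matrix`, and `type_correctness` are hypothetical, the sample outcomes are made up, and the reading of the two off-diagonal cells is an assumption (TO is taken to mean "passes the elaborator but is not judged equivalent", SO the reverse), since the abstract does not spell it out.

```python
from collections import Counter
from enum import Enum


class Cell(Enum):
    TS = "true success"
    TO = "type-only"
    SO = "semantic-only"
    BF = "both fail"


def classify(elaborates: bool, equivalent: bool) -> Cell:
    """Map one output's two signals to a matrix cell.

    ASSUMPTION: TO = only the type signal is positive (elaborates, not
    equivalent); SO = only the semantic signal is positive. The abstract
    names the cells but does not define the off-diagonal ones.
    """
    if elaborates and equivalent:
        return Cell.TS
    if elaborates:
        return Cell.TO
    if equivalent:
        return Cell.SO
    return Cell.BF


def coverage_matrix(outcomes):
    """Count outputs per cell; `outcomes` is an iterable of (elaborates, equivalent)."""
    counts = Counter(classify(e, s) for e, s in outcomes)
    return {cell: counts.get(cell, 0) for cell in Cell}


def type_correctness(matrix):
    """Scalar TC: share of outputs that pass the elaborator (TS + TO under the assumption above)."""
    total = sum(matrix.values())
    return (matrix[Cell.TS] + matrix[Cell.TO]) / total if total else 0.0


# Hypothetical outcomes for five outputs, for illustration only.
sample = [(True, True), (True, False), (False, True), (False, False), (True, True)]
matrix = coverage_matrix(sample)
tc = type_correctness(matrix)
```

Under this reading, two methods with the same TC can differ in how their passing outputs split between TS and TO, which is the information the scalar hides.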
- AI developers
- Formal verification tooling
- Software engineering
- Manual formalization processes
- Generative AI with poor error analysis
This research provides a more sophisticated framework for diagnosing and improving AI models in formal reasoning tasks.
Better understanding of errors will lead to more robust and trustworthy AI agents capable of operating in highly sensitive environments.
The increased reliability of autoformalization could significantly reduce the cost and time required for software verification and theorem proving, accelerating innovation in those fields.
Read at arXiv cs.CL