ThinkProbe: Beyond Accuracy -- Structural Profiling of Open-Ended LLM Reasoning Traces via Non-Generative Thought Graphs

arXiv:2606.29067v1 Announce Type: new Abstract: We present ThinkProbe, a framework for structural analysis of LLM reasoning traces. ThinkProbe converts each trace into a Thought Graph (a directed graph with cycles, 8 node types, and 6 edge types) and derives a 19-metric five-dimensional cognitive profile (5D-CP: Breadth, Depth, Structure, Metacognitive, Efficiency) through a fully non-generative pipeline combining rule-based segmentation and discriminative semantic linking. Applied to 4,200 traces from 7 native reasoning models across 200 open-ended questions and 10 cognitive domains, ThinkPro
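To make the idea of a non-generative Thought Graph concrete, here is a minimal Python sketch, not the paper's implementation. The abstract does not name its 8 node types, 6 edge types, or 19 metrics, so everything below is a hypothetical stand-in: the node types (`step`, `backtrack`, `verify`, `conclude`), the edge types (`sequence`, `revises`), the cue phrases, and the `metacognitive_ratio` metric. A simple rule also replaces the discriminative semantic linker, which is not described in the excerpt.

```python
import re
from collections import Counter, defaultdict
from dataclasses import dataclass, field

# Hypothetical cue phrases mapped to hypothetical node types.
# The source does not list its actual 8 node types.
CUES = [
    ("backtrack", re.compile(r"(wait|actually|on second thought)\b", re.I)),
    ("verify", re.compile(r"(let me check|let me verify|double-check)\b", re.I)),
    ("conclude", re.compile(r"(therefore|in conclusion|so the answer)\b", re.I)),
]


@dataclass
class ThoughtGraph:
    nodes: list = field(default_factory=list)  # (text, node_type)
    edges: list = field(default_factory=list)  # (src, dst, edge_type)

    def add_node(self, text, node_type):
        self.nodes.append((text, node_type))
        return len(self.nodes) - 1

    def add_edge(self, src, dst, edge_type):
        self.edges.append((src, dst, edge_type))


def segment(trace):
    """Rule-based segmentation: split after sentence-ending punctuation."""
    return [p for p in re.split(r"(?<=[.!?])\s+", trace.strip()) if p]


def classify(text):
    """Assign a node type from the first cue matching the segment's start."""
    for node_type, pattern in CUES:
        if pattern.match(text):
            return node_type
    return "step"


def build_graph(trace):
    g = ThoughtGraph()
    for text in segment(trace):
        idx = g.add_node(text, classify(text))
        if idx > 0:
            g.add_edge(idx - 1, idx, "sequence")
            # Assumed linking rule: a backtrack revises the previous node.
            # This back edge is what introduces a cycle.
            if g.nodes[idx][1] == "backtrack":
                g.add_edge(idx, idx - 1, "revises")
    return g


def has_cycle(g):
    """Depth-first search with three states to detect a directed cycle."""
    adj = defaultdict(list)
    for src, dst, _ in g.edges:
        adj[src].append(dst)
    state = [0] * len(g.nodes)  # 0 = unvisited, 1 = in progress, 2 = done

    def visit(u):
        state[u] = 1
        for v in adj[u]:
            if state[v] == 1 or (state[v] == 0 and visit(v)):
                return True
        state[u] = 2
        return False

    return any(state[u] == 0 and visit(u) for u in range(len(g.nodes)))


def profile(g):
    """A few illustrative structural metrics (not the paper's 19)."""
    counts = Counter(node_type for _, node_type in g.nodes)
    meta = counts["backtrack"] + counts["verify"]
    return {
        "n_nodes": len(g.nodes),
        "n_edges": len(g.edges),
        "has_cycle": has_cycle(g),
        "metacognitive_ratio": meta / len(g.nodes) if g.nodes else 0.0,
    }
```

Calling `profile(build_graph(trace))` on a short trace that contains a "Wait, ..." sentence yields a graph with a `revises` back edge, so `has_cycle` is true. No language model is called at any point, which is the sense of "non-generative" this sketch aims to illustrate.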
The proliferation of open-ended LLM applications necessitates more robust and interpretable methods for evaluating complex reasoning beyond simple accuracy metrics.
This framework offers a standardized, non-generative approach to profile LLM cognitive traits, moving beyond black-box evaluation and enabling targeted improvements in AI reasoning capabilities.
Current LLM evaluation, largely focused on accuracy or qualitative assessment, can now be augmented or replaced by a structural, quantitative analysis of reasoning processes.
- LLM Developers
- AI Researchers
- AI-powered product companies
- Evaluation platform providers
- Companies relying solely on uninterpretable LLM outputs
- Developers with poor LLM reasoning architectures
ThinkProbe directly enables more systematic debugging and optimization of LLM reasoning architectures.
Improved understanding of LLM reasoning will accelerate the development of more capable and reliable AI agents.
The ability to deeply profile LLM cognition could lead to new avenues for human-AI collaboration and a more trustworthy AI ecosystem.
Read at arXiv cs.CL