
arXiv:2606.01451v1 (announce type: new). Abstract: Reference-free evaluation of large language model (LLM) creativity relies on perplexity, entropy, and top-1 margin. We show that a much stronger signal lives one step earlier in the pipeline: in how sampling temperature reshapes the model's token distribution before the next token is drawn. On Llama-3.1-8B-Instruct generations of 500 open-ended creative prompts at $T \in \{0.3, 0.8, 1.5\}$, a single per-token feature derived from this reshaping predicts the within-prompt creativity rank at Spearman $\rho = 0.918$ against an averaged gpt-4o
As creative generative AI applications proliferate, reliably evaluating and controlling 'creativity' remains a critical, unsolved challenge, which makes this research timely.
The research offers a more direct, reference-free way to evaluate, and potentially control, creative output in LLMs, which affects how AI agents are developed and applied across industries.
Because the way sampling temperature reshapes the token distribution predicts within-prompt creativity rank with high correlation (Spearman $\rho = 0.918$ in the abstract), LLM developers gain a strong internal signal for refining creative generation instead of relying only on post-hoc evaluation metrics.
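To make "temperature reshaping" concrete, the sketch below applies a temperature-scaled softmax to a toy logit vector at the three temperatures named in the abstract and reports entropy and top-1 margin (two of the baseline metrics it mentions), plus the KL divergence from the $T = 1$ distribution. This is only an illustration: the brief does not specify the paper's per-token feature, so the KL term is an assumed stand-in for "how much the distribution was reshaped", and `toy_logits` and all function names are hypothetical.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Return softmax(logits / temperature) as a list of probabilities."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def top1_margin(probs):
    """Gap between the most likely and second most likely token."""
    first, second = sorted(probs, reverse=True)[:2]
    return first - second

def kl_divergence(p, q):
    """KL(p || q) in nats; assumes q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

# Hypothetical next-token logits over a five-token vocabulary (not from the paper).
toy_logits = [4.0, 2.5, 1.0, 0.5, -1.0]
base = softmax_with_temperature(toy_logits, 1.0)

results = {}
for t in (0.3, 0.8, 1.5):
    reshaped = softmax_with_temperature(toy_logits, t)
    results[t] = {
        "entropy": entropy(reshaped),
        "margin": top1_margin(reshaped),
        # Assumed stand-in for a "reshaping" feature, not the paper's definition.
        "kl_from_base": kl_divergence(reshaped, base),
    }
    print(t, {k: round(v, 4) for k, v in results[t].items()})
```

Low temperature sharpens the distribution (lower entropy, larger margin) and high temperature flattens it. A real pipeline would compute such quantities per generated token from the model's actual logits and then aggregate them per response.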
- AI developers
- Creative industries using LLMs
- Generative AI platforms
- Less efficient LLM evaluation methods
- AI content farms reliant on brute-force prompting
This new signal could accelerate the development of more controllable and nuanced creative AI models.
Improved creative control could enable a new wave of AI-powered design, marketing, and content generation tools that are more tailored and effective.
The enhanced quality and specificity of creative AI output could further blur the lines between human and AI-generated content, impacting intellectual property and authentication.
Read at arXiv cs.CL