HumorRank: A Tournament-Based Leaderboard for Evaluating Humor Generation in Large Language Models

arXiv:2604.19786v2

Abstract: Humor remains difficult to evaluate in large language models (LLMs) because what makes a response funny is subjective, comparative, and shaped by interacting comedic mechanisms rather than a single scalar property. Existing humor evaluation protocols therefore tend to produce isolated scores or task-specific judgments that are difficult to compare across models. We introduce HumorRank, a tournament-based framework for ranking textual humor generation through theory-grounded pairwise preference judgments. Across SemEval-2026 MWAHAHA and Humor
As capable LLMs proliferate, complex and subjective tasks such as humor generation need evaluation methods that are robust and comparable across models, which pushes researchers toward more sophisticated benchmarks.
Reliable evaluation of nuanced capabilities such as humor matters for the development and adoption of more human-like and versatile AI assistants, content-creation tools, and entertainment platforms.
The introduction of a tournament-based leaderboard provides a more standardized and comparative framework for assessing humor generation in LLMs, moving beyond isolated or task-specific judgments.
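One way to picture how pairwise preference judgments can become a leaderboard is a Bradley-Terry fit, sketched below. This is an illustrative assumption only: the abstract excerpt does not say which aggregation method HumorRank uses, and the model names, the judgment data, and the `bradley_terry` helper are all hypothetical.

```python
def bradley_terry(judgments, iters=200):
    """Estimate Bradley-Terry strengths from (winner, loser) pairs.

    Illustrative sketch only; not taken from the HumorRank paper.
    Uses the standard minorization-maximization (MM) update.
    """
    wins = {}    # total wins per model
    games = {}   # comparison count per unordered pair of models
    for winner, loser in judgments:
        wins[winner] = wins.get(winner, 0) + 1
        wins.setdefault(loser, 0)
        pair = frozenset((winner, loser))
        games[pair] = games.get(pair, 0) + 1

    models = list(wins)
    strength = {m: 1.0 for m in models}
    for _ in range(iters):
        updated = {}
        for i in models:
            denom = 0.0
            for j in models:
                n_ij = games.get(frozenset((i, j)), 0)
                if j != i and n_ij > 0:
                    denom += n_ij / (strength[i] + strength[j])
            updated[i] = wins[i] / denom if denom > 0 else strength[i]
        # Rescale so the strengths average to 1 (only ratios matter).
        total = sum(updated.values())
        strength = {m: s * len(models) / total for m, s in updated.items()}
    return strength


# Hypothetical judgments: each tuple is (preferred model, other model).
judgments = [
    ("model_a", "model_b"),
    ("model_a", "model_b"),
    ("model_b", "model_a"),
    ("model_a", "model_c"),
    ("model_b", "model_c"),
    ("model_c", "model_b"),
]

strength = bradley_terry(judgments)
leaderboard = sorted(strength, key=strength.get, reverse=True)
for rank, model in enumerate(leaderboard, start=1):
    print(rank, model, round(strength[model], 3))
```

A fit of this kind places every model on one shared scale, which is what distinguishes a tournament ranking from isolated per-task scores. An Elo-style update would be another plausible choice, though its result depends on the order in which comparisons are processed.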
- AI humor generation researchers
- LLM developers
- AI entertainment platforms
- AI evaluation companies
- Subjective, unstandardized AI evaluation methods
- Developers of less humorous LLMs
HumorRank allows for direct comparison and ranking of LLM humor generation capabilities.
This drives competitive improvement in humor generation among LLM developers and influences product roadmaps for AI-driven content.
Enhanced AI humor capabilities could lead to new forms of AI-generated entertainment and more engaging human-AI interactions, potentially blurring the lines between human and artificial creativity.
Read at arXiv cs.CL