
arXiv:2511.14683v2 Announce Type: replace Abstract: Heaps' or Herdan's law characterizes the word-type vs. word-token relation by a power-law function, which is concave in linear-linear scale but a straight line in log-log scale. However, it has been observed that even in log-log scale, the type-token curve is still slightly concave, invalidating the power-law relation. At the next-order approximation, we have shown, by twenty English novels or writings (some are translated from another language to English), that quadratic functions in log-log scale fit the type-token data perfectly. Regressio
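The relation the abstract describes can be written out as below. This is a sketch in our own notation: the symbols V (number of word types), N (number of word tokens), K, β, a, b and c are assumptions for illustration and may differ from the paper's. The log base is not stated in the excerpt; the quadratic form holds in any base, with the coefficients rescaled.

```latex
% Heaps'/Herdan's law: V types after N tokens, constants K > 0, 0 < \beta < 1
V(N) = K N^{\beta}
\quad\Longrightarrow\quad
\log V = \log K + \beta \log N

% Next-order (quadratic) approximation in log-log scale
\log V = a + b \log N + c\,(\log N)^2

% Local slope in log-log scale
\frac{d \log V}{d \log N} = b + 2c \log N
```

With c < 0 the local slope falls as N grows, which is the slight concavity in log-log scale that the abstract reports. Setting c = 0 recovers Heaps' law, with a = log K and b = β.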
This is a revised version (v2) of a preprint posted to arXiv cs.CL, reporting ongoing academic research in natural language processing and statistical linguistics.
While of interest to computational linguists, this refinement of Heaps' law is a narrowly specialized academic improvement with no bearing on broader strategic concerns.
This research refines the mathematical understanding of the type-token relationship in text, suggesting a quadratic fit in log-log scale for improved accuracy in linguistic analysis.
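A minimal sketch of how such a comparison could be run follows. It assumes whitespace tokenization and ordinary least squares on natural-log counts. The function names (`type_token_curve`, `polyfit_loglog`, `residual_sum_of_squares`) are hypothetical, and this is not the authors' code.

```python
import math


def type_token_curve(tokens):
    """Return (N, V) pairs: tokens read so far vs. distinct word types seen so far."""
    seen = set()
    curve = []
    for n, token in enumerate(tokens, start=1):
        seen.add(token)
        curve.append((n, len(seen)))
    return curve


def _solve(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    size = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, size):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, size + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * size
    for r in range(size - 1, -1, -1):
        tail = sum(aug[r][c] * solution[c] for c in range(r + 1, size))
        solution[r] = (aug[r][size] - tail) / aug[r][r]
    return solution


def polyfit_loglog(curve, degree):
    """Least-squares fit of log V as a polynomial in log N.

    Returns coefficients lowest degree first: degree=1 is Heaps' law
    (log V = a + b log N); degree=2 adds the quadratic term c (log N)^2.
    """
    xs = [math.log(n) for n, _ in curve]
    ys = [math.log(v) for _, v in curve]
    k = degree + 1
    # Normal equations (X^T X) coef = X^T y, where X[i][j] = xs[i] ** j.
    xtx = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum((x ** i) * y for x, y in zip(xs, ys)) for i in range(k)]
    return _solve(xtx, xty)


def residual_sum_of_squares(curve, coef):
    """Sum of squared residuals of the fitted polynomial, measured in log V."""
    total = 0.0
    for n, v in curve:
        x = math.log(n)
        predicted = sum(c * x ** j for j, c in enumerate(coef))
        total += (math.log(v) - predicted) ** 2
    return total
```

For a novel, `type_token_curve(text.lower().split())` yields the curve, and `polyfit_loglog(curve, 1)` and `polyfit_loglog(curve, 2)` give the power-law and quadratic fits. A negative quadratic coefficient corresponds to the concavity described in the abstract. The quadratic model nests the linear one, so it always fits at least as well; a lower residual alone is therefore not evidence of a better law. `numpy.polyfit` would do the same fit in one call (it returns coefficients highest degree first). The pure-Python solver here only keeps the sketch dependency-free.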
- Computational linguists
- NLP researchers
Improved statistical models for text complexity and vocabulary growth in academic settings.
Potentially more accurate assessments for very large corpora, offering marginal gains in specific NLP tasks.
No significant broader impact on AI applications or industry trends, remaining primarily an academic nuance.
This signal links to a primary source. Continuum Brief monitors and indexes it as part of the live intelligence stream — we do not republish source content.
Read at arXiv cs.CL