The Dynamics of Human and AI-Generated Language: How Semantics Fluctuates across Different Timescales

arXiv:2606.11371v1. Abstract: Spoken language, whether produced by humans or large language models (LLMs), unfolds over time with varying semantic content. However, we still lack simple, interpretable time-series features that capture how generic versus specific content is distributed over time, and that can be used to compare human and AI-generated speech. We introduce a semantic-timescale analysis pipeline that turns word-level transcripts with timestamps into semantic time-series. For each spoken narrative, we compute (i) semantic specificity using WordNet-based word depth
The proliferation of increasingly sophisticated large language models necessitates new analytical tools to understand and differentiate AI-generated content from human speech.
This research provides a fundamental methodology for comparing human and AI language dynamics, crucial for developing more discerning AI and for addressing issues like deepfakes and automated influence operations.
The paper proposes a method for quantitatively measuring and comparing semantic fluctuation across different timescales in human and AI-generated language.
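As a rough illustration of the idea, the sketch below turns a timestamped word list into a specificity time series and measures how much it fluctuates at two bin widths. It is not the paper's pipeline: the `TOY_DEPTH` table is an invented stand-in for WordNet-based word depth, and the function names, bin widths, and the use of standard deviation as the fluctuation measure are all assumptions made for this example.

```python
from statistics import mean, pstdev

# Hypothetical depth scores standing in for WordNet-based word depth.
# These values are invented for illustration; larger means more specific.
TOY_DEPTH = {
    "thing": 2, "went": 1, "place": 3,
    "animal": 6, "park": 8, "dog": 13, "beagle": 15,
}


def specificity_series(words, bin_seconds, depth=TOY_DEPTH):
    """Average word depth inside consecutive time bins.

    words: list of (token, start_time_in_seconds) pairs.
    Returns a list of (bin_start_time, mean_depth), skipping tokens
    that have no depth score and bins that end up empty.
    """
    bins = {}
    for token, start in words:
        score = depth.get(token.lower())
        if score is None:
            continue  # e.g. function words with no depth entry
        bins.setdefault(int(start // bin_seconds), []).append(score)
    return [(idx * bin_seconds, mean(vals)) for idx, vals in sorted(bins.items())]


def fluctuation(series):
    """Population standard deviation of the binned specificity values."""
    values = [value for _, value in series]
    return pstdev(values) if len(values) > 1 else 0.0


# A made-up transcript: (word, onset time in seconds).
transcript = [
    ("the", 0.0), ("thing", 0.4), ("went", 0.9), ("place", 1.5),
    ("dog", 2.2), ("park", 2.8), ("beagle", 3.5), ("animal", 4.1),
]

for width in (2.0, 4.0):
    series = specificity_series(transcript, width)
    print(width, series, fluctuation(series))
```

Running the same transcript through several bin widths is what makes the comparison a timescale analysis: a narrative can look stable in coarse bins while swinging between generic and specific words in fine ones.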
- AI researchers
- NLP developers
- Content moderation platforms
- Academic institutions
- Malicious actors using undetectable AI-generated content
- Systems reliant on simple AI detection methods
Improved detection and attribution capabilities for AI-generated textual and spoken content.
Development of adaptive AI models that can mimic or evade detection based on semantic fluctuation patterns.
New legal and ethical frameworks for content authenticity and provenance in a world saturated with advanced AI-generated media.
Read at arXiv cs.CL