The Ghost Couple: Correlated LLM Name Priors and Their Haunting of the Web and Academic Publishing

arXiv:2606.02184v1 Announce Type: cross Abstract: These names do not exist. Elena Vasquez and Marcus Chen have appeared as volcano experts, astronauts, thriller protagonists, podcast hosts, and academic co-authors across hundreds of independently produced AI-generated documents, never having lived. We show that large language models do not merely default to high-probability individual names when generating fictional experts: they produce correlated character ensembles, pairs and trios whose co-occurrence rates far exceed chance and are consistent across independent generations. These priors ar
The spread of advanced LLMs and their growing use in content generation are exposing previously unnoticed biases and correlations in model output across the web.
The pattern goes beyond a default to high-probability individual names: the same fictional experts recur together across independent generations, with consequences for content authenticity, intellectual integrity, and the training data that future LLMs consume.
AI-generated content therefore has to be analyzed not as isolated instances but as systemic, correlated phantom entities, which poses new challenges for content verification and source analysis.
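As a rough illustration of what "co-occurrence rates far exceed chance" can mean in practice, the sketch below computes a simple lift statistic, the observed joint frequency of a name pair divided by the frequency expected if the two names appeared independently. This is not the paper's method, which the brief does not describe; the function name `pair_lift`, the plain substring matching, the toy corpus, and the placeholder name "Jane Doe" are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def pair_lift(docs, names):
    """Return {(a, b): lift}, where lift = P(a and b) / (P(a) * P(b)).

    A lift near 1 is what independence predicts; values well above 1
    mean the two names show up together more often than chance.
    Matching is naive substring search (an assumption for brevity).
    """
    n = len(docs)
    single = Counter()  # number of documents mentioning each name
    pair = Counter()    # number of documents mentioning both names
    for text in docs:
        present = sorted(name for name in names if name in text)
        single.update(present)
        pair.update(combinations(present, 2))
    lifts = {}
    for (a, b), joint in pair.items():
        expected = single[a] * single[b] / n  # joint count under independence
        lifts[(a, b)] = joint / expected
    return lifts


# Toy corpus, invented for illustration only.
docs = [
    "Elena Vasquez and Marcus Chen study volcanoes.",
    "Podcast hosts Elena Vasquez and Marcus Chen return.",
    "Jane Doe wrote a thriller.",
    "Jane Doe interviews an astronaut.",
]
names = ["Elena Vasquez", "Marcus Chen", "Jane Doe"]

lifts = pair_lift(docs, names)
print(lifts)
```

In this toy corpus the two names from the abstract always appear together, so their lift is above 1, while pairs that never co-occur are simply absent from the result. A real analysis would need proper name matching, a much larger corpus, and a significance test rather than a raw ratio.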
- AI Safety Researchers
- Content Authenticity Platforms
- Digital Forensics
- Unregulated Content Platforms
- Academic Publishing
- LLM Providers (if unaddressed)
Widespread recognition of systematically correlated generated entities will erode trust in online information and academic integrity.
New techniques and regulatory frameworks will emerge to detect and mitigate 'phantom' identity generation and attribute AI-generated content.
The feedback loop of AI-generated content training future AIs could propagate these correlated priors, creating a self-reinforcing echo chamber of synthetic identities.
Read at arXiv cs.LG