
arXiv:2603.28054v2. Abstract: In this paper, we introduce GhostWriteBench, a dataset for LLM authorship attribution. It comprises long-form texts (50K+ words per book) generated by frontier LLMs, and is designed to test generalisation across multiple out-of-distribution (OOD) dimensions, including domain and unseen LLM author. We also propose TRACE -- a novel fingerprinting method that is interpretable and lightweight -- that works for both open- and closed-source models. TRACE creates the fingerprint by capturing token-level transition patterns (e.g., word rank) estimate
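The abstract is cut off above, so the actual TRACE procedure cannot be recovered from this text. As a purely illustrative sketch of what a fingerprint built from "token-level transition patterns" over word ranks might look like, the Python below buckets each token by its frequency rank and counts how often one bucket follows another. Everything here is an assumption made for illustration: the function names, the bucket edges, and the use of plain reference-corpus frequency ranks are hypothetical and are not taken from the paper.

```python
def rank_buckets(tokens, reference_counts, edges=(10, 100, 1000)):
    """Map each token to a coarse bucket of its frequency rank.

    Assumption: ranks come from word counts in a reference corpus
    (rank 1 = most frequent). The paper's own rank source is not
    stated in the truncated abstract.
    """
    ranked = sorted(reference_counts, key=lambda w: (-reference_counts[w], w))
    rank_of = {word: i + 1 for i, word in enumerate(ranked)}
    buckets = []
    for tok in tokens:
        rank = rank_of.get(tok)
        if rank is None:
            # Words missing from the reference corpus go to the last bucket.
            buckets.append(len(edges))
        else:
            # Bucket index = number of edges the rank exceeds.
            buckets.append(sum(rank > e for e in edges))
    return buckets


def transition_fingerprint(buckets, n_buckets):
    """Row-normalised matrix of bucket-to-bucket transition frequencies."""
    counts = [[0] * n_buckets for _ in range(n_buckets)]
    for prev, nxt in zip(buckets, buckets[1:]):
        counts[prev][nxt] += 1
    fingerprint = []
    for row in counts:
        total = sum(row)
        fingerprint.append([c / total if total else 0.0 for c in row])
    return fingerprint


def l1_distance(fp_a, fp_b):
    """Sum of absolute differences between two fingerprints of equal shape."""
    return sum(
        abs(x - y)
        for row_a, row_b in zip(fp_a, fp_b)
        for x, y in zip(row_a, row_b)
    )
```

Under these assumptions, a text of unknown origin would be attributed to whichever candidate author's fingerprint has the smallest distance to it. The appeal of such a representation is that each matrix cell is directly readable, for example as "how often a rare word follows a very common one".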
As advanced LLMs are used more widely for content generation, and as concerns about authorship and intellectual property grow, robust attribution methods are becoming critically important.
Detecting and attributing LLM authorship has significant implications for intellectual property, academic integrity, copyright law, and the trustworthiness of digital content.
Datasets like GhostWriteBench and methods like TRACE shift the discussion from theoretical concerns about AI-generated text toward practical, testable attribution, which could affect how 'original' content is perceived and regulated.
- Content creators
- Copyright holders
- Academic institutions
- Plagiarism detection services
- Malicious content generators
- Unattributed AI content farms
- Individuals misrepresenting AI work
Increased scrutiny and accountability for AI-generated textual content will become standard.
Legal frameworks and industry standards for AI authorship will begin to solidify, potentially leading to new copyright laws or amendments.
The market for 'human-verified' or 'human-authored' content may gain a premium as AI-generated text becomes ubiquitous and easily detectable.
Read at arXiv cs.CL