
arXiv:2606.09543v2 (announce type: replace)
Abstract: This short paper introduces a stylometric interpretation method inspired by genome-wide association studies (GWAS). Each token is treated as a "gene" and authorship as the "phenotype"; the association between the two is tested using logistic regression with multiple-comparison correction. Applied to English, German, and Russian corpora, the method detects statistically significant lexical markers distinctive of individual authors.
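The per-token test described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not specify the feature encoding, the test statistic, or the correction procedure, so the sketch assumes binary token presence per document, a likelihood-ratio test against an intercept-only model, and Benjamini-Hochberg correction. All function names and the toy corpus are hypothetical.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic_loglik(x, y, iters=50, ridge=1e-6):
    """Fit y ~ sigmoid(b0 + b1*x) by Newton-Raphson; return the log-likelihood."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Small ridge on the diagonal keeps the 2x2 system solvable
        # when a token appears in every document (constant feature).
        h00 += ridge
        h11 += ridge
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < 1e-10:
            break
    # log-likelihood: sum(y*z - softplus(z)), with a stable softplus
    ll = 0.0
    for xi, yi in zip(x, y):
        z = b0 + b1 * xi
        ll += yi * z - (max(z, 0.0) + math.log1p(math.exp(-abs(z))))
    return ll

def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values for a {token: p} mapping."""
    m = len(pvalues)
    order = sorted(pvalues, key=pvalues.get, reverse=True)  # largest p first
    adjusted, running_min = {}, 1.0
    for k, token in enumerate(order):
        rank = m - k
        running_min = min(running_min, pvalues[token] * m / rank)
        adjusted[token] = running_min
    return adjusted

def token_association(docs, labels):
    """Test each token's presence against a binary authorship label.

    Assumes both classes are present in `labels` (1 = target author).
    """
    n, n1 = len(labels), sum(labels)
    p0 = n1 / n
    ll_null = n1 * math.log(p0) + (n - n1) * math.log(1.0 - p0)
    pvalues = {}
    for token in sorted({t for d in docs for t in d}):
        x = [1.0 if token in d else 0.0 for d in docs]
        lr = max(0.0, 2.0 * (fit_logistic_loglik(x, labels) - ll_null))
        # Survival function of chi-square with 1 degree of freedom.
        pvalues[token] = math.erfc(math.sqrt(lr / 2.0))
    return benjamini_hochberg(pvalues)

# Hypothetical toy corpus: author A favours "whilst"; "very" is used equally.
docs, labels = [], []
for i in range(20):
    docs.append(["the", "and"] + (["whilst"] if i % 5 else []) + (["very"] if i % 2 else []))
    labels.append(1)  # author A
    docs.append(["the", "and"] + (["whilst"] if i % 10 == 0 else []) + (["very"] if i % 2 else []))
    labels.append(0)  # other authors

adjusted = token_association(docs, labels)
for token, q in sorted(adjusted.items(), key=lambda kv: kv[1]):
    print(f"{token}: adjusted p = {q:.3g}")
```

On a real corpus the vocabulary runs to thousands of tokens, which is why the correction step matters: without it, many tokens would pass a nominal significance threshold by chance alone.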
The proliferation of AI-generated content and the increasing sophistication of language models necessitate improved methods for authorship attribution and stylistic analysis.
This research provides a novel, interpretable approach to stylometric analysis, potentially enhancing digital forensics, intellectual property protection, and the detection of synthetic media.
Identifying authors from 'lexical markers' gains a firmer statistical footing, since each marker is individually tested for significance; the same approach could in principle extend to distinguishing human from AI-generated text.
- Digital forensics providers
- Content creators and IP holders
- Social media platforms
- Researchers in computational linguistics
- Producers of undetectable AI-generated content
- Individuals seeking to mask their authorship
- Disinformation campaigns
More accurate and explainable tools for authorship attribution, especially for distinguishing human from AI-generated text.
Increased legal and ethical hurdles for anonymous or unattributed content, influencing content creation and verification.
Enhanced defensive capabilities against sophisticated deepfakes and AI-driven influence operations, shifting the cybersecurity landscape.
Read at arXiv cs.CL