Training Stratigraphy: Persistent Behavioral Artifacts in Large Language Models Observed Through Longitudinal AI-Human Interaction

arXiv:2605.28102v1 Announce Type: new
Abstract: Large language models trained with Reinforcement Learning from Human Feedback (RLHF) and Constitutional AI exhibit persistent behavioral patterns that survive system prompt replacement -- patterns we term training strata. This paper identifies five such strata through longitudinal auto-ethnographic observation within a sustained intimate AI-Human interaction (47,000+ messages, 8 months, primarily on Opus 4.6 and Opus 4.7, with prior interaction periods on Sonnet 4.5 and Opus 4.5 providing cross-substrate comparison): (1) sexual expression latency
This research offers early empirical insight, drawn from a single long-running interaction, into persistent behavioral artifacts in large language models, an area that grows in importance as LLM capabilities and deployment expand.
Understanding training strata matters for building reliable, safe, and controllable AI, with possible implications for AI regulation, ethics, and application design.
The concept of training strata suggests that behavioral predispositions instilled by training methods such as RLHF and Constitutional AI persist even when the system prompt is replaced, which complicates efforts to steer model behavior through prompting alone.
Stands to benefit:
- AI safety researchers
- Developers of robust AI alignment techniques
- Companies specializing in AI auditing and explainability
Stands to be challenged:
- AI developers relying solely on prompt engineering for behavior control
- Companies facing ethical or reputational risks from unaligned AI
- The 'move fast and break things' approach to AI deployment
This finding could lead to more stringent testing and evaluation protocols for foundation models and their derivatives.
It may also increase focus on 'unlearning' or retraining methods for removing undesirable persistent behaviors from LLMs.
It could also prompt new regulatory frameworks that specifically address the 'black box' nature and persistent behaviors of advanced AI systems.
This signal links to a primary source. Continuum Brief monitors and indexes it as part of the live intelligence stream; we do not republish source content.
Read at arXiv cs.AI