
arXiv:2605.20506v1 | Announce Type: new
Abstract: Humans learn social norms and behaviors from verbal feedback (e.g., a parent saying "that was rude" or a friend explaining "here's why that hurt"). Yet, learning from feedback for LLMs has largely focused on domains like code and math, where RL rewards are directly verifiable and condensed into scalar values. As LLMs are increasingly used to simulate human behavior, e.g., standing in for users, patients, students, and other personas, there is a pressing need to make them more human-like, which requires embracing a fundamentally different kind of…
The growing use of LLMs to simulate human behavior, combined with the limits of reinforcement learning from scalar rewards, creates a pressing need for richer feedback mechanisms.
This research outlines a method for LLMs to learn from verbal feedback, a step toward modeling and simulating human social norms and behaviors more accurately.
With such a method, LLMs could learn complex social dynamics and produce more human-like responses, extending beyond purely factual or logical reasoning to social intelligence.
- AI developers
- Simulation platforms
- Social science research
- Customer service AI
- Narrower AI models
- Current RLHF approaches
LLMs could develop more sophisticated, empathetic conversational abilities and more realistic behavioral simulations.
This enhanced realism could lead to new applications in training, therapy, and complex system modeling involving human interaction.
Blurring the line between human and AI behavior could intensify ethical debates over AI agency and the integration of AI into society.
This signal links to a primary source. Continuum Brief monitors and indexes it as part of the live intelligence stream — we do not republish source content.
Read at arXiv cs.LG