arXiv:2504.12557v3 Announce Type: replace Abstract: Ensuring safe behavior in reinforcement learning (RL) is challenging when safety constraints are implicit and cannot be densely measured. In many settings, supervision is limited to coarse approvals or rejections of whole trajectories (e.g., whether a rollout remained within an unknown safety threshold). We propose TraCeS (Trajectory-based Constraint Estimation for Safety), a method for learning per-timestep violation credit from such sparse trajectory-level labels. TraCeS trains a sequential violation estimator whose per-step credits factori
Source: arXiv cs.LG — read the full report at the original publisher.
