arXiv:2606.10835v1 Announce Type: new Abstract: Periodic hard target updates are among the most common stabilization devices in modern deep Q-learning. Recent studies suggest that target updates can improve stability in Q-learning with function approximation, including linear function approximation. We introduce and analyze the so-called $\lambda$-target update, obtained by averaging the $m$-periodic target update maps with $\lambda$-geometric weights $(1-\lambda)\lambda^{m-1}$, $\lambda \in [0,1]$. The endpoint $\lambda=0$ recovers the one-period target update, while the continuous endpoint $
Source: arXiv cs.LG — read the full report at the original publisher.
