Compositional Concept-Based Neuron-Level Interpretability for Deep Reinforcement Learning

arXiv:2502.00684v2

Abstract: Deep reinforcement learning (DRL) has successfully addressed many complex control problems. However, the neural networks representing policies or values remain opaque, undermining trust in high-stakes applications. While concept-based methods have shown promise in deciphering internal representations in computer vision, applying them to DRL is impeded by the absence of pre-defined semantic concepts in continuous state spaces. In this work, we propose a novel concept-based explanation framework designed to provide fine-grained, neuron-level in

Source: arXiv cs.LG — read the full report at the original publisher.

This is a curated wire item. The Continuum Brief does not republish full third-party articles; this entry links to the original source.
