arXiv:2602.18528v2 Announce Type: replace Abstract: Audio-visual continual test-time adaptation involves continually adapting a source audio-visual model at test time to unlabeled, non-stationary domains in which either or both modalities can be distributionally shifted. Such shifts hamper online cross-modal learning and eventually lead to poor accuracy. While previous works have tackled this problem, we find that SOTA methods suffer from catastrophic forgetting, in which the model's performance drops well below even that of the source model due to continual parameter updates at test time. In this work, we firs
Source: arXiv cs.LG
