arXiv:2512.07078v4 Announce Type: replace-cross Abstract: Small object detection in complex scenes exposes a fundamental tension in neural network design: backbone attention distributes computation uniformly regardless of content, pyramid necks inflate activation magnitudes during upsampling without norm compensation, and bottleneck convolutions progressively smooth high-frequency edge components through accumulated spatial filtering. In response, we develop DFIR-DETR by tracing each proposed module back to a specific, measurable deficiency in the RT-DETR baseline: uniform attention that ignor
Source: arXiv cs.LG — read the full report at the original publisher.
