arXiv:2509.12046v2 Announce Type: replace-cross Abstract: Although autoregressive (AR) models have demonstrated remarkable success in image generation, extending these models to layout-conditioned generation remains challenging due to the sparse nature of layout conditions and the risk of feature entanglement. We present \textbf{S}tructured \textbf{M}asking for \textbf{AR}-based \textbf{L}ayout-to-\textbf{I}mage (SMARLI), a novel framework that effectively integrates spatial layout constraints into the AR generation process. To equip AR models with layout control, a structured masking strategy
Source: arXiv cs.AI — read the full report at the original publisher.
