UltraFlux: Data-Model Co-Design for High-quality Native 4K Text-to-Image Generation across Diverse Aspect Ratios

arXiv:2511.18050v1, cross-listed. Abstract: Diffusion transformers have recently delivered strong text-to-image generation around 1K resolution, but we show that extending them to native 4K across diverse aspect ratios exposes a tightly coupled failure mode spanning positional encoding, VAE compression, and optimization. Tackling any of these factors in isolation leaves substantial quality on the table. We therefore take a data-model co-design view and introduce UltraFlux, a Flux-based DiT trained natively at 4K on MultiAspect-4K-1M, a 1M-image 4K corpus with controlled multi-AR coverage
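To make the scale of the problem concrete, here is a minimal Python sketch of the resolution and token arithmetic behind multi-aspect-ratio 4K generation. It is not taken from the paper: the 4096×4096 pixel budget, the 8× VAE downsampling factor, the 2×2 latent patch size, and the function names `resolution_for_aspect` and `token_count` are all assumptions chosen for illustration.

```python
import math


def resolution_for_aspect(ar_w, ar_h, pixel_budget=4096 * 4096, multiple=16):
    """Return (width, height) close to pixel_budget with ratio ar_w:ar_h.

    Both sides are rounded to the nearest multiple of `multiple` so they
    divide evenly by the assumed VAE factor times the patch size.
    """
    ratio = ar_w / ar_h
    height = math.sqrt(pixel_budget / ratio)
    width = height * ratio

    def snap(value):
        # Round to the nearest multiple, never below one multiple.
        return max(multiple, round(value / multiple) * multiple)

    return snap(width), snap(height)


def token_count(width, height, vae_factor=8, patch=2):
    """Number of latent tokens a transformer would see (assumed settings)."""
    stride = vae_factor * patch  # pixels covered by one token per side
    return (width // stride) * (height // stride)


if __name__ == "__main__":
    # Hypothetical aspect-ratio set; the paper's actual buckets may differ.
    for ar in [(1, 1), (16, 9), (9, 16), (3, 2), (21, 9)]:
        w, h = resolution_for_aspect(*ar)
        print(f"{ar[0]}:{ar[1]} -> {w}x{h}, tokens = {token_count(w, h)}")
```

Under these assumed settings, a 4096×4096 image yields 65,536 latent tokens against 4,096 at 1024×1024, a 16-fold increase in sequence length. That gap is one way to see why positional encoding and VAE compression may need to be reconsidered together at 4K.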
Pushing text-to-image models beyond roughly 1K resolution exposes limitations in positional encoding, VAE compression, and optimization that isolated fixes do not resolve.
Native 4K text-to-image generation across diverse aspect ratios would make AI-generated visuals usable in more professional settings without a separate upscaling step.
Reliably generating high-quality 4K images directly from text prompts, without substantial quality loss when the aspect ratio changes, would set a new standard for visual AI output.
- Generative AI developers
- Digital content creators
- Advertising and media industries
- Entertainment sector
- Current lower-resolution image generation models
- Providers of computationally intensive upscaling solutions
Higher-quality and more versatile AI-generated imagery becomes readily accessible.
This advancement could accelerate the adoption of generative AI in professional design, film, and game development workflows.
The increased realism and availability of AI-generated content might intensify debates around authenticity, copyright, and visual ethics.
Read at arXiv cs.AI