
arXiv:2603.21210v3 Announce Type: replace Abstract: Designing urban spaces that provide pedestrian wind comfort and safety requires time-resolved Computational Fluid Dynamics (CFD) simulations, but their current computational cost makes extensive design exploration impractical. We introduce WinDiNet (Wind Diffusion Network), a pretrained video diffusion model that is repurposed as a fast, differentiable surrogate for this task. Starting from LTX-Video, a 2B-parameter latent video transformer, we fine-tune on 10,000 2D incompressible CFD simulations over procedurally generated building layouts.
Advances in large transformer models and diffusion models, specifically in video generation, are enabling their repurposing for complex scientific simulation tasks like fluid dynamics.
This development could substantially lower the computational cost of such simulations, enabling rapid design iteration in urban planning and potentially other engineering fields, and shortening development cycles.
The ability to use pretrained video models as fast, differentiable physics simulators changes how computationally intensive design and optimization tasks can be approached, moving from traditional CFD to AI-accelerated methods.
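To make the word "differentiable" concrete, here is a minimal sketch of gradient-based design optimization through a surrogate. It is not WinDiNet and not the paper's method: `toy_surrogate` is a hypothetical one-parameter analytic function standing in for a learned model, and the design parameter, learning rate, and step count are all assumptions chosen for illustration.

```python
import math


def toy_surrogate(x):
    """Hypothetical stand-in for a learned surrogate (assumption, not WinDiNet).

    Maps one design parameter x (for example, a building offset) to a
    peak pedestrian wind speed. The speed is highest at x = 1.0.
    """
    return 1.0 + 2.0 * math.exp(-((x - 1.0) ** 2))


def toy_surrogate_grad(x):
    """Analytic derivative of toy_surrogate with respect to x."""
    return -4.0 * (x - 1.0) * math.exp(-((x - 1.0) ** 2))


def optimize_design(x0, lr=0.1, steps=200):
    """Plain gradient descent on the surrogate's predicted wind speed."""
    x = x0
    for _ in range(steps):
        x -= lr * toy_surrogate_grad(x)
    return x


x_start = 1.5
x_opt = optimize_design(x_start)
print(f"start: x={x_start:.2f}, speed={toy_surrogate(x_start):.3f}")
print(f"final: x={x_opt:.2f}, speed={toy_surrogate(x_opt):.3f}")
```

Because the surrogate exposes a gradient, each update moves the design directly toward lower predicted wind speed instead of sampling candidate layouts blindly. With a real learned model, the hand-written derivative would presumably be replaced by automatic differentiation.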
- Urban Planners
- Architectural Design Firms
- AI/ML Research Institutions
- Infrastructure Developers
- Traditional CFD Software Vendors (if they don't adapt)
- Urban planning firms reliant on slow simulation methods
With a fast surrogate, architects and urban planners could test far more design iterations for wind comfort and safety than time-resolved CFD makes practical.
Optimized urban layouts become more common, leading to more resilient and comfortable cities, potentially influencing public health and economic activity in urban centers.
This approach of repurposing large pretrained AI models for domain-specific, computationally expensive simulations could extend to climate modeling, materials science, or drug discovery, and may change how R&D is done in those fields.
Read at arXiv cs.LG