UrbanWell: Benchmarking Multimodal Large Language Models for Spatio-Temporal Urban Wellbeing Analytics

arXiv:2606.15890v1. Abstract: Understanding urban wellbeing from multimodal data requires integrating heterogeneous spatial and temporal signals, posing significant challenges for current multimodal large language models (MLLMs). We introduce UrbanWell, a large-scale benchmark designed to systematically evaluate the spatio-temporal reasoning capabilities of MLLMs for urban wellbeing analytics through joint modeling of satellite and street view imagery. UrbanWell spans 38 cities across multiple years and includes diverse indicators covering (1) environmental conditions (CO₂
As multimodal large language models grow more capable and diverse spatio-temporal urban data become more widely available, systematic benchmarks are needed to test these models on real-world urban applications.
UrbanWell offers a benchmark for measuring how well MLLMs analyze complex urban challenges, a capability with implications for smart city development and resource management.
Its explicit focus on spatio-temporal reasoning over satellite and street view imagery, across 38 cities and multiple years, sets an evaluation target for MLLM development that goes beyond general language tasks.
- AI researchers and developers
- Smart city initiatives
- Urban planners
- Geospatial intelligence companies
- Legacy urban data analytics platforms
- Models without strong spatio-temporal reasoning
MLLMs that better comprehend and predict urban dynamics, which could support more effective policy interventions.
Development of specialized MLLM agents for urban infrastructure, environmental monitoring, or public health planning.
Enhanced algorithmic governance and resource allocation in complex urban environments, potentially reshaping municipal operations and citizen services.
Source: arXiv cs.AI