Ishigaki-IDS-Bench: A Benchmark for Generating Information Delivery Specification from BIM Information Requirements

arXiv:2605.22079v1

Abstract: Large language models (LLMs) are widely used to generate structured outputs such as JSON, SQL, and code, yet public resources remain limited for evaluating generation that must simultaneously satisfy industry-standard XML and domain vocabulary constraints. This paper presents Ishigaki-IDS-Bench, a benchmark for evaluating the ability to generate Information Delivery Specification (IDS) XML from Building Information Modeling (BIM) information requirements. The benchmark contains 166 BIM/IDS expert-authored and verified examples created by expandin
As LLMs are applied to more complex industrial domains, better benchmarks are needed to evaluate whether they can generate structured, industry-compliant outputs.
This benchmark addresses a critical gap in evaluating LLMs for generating precise, industry-specific configurations, moving beyond general code or JSON generation to complex, domain-specific XML with vocabulary constraints.
The ability to accurately assess and improve LLMs' performance in generating validated, industry-standard specifications could accelerate automation and reduce human error in sectors like architecture, engineering, and construction.
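To make the two kinds of constraint concrete, here is a minimal Python sketch that checks a generated document first for XML well-formedness and the IDS root element, then for entity names drawn from a known vocabulary. It is an illustration only, not the benchmark's evaluation method: the `check_ids` helper, the `KNOWN_ENTITIES` allow-list, and the sample document are all hypothetical, the sample is a simplified IDS-style structure, and no validation against the official IDS schema is performed.

```python
import xml.etree.ElementTree as ET

# Namespace used by buildingSMART IDS documents.
IDS_NS = "http://standards.buildingsmart.org/IDS"

# Hypothetical allow-list standing in for a real IFC vocabulary lookup.
KNOWN_ENTITIES = {"IFCWALL", "IFCDOOR", "IFCSLAB"}

# Simplified, illustrative IDS-style document (not taken from the benchmark).
SAMPLE = f"""<ids xmlns="{IDS_NS}">
  <info><title>Fire rating for walls</title></info>
  <specifications>
    <specification name="Wall fire rating" ifcVersion="IFC4">
      <applicability>
        <entity><name><simpleValue>IFCWALL</simpleValue></name></entity>
      </applicability>
      <requirements>
        <property>
          <propertySet><simpleValue>Pset_WallCommon</simpleValue></propertySet>
          <baseName><simpleValue>FireRating</simpleValue></baseName>
        </property>
      </requirements>
    </specification>
  </specifications>
</ids>"""


def check_ids(xml_text):
    """Return a list of problems; an empty list means both checks passed."""
    # Constraint 1: the output must be well-formed XML.
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]

    problems = []
    if root.tag != f"{{{IDS_NS}}}ids":
        problems.append("root element is not <ids> in the IDS namespace")

    # Constraint 2: entity names must come from the domain vocabulary.
    path = f".//{{{IDS_NS}}}entity/{{{IDS_NS}}}name/{{{IDS_NS}}}simpleValue"
    for node in root.iterfind(path):
        name = (node.text or "").strip()
        if name not in KNOWN_ENTITIES:
            problems.append(f"unknown entity name: {name}")
    return problems
```

A document can pass the first check and still fail the second, for example when an entity name is misspelled, which is the kind of error that general JSON or code benchmarks do not surface.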
- AI developers specializing in industrial applications
- Architecture, Engineering, and Construction (AEC) sector
- LLM companies enhancing structured output capabilities
- Manual data specification and validation processes
- Legacy software tools requiring extensive human intervention
Improved LLM performance in generating domain-specific structured data using benchmarks like Ishigaki-IDS-Bench.
Increased adoption of LLMs for automating complex specification and compliance tasks within industrial sectors.
Potential for significantly faster and more accurate project development cycles in industries heavily reliant on detailed information requirements.
Read at arXiv cs.CL