CArtBench: Evaluating Vision-Language Models on Chinese Art Understanding, Interpretation, and Authenticity

arXiv:2604.11632v2. Abstract: We introduce CARTBENCH, a museum-grounded benchmark for evaluating vision-language models (VLMs) on Chinese artworks beyond short-form recognition and QA. CARTBENCH comprises four subtasks: CURATORQA for evidence-grounded recognition and reasoning, CATALOGCAPTION for structured four-section expert-style appreciation, REINTERPRET for defensible reinterpretation with expert ratings, and CONNOISSEURPAIRS for diagnostic authenticity discrimination under visually similar confounds. CARTBENCH is built by aligning image-bearing Palace Museum objects
As advanced AI models proliferate, applications and evaluations are moving toward narrower, culturally specific domains, particularly as global powers invest in distinct AI capabilities.
This benchmark can be read as a strategic move in China to develop and assess AI models capable of deep understanding of its own artistic heritage, separate from Western-centric evaluations.
VLM evaluation is expanding beyond general recognition to cover culturally nuanced interpretation, authenticity assessment, and expert-level appreciation, which raises the bar for VLM development globally.
- Chinese AI research institutions
- Cultural heritage organizations
- VLMs with advanced contextual reasoning
- Generic VLM evaluation benchmarks
- VLMs lacking cultural specificity
Further development of vision-language models with specialized cultural and historical knowledge.
Increased competition among nations to develop AI systems capable of understanding and interpreting their unique cultural heritage.
Potential for new forms of digital cultural preservation and dissemination, but also concerns regarding AI-driven reinterpretations of sensitive cultural artifacts.
Read at arXiv cs.CL