SIGNAL · AI · May 27, 2026, 4:00 AM · Signal 70 · Medium term

EpiCurveBench: Evaluating VLMs on Epidemic Curve Digitization

arXiv:2605.27195v1 · Announce Type: new · Abstract: Chart-to-data extraction with vision-language models (VLMs) is increasingly evaluated on benchmarks that show diminishing headroom (frontier VLMs exceed 89% on ChartQA) and with metrics that treat extracted points as unordered key-value pairs, ignoring the temporal structure of time series and penalizing small alignment shifts as catastrophic failures. We address both gaps with EpiCurveBench, a benchmark of 1,000 real-world epidemic curve images curated from diverse public-health sources, and EpiCurveSimilarity (ECS), an evaluation metric that al

Why this matters

Why now

VLMs are increasingly used for data extraction, but the tools for evaluating them are falling behind: existing benchmarks show diminishing headroom (frontier VLMs exceed 89% on ChartQA), and existing metrics treat extracted points as unordered key-value pairs, ignoring the temporal structure of time series.

Why it’s important

Better evaluation of VLMs on complex visual data such as charts, especially time-series charts, could make them more useful in public health, finance, and scientific research, where precise data extraction matters most.

What changes

EpiCurveBench and EpiCurveSimilarity give a domain-specific way to assess VLM performance on time-series data, moving evaluation beyond unordered key-value pair extraction to account for temporal structure.
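
To make the contrast concrete, here is a minimal sketch of why key-value matching can treat a small alignment shift as a near-total failure. It is not the paper's ECS metric, whose definition is cut off in the abstract above. As a stand-in, it uses classic dynamic time warping (DTW), and the function names and toy weekly counts are illustrative assumptions rather than anything from the source.

```python
# Illustrative only: this is NOT EpiCurveSimilarity (ECS). DTW is a swapped-in,
# well-known alignment-tolerant distance used here to show the general idea.

def key_value_accuracy(truth, pred):
    """Fraction of ground-truth keys whose predicted value matches exactly."""
    hits = sum(1 for key, value in truth.items() if pred.get(key) == value)
    return hits / len(truth)


def dtw_distance(a, b):
    """Classic dynamic time warping distance with absolute-difference cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # d[i][j] = cheapest cost of aligning a[:i] with b[:j]
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


# Hypothetical toy data: weekly case counts read off an epidemic curve.
weeks = ["w1", "w2", "w3", "w4", "w5", "w6"]
true_counts = [0, 2, 10, 25, 10, 2]
shifted_counts = [0, 0, 2, 10, 25, 10]  # same curve, read one week late
flat_counts = [8, 8, 8, 8, 8, 8]        # wrong shape entirely

truth = dict(zip(weeks, true_counts))
shifted = dict(zip(weeks, shifted_counts))
flat = dict(zip(weeks, flat_counts))

# Key-value matching scores the shifted curve almost as badly as the flat one.
kv_shifted = key_value_accuracy(truth, shifted)
kv_flat = key_value_accuracy(truth, flat)

# An alignment-tolerant distance separates the two cases clearly.
dtw_shifted = dtw_distance(true_counts, shifted_counts)
dtw_flat = dtw_distance(true_counts, flat_counts)

print(f"key-value accuracy: shifted={kv_shifted:.2f}, flat={kv_flat:.2f}")
print(f"DTW distance:       shifted={dtw_shifted:.0f}, flat={dtw_flat:.0f}")
```

In this toy case the one-week shift matches only one of six key-value pairs, barely better than the flat line, while the DTW distance for the shifted curve is far smaller than for the flat one. A metric that respects temporal order can therefore tell a slightly misaligned extraction from a genuinely wrong one.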

Winners

· VLM developers
· Public health organizations
· Data scientists
· AI researchers

Losers

· Benchmarks with limited scope
· VLMs with poor temporal understanding

Second-order effects

Direct

More accurate and reliable data extraction from charts using vision-language models for a variety of critical applications.

Second

Accelerated development of VLMs specifically optimized for temporal data analysis, leading to new capabilities in forecasting and trend identification.

Third

Enhanced automation of data ingestion and analysis workflows across industries, reducing human error and increasing the speed of insights derived from visual data.

Editorial confidence: 90 / 100 · Structural impact: 55 / 100

Original report

This signal links to a primary source. Continuum Brief monitors and indexes it as part of the live intelligence stream — we do not republish source content.

Read at arXiv cs.CL

#cs.CL
