
arXiv:2605.27195v1. Abstract: Chart-to-data extraction with vision-language models (VLMs) is increasingly evaluated on benchmarks that show diminishing headroom (frontier VLMs exceed 89% on ChartQA) and with metrics that treat extracted points as unordered key-value pairs, ignoring the temporal structure of time series and penalizing small alignment shifts as catastrophic failures. We address both gaps with EpiCurveBench, a benchmark of 1,000 real-world epidemic curve images curated from diverse public-health sources, and EpiCurveSimilarity (ECS), an evaluation metric that al
As VLMs are used more widely for data extraction, evaluation has to keep pace: existing benchmarks leave little headroom (frontier VLMs exceed 89% on ChartQA), and existing metrics ignore the temporal structure of time series.
Better evaluation of VLMs on charts, especially time-series charts, could make them more dependable in applications such as public health, finance, and scientific research, where precise data extraction matters.
EpiCurveBench (1,000 real-world epidemic curve images from public-health sources) and the EpiCurveSimilarity (ECS) metric offer a domain-specific way to assess VLM performance on time-series data, moving evaluation beyond unordered key-value matching to account for temporal structure.
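To make the alignment-shift problem concrete, here is a minimal sketch with made-up numbers. It is not the paper's ECS metric, whose definition is not given in this brief. The function names `key_value_accuracy` and `best_shift_error`, the 5% tolerance, and the weekly case counts are all assumptions chosen for illustration. The sketch shows how an unordered key-value score can drop to zero when an otherwise perfect extraction is offset by one time step, while a comparison that allows a small shift still recognizes the curve.

```python
# Illustrative only: NOT the ECS metric from the paper.
# All names, tolerances, and data below are hypothetical.

def key_value_accuracy(truth, pred, rel_tol=0.05):
    """Fraction of ground-truth keys whose predicted value is within rel_tol."""
    hits = 0
    for key, value in truth.items():
        if key in pred and abs(pred[key] - value) <= rel_tol * abs(value):
            hits += 1
    return hits / len(truth)


def best_shift_error(truth_series, pred_series, max_shift=1):
    """Smallest mean absolute error over integer shifts in [-max_shift, max_shift]."""
    best = float("inf")
    for shift in range(-max_shift, max_shift + 1):
        # Pair truth[i] with pred[i + shift] wherever both indices are valid.
        pairs = [
            (truth_series[i], pred_series[i + shift])
            for i in range(len(truth_series))
            if 0 <= i + shift < len(pred_series)
        ]
        if pairs:
            error = sum(abs(t - p) for t, p in pairs) / len(pairs)
            best = min(best, error)
    return best


# Made-up weekly case counts for weeks 1..5.
truth_series = [2, 10, 40, 15, 3]
# The same curve read one week late (week 1 is read as 0 cases).
pred_series = [0, 2, 10, 40, 15]

weeks = range(1, len(truth_series) + 1)
truth = dict(zip(weeks, truth_series))
pred = dict(zip(weeks, pred_series))

kv_score = key_value_accuracy(truth, pred)
shift_error = best_shift_error(truth_series, pred_series)
print(kv_score, shift_error)
```

With these numbers every week fails the key-value check, so the score is 0.0, even though the overlapping part of the curve matches exactly once a one-week shift is allowed.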
- VLM developers
- Public health organizations
- Data scientists
- AI researchers
- Benchmarks with limited scope
- VLMs with poor temporal understanding
More accurate and reliable data extraction from charts using vision-language models for a variety of critical applications.
Accelerated development of VLMs specifically optimized for temporal data analysis, leading to new capabilities in forecasting and trend identification.
Enhanced automation of data ingestion and analysis workflows across industries, reducing human error and increasing the speed of insights derived from visual data.
Read at arXiv cs.CL