A Spectral Phase Diagram for Binary Few-Shot Classification: Intrinsic Dimensionality, Geometric Saturation, and Representational Diagnosis

arXiv:2606.24903v1. Abstract: Deciding when to stop collecting labeled examples is a fundamental but undertheorized problem in applied machine learning. The saturation index $S(K) = \operatorname{erank}(\widehat{\Sigma}_W^{(K)}) / K$ measures the ratio of the effective rank of the pooled within-class sample covariance to the shot count; we prove it falls below a threshold precisely when the covariance estimator is well-concentrated around the population covariance and the linear discriminant has stabilized. The index is computable in $O(d^3)$ time from support features alone,
The proliferation of machine learning applications increases the urgency for robust, interpretable, and efficient methods to manage data collection and model training, especially in data-scarce scenarios.
This research provides a quantifiable metric for deciding when to stop collecting labeled data, which could yield efficiency gains and more reliable AI deployment in critical applications where labeled data is costly or difficult to collect.
The introduction of the saturation index provides a new, intrinsic method for diagnosing the stability of linear discriminants in few-shot classification, moving beyond heuristic approaches.
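As a rough illustration of how the saturation index $S(K)$ from the abstract might be computed, here is a minimal NumPy sketch. It rests on assumptions the abstract does not confirm: that $\operatorname{erank}$ is the entropy-based effective rank (the exponential of the Shannon entropy of the normalized eigenvalues), and that the pooled within-class covariance is the summed class-centered scatter divided by the degrees of freedom. The function names `effective_rank`, `pooled_within_class_cov`, and `saturation_index` are hypothetical and not taken from the paper.

```python
import numpy as np


def effective_rank(cov, eps=1e-12):
    """Entropy-based effective rank of a symmetric PSD matrix.

    ASSUMPTION: erank = exp(Shannon entropy of the normalized eigenvalues).
    The abstract does not state which definition the paper uses.
    """
    eigvals = np.linalg.eigvalsh(cov)        # O(d^3) symmetric eigensolve
    eigvals = np.clip(eigvals, 0.0, None)    # drop tiny negative round-off
    total = eigvals.sum()
    if total <= eps:
        return 0.0                           # all-zero covariance
    p = eigvals / total
    p = p[p > eps]                           # treat 0 * log(0) as 0
    return float(np.exp(-np.sum(p * np.log(p))))


def pooled_within_class_cov(features, labels):
    """Pooled within-class sample covariance of support features.

    ASSUMPTION: each class is centered by its own mean, the scatter
    matrices are summed, and the sum is divided by (n - number of classes).
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    d = features.shape[1]
    scatter = np.zeros((d, d))
    for c in classes:
        x = features[labels == c]
        centered = x - x.mean(axis=0)
        scatter += centered.T @ centered
    dof = max(len(features) - len(classes), 1)
    return scatter / dof


def saturation_index(features, labels, shots):
    """S(K) = erank(pooled within-class covariance) / K, with K = shots."""
    return effective_rank(pooled_within_class_cov(features, labels)) / shots
```

Under this definition the effective rank does not depend on the overall scale of the covariance, so the choice of normalizer in the pooled estimate does not change $S(K)$. The threshold below which the index signals saturation is not given in the excerpt, so the sketch returns the raw value rather than a stop/continue decision.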
- Machine Learning Researchers
- AI Development Teams
- Industries with High Labeling Costs
- Inefficient Data Labeling Services
Teams can stop labeling once the index indicates saturation, which could reduce development costs and time.
Improved model reliability in few-shot scenarios leads to broader and more confident adoption of AI in domains with limited data.
The methodology could influence future active learning strategies and resource allocation for AI projects, emphasizing intrinsic diagnostic tools over empirical trial-and-error.
Read at arXiv cs.LG