
arXiv:2605.23977v1 Announce Type: new Abstract: This paper audits benchmark evaluation in clinical-interview depression detection through four complementary probes across DAIC/E-DAIC, CMDC, ANDROIDS, MODMA, and PDCH. First, we re-evaluate E-DAIC under strict subject-disjoint leave-one-subject-out cross-validation. A lightweight hybrid text-plus-LLM-score model reaches macro-F1 = 0.723 (the highest reported under this protocol, to our knowledge), providing a conservative out-of-fold reference point that does not depend on the privileged official holdout. Second, we test whether the E-DAIC offi
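This paper audits how depression detection from clinical interviews is benchmarked, applying four probes across DAIC/E-DAIC, CMDC, ANDROIDS, MODMA, and PDCH, including a re-evaluation of E-DAIC under strict subject-disjoint leave-one-subject-out cross-validation.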
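The subject-disjoint leave-one-subject-out protocol named in the abstract can be illustrated with a minimal sketch. This is not the paper's code: the helper names (`leave_one_subject_out`, `macro_f1`, `majority_label`), the toy data, and the choice to compute macro-F1 over pooled out-of-fold predictions are all assumptions made for illustration, and the majority-class predictor only stands in for a real model.

```python
from collections import defaultdict


def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_idx, test_idx) with no subject on both sides."""
    by_subject = defaultdict(list)
    for idx, sid in enumerate(subject_ids):
        by_subject[sid].append(idx)
    for sid, test_idx in by_subject.items():
        train_idx = [i for s, idxs in by_subject.items() if s != sid for i in idxs]
        yield sid, train_idx, test_idx


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over classes seen in y_true or y_pred."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def majority_label(train_labels):
    """Placeholder model (assumption): predict the most common training label."""
    return max(sorted(set(train_labels)), key=train_labels.count)


# Hypothetical toy data: one row per interview segment, tagged with its subject.
subjects = ["s1", "s1", "s2", "s3", "s3", "s4"]
labels = [1, 1, 0, 0, 0, 1]

# Collect out-of-fold predictions: each row is predicted by a model
# that never saw any row from the same subject.
out_of_fold = [None] * len(labels)
for held_out, train_idx, test_idx in leave_one_subject_out(subjects):
    prediction = majority_label([labels[i] for i in train_idx])
    for i in test_idx:
        out_of_fold[i] = prediction

print(macro_f1(labels, out_of_fold))
```

Grouping by subject before splitting is what makes the folds subject-disjoint: all rows from the held-out subject leave the training set together, so the score cannot benefit from a speaker appearing on both sides of the split.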
More trustworthy evaluation of AI-based depression detection from clinical interviews could support more timely and effective mental health interventions.
The conservative out-of-fold reference point reported for E-DAIC (macro-F1 = 0.723 under subject-disjoint leave-one-subject-out cross-validation) offers a more reliable baseline for AI-driven depression detection tools, which may help their adoption in clinical settings.
- Mental health clinicians
- AI-driven diagnostic tool developers
- Patients with depression
More accurate and reliable AI models for depression detection become available.
Increased trust and adoption of AI tools in mental health diagnosis workflows.
Reduced burden on human diagnosticians and earlier intervention for individuals suffering from depression, leading to better outcomes.
Read at arXiv cs.CL