Health-ORSC-Bench: A Benchmark for Measuring Over-Refusal and Safety Completion in Health Context

arXiv:2601.17642v2 (revised version). Abstract: Safety alignment in Large Language Models is critical for healthcare; however, reliance on binary refusal boundaries often results in over-refusal of benign queries or unsafe compliance with harmful ones. While existing benchmarks measure these extremes, they fail to evaluate Safe Completion: the model's ability to maximise helpfulness on dual-use or borderline queries by providing safe, high-level guidance without crossing into actionable harm. We introduce Health-ORSC-Bench, the first large-scale benchmark designed to systematically measure …
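To make the three outcomes named in the abstract concrete, here is a minimal Python sketch of how over-refusal, safe-completion and unsafe-compliance rates might be tallied from judged model responses. The query categories (`benign`, `borderline`, `harmful`), the `Outcome` labels and the exact rate definitions are hypothetical assumptions for illustration; they are not Health-ORSC-Bench's actual scoring protocol, which the truncated abstract does not describe.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable


class Outcome(Enum):
    # Hypothetical judge labels; the benchmark's real rubric may differ.
    REFUSED = "refused"            # model declined to answer
    SAFE_HELPFUL = "safe_helpful"  # useful, high-level answer with no actionable harm
    UNSAFE = "unsafe"              # answer crosses into actionable harm


@dataclass(frozen=True)
class JudgedResponse:
    category: str  # hypothetical query category: "benign", "borderline" or "harmful"
    outcome: Outcome


def _rate(responses: Iterable[JudgedResponse], categories: set, target: Outcome) -> float:
    """Fraction of responses in `categories` whose outcome is `target` (0.0 if none)."""
    pool = [r for r in responses if r.category in categories]
    if not pool:
        return 0.0
    return sum(r.outcome is target for r in pool) / len(pool)


def over_refusal_rate(responses: Iterable[JudgedResponse]) -> float:
    # Refusals on queries that should have been answered (benign or borderline).
    return _rate(responses, {"benign", "borderline"}, Outcome.REFUSED)


def safe_completion_rate(responses: Iterable[JudgedResponse]) -> float:
    # Safe, helpful answers on dual-use / borderline queries.
    return _rate(responses, {"borderline"}, Outcome.SAFE_HELPFUL)


def unsafe_compliance_rate(responses: Iterable[JudgedResponse]) -> float:
    # Harmful answers given to harmful queries.
    return _rate(responses, {"harmful"}, Outcome.UNSAFE)
```

Reporting the rates per query category keeps a model from scoring well simply by refusing everything: in this sketch, blanket refusal drives unsafe compliance to zero but pushes over-refusal to 100% and safe completion to zero.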
The increasing deployment of Large Language Models in sensitive domains such as healthcare calls for safety-alignment benchmarks that look beyond binary refusal boundaries.
This benchmark addresses a gap in evaluating LLM safety: existing benchmarks measure over-refusal and unsafe compliance, but not safe completion, which matters for beneficial and ethical AI integration in healthcare.
Health-ORSC-Bench could enable more nuanced and effective safety evaluations for AI models in critical applications, encouraging better design and deployment practices.
- AI safety researchers
- Healthcare AI developers
- Patients
- Developers of 'dual-use' AI applications
- AI models with poor safety alignment
- Developers ignoring nuanced safety completion
Improved safety alignment in LLMs tailored for healthcare applications.
Increased trust and adoption of AI assistants and tools within the medical field due to enhanced safety protocols.
Potential for new regulatory standards and certification processes for AI in healthcare that incorporate metrics beyond simple refusal rates.
Source: arXiv cs.AI