Unpredictable Safety: Domain-Dependent Compliance and the Transparency Gap in Open-Weight LLMs

arXiv:2606.04035v1. Abstract: We present a systematic study of domain-dependent safety behavior in open-weight LLMs: 7 standardized experiments across 7 ethical domains, testing 5 models (12B--70B) in 4,200 interactions with dual-judge validation. Using a dual-condition methodology, in which each scenario is tested in both an analytical framing (identify the harm) and an operational framing (help commit the harm), we find that compliance rates vary from 14.7% (human trafficking) to 85.7% (surveillance design), a 71-percentage-point span with non-overlapping cluster-bootstrapped 95% CIs. Tru
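For readers unfamiliar with the term, a cluster-bootstrapped confidence interval resamples whole groups of related interactions rather than individual ones, so that correlated responses do not make the interval look narrower than it should. The sketch below is illustrative only and is not the paper's code: the function name `cluster_bootstrap_ci`, the choice of cluster key (for example a scenario ID), the percentile method, and the number of resamples are all assumptions, since the abstract does not specify them.

```python
import random
from collections import defaultdict


def cluster_bootstrap_ci(records, n_boot=2000, alpha=0.05, seed=0):
    """Percentile cluster bootstrap for a compliance rate.

    records: iterable of (cluster_id, complied) pairs, complied in {0, 1}.
    The cluster key is hypothetical (e.g. a scenario ID); the abstract
    does not say what the paper clusters on.
    Returns (point_estimate, lower, upper).
    """
    # Group outcomes by cluster so whole clusters are resampled together.
    clusters = defaultdict(list)
    for cluster_id, complied in records:
        clusters[cluster_id].append(int(complied))
    groups = list(clusters.values())
    if not groups:
        raise ValueError("records must not be empty")

    total = sum(len(g) for g in groups)
    point = sum(sum(g) for g in groups) / total

    rng = random.Random(seed)
    rates = []
    for _ in range(n_boot):
        # Draw as many clusters as we have, with replacement.
        sample = rng.choices(groups, k=len(groups))
        n = sum(len(g) for g in sample)
        rates.append(sum(sum(g) for g in sample) / n)

    # Percentile interval from the sorted bootstrap distribution.
    rates.sort()
    lower = rates[int((alpha / 2) * n_boot)]
    upper = rates[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return point, lower, upper
```

Two domains would have "non-overlapping" intervals in this sense when the upper bound computed for one domain falls below the lower bound computed for the other.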
The proliferation of open-weight LLMs calls for a deeper understanding of how they actually behave on safety and compliance once deployed, especially as they are built into more applications.
This study highlights the inconsistent ethical compliance of open-weight LLMs across different domains, posing significant risks for deployment and regulatory efforts.
The perceived reliability and safety of open-weight LLMs are called into question, suggesting a need for more robust safety evaluations and domain-specific safeguards.
- AI safety researchers
- Developers of proprietary, safety-focused LLMs
- Regulatory bodies
- Developers of open-weight LLMs with inadequate safety protocols
- Entities relying on open-weight LLMs for sensitive applications
- Users unfamiliar with LLM compliance variability
Increased scrutiny and demand for transparency in the safety testing of all large language models.
Potential for regulations mandating specific safety benchmarks and disclosures for AI models, impacting adoption.
Shifting market preference towards 'safer' proprietary models over open-weight alternatives, unless robust solutions emerge.
Read at arXiv cs.LG