
arXiv:2606.28479v1. Announce type: cross.
Abstract: CSIRTs increasingly fine-tune language models on vulnerability scan records, but these records expose internal network topology and create privacy risks under regulations such as GDPR and LGPD. We present the first empirical study of how DP-SGD and HMAC pseudonymization interact when fine-tuning small language models (SLMs) with 1B to 3B parameters on structured CSIRT data. We evaluate 96 LoRA adapters across four SLMs and four training regimes, including raw fine-tuning, QLoRA with large-batch training, and DP-SGD with epsilon set to 2 and 8. We al
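As a rough illustration of what HMAC pseudonymization of scan records can look like, the sketch below replaces sensitive fields with keyed HMAC-SHA256 tokens before the records would be used for fine-tuning. It is not the paper's pipeline: the field names (`ip`, `hostname`, `cve`, `severity`), the key value, the 16-character truncation, and the helper names `pseudonymize` and `pseudonymize_record` are all assumptions made for this example.

```python
import hashlib
import hmac

# Hypothetical key for illustration only; a real deployment would load it
# from a secrets manager and never commit it to source control.
SECRET_KEY = b"example-key-do-not-use"


def pseudonymize(value: str, key: bytes = SECRET_KEY, length: int = 16) -> str:
    """Return a stable keyed token for `value` (HMAC-SHA256, hex, truncated)."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


def pseudonymize_record(
    record: dict,
    sensitive_fields: tuple = ("ip", "hostname"),  # assumed field names
    key: bytes = SECRET_KEY,
) -> dict:
    """Return a copy of `record` with the sensitive fields replaced by tokens."""
    out = dict(record)  # shallow copy so the caller's record is not mutated
    for field in sensitive_fields:
        if field in out:
            out[field] = f"{field}_{pseudonymize(str(out[field]), key)}"
    return out


if __name__ == "__main__":
    scan = {
        "ip": "10.0.0.5",
        "hostname": "db01.internal",
        "cve": "CVE-2021-44228",
        "severity": "critical",
    }
    print(pseudonymize_record(scan))
```

Because the token is keyed and deterministic, the same host maps to the same pseudonym across records, so relationships between findings survive, while anyone without the key cannot recompute the mapping by hashing guessed addresses. Truncating the digest shortens the tokens at the cost of a higher collision probability.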
The increasing adoption of fine-tuned language models by CSIRTs for critical vulnerability analysis, combined with stringent data privacy regulations like GDPR, is creating urgent demand for practical privacy-preserving methods.
This research provides empirical evidence and methods for deploying AI securely in sensitive security contexts. It directly addresses the tension between AI utility and data privacy, a major bottleneck for enterprise AI adoption.
Organizations can now explore more robust and empirically validated approaches to using small language models with sensitive internal data, potentially accelerating AI integration into cybersecurity operations while maintaining regulatory compliance.
- Cybersecurity companies
- Enterprises adopting AI
- CSIRTs
- Privacy-preserving AI researchers
- Organizations with weak data governance
- Proprietary AI vendors without strong privacy solutions
- Data privacy violators
CSIRTs will gain confidence in using fine-tuned SLMs for vulnerability analysis without undue privacy risks, potentially improving threat detection and response times.
This will drive broader adoption of privacy-preserving fine-tuning techniques across other regulated industries handling sensitive data, such as healthcare and finance.
The demonstrated viability of privacy-preserving AI could lead to the development of new regulatory frameworks that explicitly encourage or mandate such techniques for AI deployments in critical sectors.
Read at arXiv cs.AI