Building an Adversarial Malware Dataset by Family and Type: Generation, Evasion, and Poisoning Evaluation

arXiv:2605.25937v1 (announce type: cross). Abstract: We present a dataset of adversarial malware samples derived from the public RawMal-TF collection of real-world malware binaries. Using a suite of adversarial malware generators, we construct two sets of adversarial PE files: 44,347 family-labelled samples and 33,596 type-labelled samples, achieving evasion rates of 98.35% and 92.20% against the EMBER classifier, respectively. Each adversarial binary is accompanied by detailed metadata, including EMBER scores and VirusTotal classifications. We further demonstrate the susceptibility of malware
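One common way to read the evasion rates quoted in the abstract is as the fraction of adversarial samples that a detector scores below its decision threshold. The sketch below illustrates that reading only. The abstract does not state the exact definition or threshold used, so the function name `evasion_rate`, the threshold value, and the toy scores are all assumptions for illustration, not values from the dataset.

```python
from typing import Iterable


def evasion_rate(scores: Iterable[float], threshold: float) -> float:
    """Return the fraction of samples whose detector score is below `threshold`.

    Assumption: a score below the threshold means the detector labels the
    sample benign, i.e. the adversarial sample evaded detection.
    """
    scores = list(scores)
    if not scores:
        raise ValueError("no scores given")
    evaded = sum(1 for s in scores if s < threshold)
    return evaded / len(scores)


# Made-up detector scores for illustration only; not taken from the dataset.
toy_scores = [0.05, 0.40, 0.91, 0.12, 0.77, 0.30, 0.02, 0.65]

# Hypothetical decision threshold; the source does not specify one.
HYPOTHETICAL_THRESHOLD = 0.8

rate = evasion_rate(toy_scores, HYPOTHETICAL_THRESHOLD)
print(f"{rate:.2%}")  # 7 of 8 toy scores fall below 0.8 -> 87.50%
```

With per-sample scores of the kind the abstract says accompany each binary, the same function could be applied separately to the family-labelled and type-labelled sets, provided the threshold matches the one the authors used.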
The proliferation of advanced AI in cybersecurity has led to an arms race where offensive techniques are rapidly evolving to evade detection, necessitating new datasets for defensive AI development.
This development highlights the escalating sophistication of cyber threats, particularly adversarial malware, impacting the integrity of digital infrastructure and national security.
The creation of large-scale adversarial malware datasets by family and type provides cybersecurity researchers and AI developers with critical new tools to train more robust detection systems.
- Cybersecurity researchers
- AI defense companies
- National security agencies
- Traditional malware detection systems
- Organizations with outdated cybersecurity defenses
- Cyberattack victims
Improved adversarial training for malware classification models should lead to more resilient cybersecurity systems.
This arms race could accelerate the development of explainable AI in cybersecurity, helping defenders better understand adversarial attack vectors.
The heightened complexity of cyber warfare might necessitate a global standard for AI-driven cybersecurity resilience.
Source: arXiv cs.LG