MASCOT-Android: A Curated Dataset and Automated Collection Pipeline for Android Malware Source Code Specimens

arXiv:2606.16072v1. Abstract: Compared with binaries and decompiled code, malware source code more directly reflects the attackers' original intent. However, the scarcity of source code and the high cost of manual review make such datasets difficult to build and maintain. We propose MASCOT-Android, a curated dataset of Android malware source code and an automated collection framework for scalable malware source code discovery on GitHub. A key finding of our work is that repository-level documentation alone provides a strong signal for malware source code collection. Our mod
The increasing sophistication of cyber threats and the open-source nature of many AI development pipelines necessitate advanced methods for malware detection and analysis.
Collecting malware at the source-code level gives a more direct view of attacker intent and offers a scalable way to build larger, more accurate datasets for AI-driven cybersecurity defenses.
The ability to automatically collect and curate Android malware source code specimens significantly improves the efficiency and effectiveness of cybersecurity research and defense mechanisms.
- Cybersecurity firms
- Android users
- Developers of AI security tools
- National security agencies
- Malware developers
- Cybercriminals
Improved detection and prevention of Android malware due to more comprehensive training data for AI models.
Reduced success rates for new Android malware campaigns, leading to decreased financial and data losses for individuals and organizations.
A potential arms race in an AI-driven cybersecurity landscape where AI systems constantly adapt to new malware, leading to increasingly complex attack and defense strategies.
Read at arXiv cs.AI