
arXiv:2509.02113v2 (Announce Type: replace)
Abstract: The advancement of graph-based malware analysis is critically limited by the absence of large-scale datasets that capture the inherent hierarchical structure of software. Existing methods often oversimplify programs into single-level graphs, failing to model the crucial semantic relationship between high-level functional interactions and low-level instruction logic. To bridge this gap, we introduce \dataset, the largest public hierarchical graph dataset for malware analysis, comprising over 200M Control Flow Graphs (CFGs) nested with
The increasing sophistication of malware and the limitations of existing flat graph analysis methods necessitate a more nuanced, hierarchical approach to program understanding in cybersecurity.
This dataset is a significant advance for malware detection and analysis, both critical for securing digital infrastructure and intellectual property against increasingly complex cyber threats.
The availability of a large-scale hierarchical graph dataset will enable the development of more accurate and robust AI models for malware analysis, moving beyond current state-of-the-art limitations.
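The difference between a hierarchical program graph and a flat one can be sketched in a few lines. This is a minimal illustration only, not the dataset's actual schema. It assumes the upper level is a graph of function-to-function calls, which the truncated abstract does not confirm, and the class names `ControlFlowGraph` and `HierarchicalProgramGraph` and the toy program are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ControlFlowGraph:
    """Low level: the basic blocks of one function and the jumps between them."""
    blocks: list[str]
    edges: list[tuple[int, int]] = field(default_factory=list)


@dataclass
class HierarchicalProgramGraph:
    """High level: functions as nodes, calls as edges, one CFG nested per function.

    The call-graph upper level is an assumption made for illustration.
    """
    cfgs: dict[str, ControlFlowGraph] = field(default_factory=dict)
    calls: list[tuple[str, str]] = field(default_factory=list)

    def flatten(self) -> tuple[list[str], list[tuple[str, str]]]:
        """Collapse to a single-level graph: every basic block becomes a node
        and the function boundaries are discarded."""
        nodes: list[str] = []
        edges: list[tuple[str, str]] = []
        for name, cfg in self.cfgs.items():
            ids = [f"{name}:{i}" for i in range(len(cfg.blocks))]
            nodes.extend(ids)
            edges.extend((ids[a], ids[b]) for a, b in cfg.edges)
        # Approximate each call as an edge between the two functions' entry blocks.
        edges.extend((f"{src}:0", f"{dst}:0") for src, dst in self.calls)
        return nodes, edges


# Hypothetical two-function program.
program = HierarchicalProgramGraph(
    cfgs={
        "main": ControlFlowGraph(["entry", "loop", "exit"], [(0, 1), (1, 1), (1, 2)]),
        "decrypt": ControlFlowGraph(["entry", "xor_body"], [(0, 1)]),
    },
    calls=[("main", "decrypt")],
)
flat_nodes, flat_edges = program.flatten()
```

After `flatten()`, the grouping of blocks into `main` and `decrypt` survives only as a naming convention in the node labels. A model consuming the flat graph no longer sees which blocks form a function or which edges are calls rather than jumps, which is the information a hierarchical representation keeps explicit.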
- Cybersecurity AI developers
- Security-conscious organizations
- Academic researchers in graph neural networks
- Malware creators
- Organizations relying on outdated detection methods
Improved AI-driven malware detection capabilities will emerge, enhancing defensive postures.
The cost of sophisticated cyberattacks may rise and their success rate may fall as detection improves, shifting the offensive-defensive balance.
This could lead to a 'cyber arms race' where malware developers innovate faster to evade new hierarchical analysis techniques, necessitating continuous defensive advancements.
Read at arXiv cs.LG