
arXiv:2605.21615v1 Announce Type: cross Abstract: Existing binary corpora typically capture only one or two axes of binary variation: they either provide cross-compiler builds without a temporal axis, or CVE labels for single-build binaries. None combine cross-build diversity, cross-version history, and CVE labels into a queryable structure. We present ASSEMBLAGE-DEEPHISTORY, which consolidates these dimensions into a unified framework where every binary's compilation context, source code, vulnerable functions, and package version are stored as first-class metadata. ASSEMBLAGE-DEEPHISTORY comp
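As a rough illustration of what "first-class metadata" could look like in practice, the sketch below models one record per binary and two simple queries over it. The class name, field names, helper functions, package name, and CVE identifier are all hypothetical placeholders chosen for this example; they are not taken from the dataset and should not be read as its actual schema or API.

```python
from dataclasses import dataclass, field


@dataclass
class BinaryRecord:
    """Hypothetical per-binary record; field names are illustrative only."""
    package: str          # package the binary was built from
    version: str          # package version (the cross-version axis)
    compiler: str         # part of the compilation context (the cross-build axis)
    opt_level: str        # part of the compilation context
    source_ref: str       # pointer to the matching source code
    # Maps a CVE identifier to the vulnerable function names in this binary.
    vulnerable_functions: dict = field(default_factory=dict)


def builds_affected_by(records, cve_id):
    """Return every record labeled with the given CVE identifier."""
    return [r for r in records if cve_id in r.vulnerable_functions]


def versions_by_compiler(records, package):
    """Group the available versions of one package by compiler."""
    grouped = {}
    for r in records:
        if r.package == package:
            grouped.setdefault(r.compiler, set()).add(r.version)
    return grouped


# Placeholder data: "examplelib" and "CVE-0000-0001" are made up for this sketch.
SAMPLE = [
    BinaryRecord("examplelib", "1.0", "gcc", "O2", "src/examplelib-1.0",
                 {"CVE-0000-0001": ["parse_header"]}),
    BinaryRecord("examplelib", "1.0", "clang", "O0", "src/examplelib-1.0",
                 {"CVE-0000-0001": ["parse_header"]}),
    BinaryRecord("examplelib", "1.1", "gcc", "O2", "src/examplelib-1.1"),
]

affected = builds_affected_by(SAMPLE, "CVE-0000-0001")
coverage = versions_by_compiler(SAMPLE, "examplelib")
```

Keeping build context, version, and CVE labels on the same record is what would let a single query cross all three axes, for example comparing a vulnerable function across compilers before and after a patch release.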
The proliferation of software supply chain attacks and the increasing complexity of AI systems necessitate better tooling for binary analysis and vulnerability detection.
The dataset addresses a gap in cybersecurity research: until now, no corpus has combined cross-build diversity, cross-version history, and CVE labels in a single queryable resource for binary-level vulnerability analysis and secure software development.
Researchers and security practitioners can use it to study how vulnerabilities appear across compilers, builds, and package versions over time, which may lead to more robust security solutions.
- Cybersecurity researchers
- Software developers
- Security product vendors
- AI/ML security practitioners
- Malware authors reliant on unknown vulnerabilities
- Legacy security scanning tools
Improved detection and mitigation of binary-level software vulnerabilities become possible.
The development of more resilient and secure software supply chains will be accelerated, particularly for systems incorporating AI.
The attack surface for critical infrastructure and AI models may shrink, potentially shifting power dynamics in cyber warfare.
Read at arXiv cs.LG