Researchers have introduced ASSEMBLAGE-DEEPHISTORY, a new dataset designed to support the analysis of software vulnerabilities across different build configurations and historical versions. The dataset contains over 73,000 binaries from 248 open-source projects, compiled with various compilers on different operating systems, and includes detailed metadata linking binaries to their source code, vulnerable functions, and package versions. The authors conducted three analyses to demonstrate the dataset's utility: an LLM benchmark for vulnerability detection, an embedding comparison for clustering, and a regression analysis of binary similarity.
IMPACT Provides a new resource for training and evaluating AI models in identifying software vulnerabilities across diverse build environments.
RANK_REASON The cluster contains an academic paper detailing a new dataset and benchmark for software vulnerability analysis.
AI-generated summary · Google Gemini · from 1 source.