Researchers have introduced HybridCodeAuthorship, a new benchmark dataset designed to evaluate AI-generated code detection at a line-by-line level. This dataset simulates real-world industry codebases where human and AI-authored code are interleaved, unlike existing benchmarks that often use academic or completely AI- or human-authored code. The dataset was constructed using Python code files from GitHub repositories. Initial benchmarking showed that the top-performing algorithm, AIGCode Detector, achieved an F1 score of 0.48 for chunk-level detection and 0.56 for line-level detection. AI
IMPACT Enables better tools for managing and analyzing codebases with mixed human and AI authorship.
RANK_REASON The cluster describes a new benchmark dataset for AI research. [lever_c_demoted from research: ic=1 ai=1.0]
AI-generated summary · Google Gemini · from 1 sources. How we write summaries →