HybridCodeAuthorship: A Benchmark Dataset for Line-Level Code Authorship Detection
Researchers have introduced HybridCodeAuthorship, a new benchmark dataset designed to evaluate AI-generated code detection at a line-by-line level. This dataset simulates real-world industry codebases where human and AI-authored code are interleaved, unlike existing benchmarks that often use academic or completely AI- or human-authored code. The dataset was constructed using Python code files from GitHub repositories. Initial benchmarking showed that the top-performing algorithm, AIGCode Detector, achieved an F1 score of 0.48 for chunk-level detection and 0.56 for line-level detection. AI
IMPACT Enables better tools for managing and analyzing codebases with mixed human and AI authorship.