CoQuIR: A Comprehensive Benchmark for Code Quality-Aware Information Retrieval
Researchers have introduced CoQuIR, a benchmark that evaluates code retrieval systems on software quality, not only on functional relevance. It provides fine-grained quality annotations along four dimensions (correctness, efficiency, security, and maintainability) for over 42,000 queries and 134,000 code snippets in 11 programming languages. Initial testing of 23 retrieval models showed that even top performers often fail to distinguish buggy code from robust code, a significant gap in current systems. The research also explores training methods for quality-aware retrieval and reports promising results without compromising semantic relevance.
AI IMPACT Highlights the need for AI code retrieval systems to account for software quality beyond functional relevance, which could improve developer tools.