PulseAugur / Brief

Brief

Last 24 hours · 223 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

  1. SWE-IF: Aligning Code Evaluation with Human Preference

    Researchers have introduced SWE-IF, an evaluation framework that measures whether large language models (LLMs) follow code instructions, not only whether their code is functionally correct. The framework pairs a taxonomy of 30 verifiable code instructions with deterministic verifiers. Its goal is to capture the "vibe check" by which people judge code as clean, intent-preserving, and correct. In evaluations of 31 LLMs, instruction following emerged as a key differentiator between models, and a composite score combining functional correctness and instruction following correlated best with human preference.

    IMPACT: This evaluation framework could lead to LLMs that generate more human-aligned, maintainable code, which would improve developer productivity.
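    The summary above describes verifiable instructions checked by deterministic verifiers and a composite score that blends correctness with instruction following. The sketch below illustrates that general idea in Python. It is not SWE-IF's code. The brief does not list the framework's 30 instructions or say how the composite is weighted, so the three instructions, every function name here, and the equal 50/50 weighting are hypothetical. Each verifier inspects the syntax tree of the submitted code and returns the same pass/fail answer on every run.

```python
import ast
import re
from typing import Callable, Dict, List

# Hypothetical instruction: function names must be snake_case.
SNAKE_CASE = re.compile(r"^[a-z_][a-z0-9_]*$")


def all_functions_snake_case(source: str) -> bool:
    """Pass if every (async) function definition has a snake_case name."""
    tree = ast.parse(source)
    return all(
        SNAKE_CASE.match(node.name) is not None
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    )


def every_function_has_docstring(source: str) -> bool:
    """Pass if every (async) function definition has a docstring."""
    tree = ast.parse(source)
    return all(
        ast.get_docstring(node) is not None
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    )


def no_print_calls(source: str) -> bool:
    """Pass if the code never calls the built-in print()."""
    tree = ast.parse(source)
    return not any(
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == "print"
        for node in ast.walk(tree)
    )


# Hypothetical registry mapping instruction IDs to their verifiers.
VERIFIERS: Dict[str, Callable[[str], bool]] = {
    "snake_case_functions": all_functions_snake_case,
    "docstrings": every_function_has_docstring,
    "no_print": no_print_calls,
}


def instruction_following_rate(source: str, instructions: List[str]) -> float:
    """Fraction of the requested instructions that the code satisfies."""
    if not instructions:
        return 1.0
    try:
        ast.parse(source)
    except SyntaxError:
        return 0.0  # unparseable code fails every instruction
    passed = sum(VERIFIERS[name](source) for name in instructions)
    return passed / len(instructions)


def composite_score(tests_passed: bool, if_rate: float, weight: float = 0.5) -> float:
    """Hypothetical blend: weight * correctness + (1 - weight) * instruction following."""
    correctness = 1.0 if tests_passed else 0.0
    return weight * correctness + (1.0 - weight) * if_rate


if __name__ == "__main__":
    sample = (
        "def add_numbers(a, b):\n"
        '    """Return the sum of a and b."""\n'
        "    return a + b\n"
        "\n"
        "def print_it(x):\n"
        "    print(x)\n"
    )
    rate = instruction_following_rate(sample, list(VERIFIERS))
    print(rate, composite_score(tests_passed=True, if_rate=rate))
```

    In this sample, `print_it` satisfies the naming instruction but fails the docstring and no-print checks. The code could therefore pass its functional tests and still score below a fully compliant solution. Keeping each verifier a pure function of the source text keeps the scoring deterministic and reproducible.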