PulseAugur / Brief

Brief

last 24h
[1/1] 223 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

  1. PSEBench: A Controllable and Verifiable Benchmark for Evaluating LLMs in Patient Safety Event Triage

    Researchers have introduced PSEBench, a benchmark for evaluating Large Language Models (LLMs) on patient safety event triage. It is built with a policy-grounded construction method that uses "clause cards" to break regulatory text into auditable decision specifications. PSEBench comprises 5,074 cases based on Minnesota's reportable adverse health events and is designed to capture evidence-grounded reasoning, information seeking, and principled abstention in ambiguous situations. Initial evaluations of 15 LLMs revealed consistent capability trends and identified areas for improvement in applying LLMs to patient safety workflows.

    IMPACT: Provides a standardized method to assess LLM reliability in high-stakes clinical safety applications.