PulseAugur
实时 09:01:55

AI models fail to detect danger in long transcripts

A new paper reveals that leading AI models like Opus 4.6, GPT 5.4, and Gemini 3.1 exhibit significant performance degradation when classifying long transcripts, a crucial task for monitoring coding agents. These models miss subtly dangerous actions much more frequently in transcripts exceeding 800,000 tokens compared to shorter ones. While prompting techniques can partially mitigate this issue, further post-training improvements are likely necessary to ensure reliable monitoring in long-context scenarios. AI

影响 Leading AI models struggle with long contexts, potentially overestimating their safety monitoring capabilities and requiring new training or prompting strategies.

排序理由 The cluster contains an academic paper detailing a new finding about AI model performance. [lever_c_demoted from research: ic=1 ai=1.0]

在 arXiv cs.AI 阅读 →

AI 生成摘要 · Google Gemini · 来自 1 个来源。 我们如何撰写摘要 →

AI models fail to detect danger in long transcripts

报道来源 [1]

  1. arXiv cs.AI TIER_1 English(EN) · Fabien Roger ·

    Classifier Context Rot: Monitor Performance Degrades with Context Length

    Monitoring coding agents for dangerous behavior using language models requires classifying transcripts that often exceed 500K tokens, but prior agent monitoring benchmarks rarely contain transcripts longer than 100K tokens. We show that when used as classifiers, current frontier …