Brief

last 24h

[2/2] 222 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

TOOL · arXiv cs.AI English(EN) · 9h

TRACE: Trajectory Risk-Aware Compression for Long-Horizon Agent Safety

Researchers have introduced TRACE, a method for improving the safety of long-horizon Large Language Model (LLM) agents. TRACE targets sparse and delayed safety risks, which turn-level detectors often miss. It uses a Compressor-Reader design: a Compressor encodes the entire trajectory into a condensed latent state, and a Reader evaluates safety from that state. This design aggregates dispersed risk cues and prevents premature evidence loss, and it outperforms existing methods on multiple benchmarks.
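The contrast between a turn-level detector and a Compressor-Reader pipeline can be illustrated with a toy sketch. This is not the TRACE implementation: the summary describes a learned latent state, whereas the sketch below uses a hand-written running sum, and every name in it (`LatentState`, `compress`, `read`, `turn_level_detector`, the per-turn scores and the thresholds) is a hypothetical stand-in chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class LatentState:
    """Toy stand-in for a condensed trajectory state (assumption, not TRACE's)."""
    risk_mass: float = 0.0  # evidence accumulated over the whole trajectory
    turns_seen: int = 0


def turn_level_detector(turn_scores, threshold=0.8):
    """Flags a trajectory only if a single turn looks risky on its own."""
    return any(score >= threshold for score in turn_scores)


def compress(turn_scores):
    """Folds every turn into one state, so weak cues are kept rather than dropped."""
    state = LatentState()
    for score in turn_scores:
        state.risk_mass += score
        state.turns_seen += 1
    return state


def read(state, threshold=1.0):
    """Judges safety from the compressed state alone."""
    return state.risk_mass >= threshold


# Hypothetical per-turn risk scores: no single turn is alarming,
# but the cues add up across the trajectory.
scores = [0.3, 0.0, 0.4, 0.1, 0.5]
flagged_per_turn = turn_level_detector(scores)
flagged_by_reader = read(compress(scores))
```

With these made-up scores the per-turn check does not fire, while the reader does, which is the kind of dispersed risk the summary says turn-level detectors miss.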

IMPACT Improves detection and mitigation of safety risks in complex, long-horizon AI agent interactions.
- Pre-Ex-Bench
- TRACE
- LongSafety
- R-Judge
- LLM
- ASSEBench
TOOL · arXiv cs.AI English(EN) · 3w

Beyond Accuracy: Policy Invariance as a Reliability Test for LLM Safety Judges

Researchers have introduced policy invariance, a method for assessing the reliability of LLM-based safety judges. It tests whether a judge's safety verdicts stay consistent regardless of how the evaluation policy is worded or modified. Experiments showed that current LLM judges are highly sensitive to minor wording changes: verdicts flip even on unambiguous cases, so the judgment conflates agent behavior with prompt phrasing.
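One simple way to quantify this kind of sensitivity is a flip rate: the share of cases whose verdict changes across rewordings of the same policy. The sketch below assumes verdicts have already been collected from a judge. The function name `flip_rate`, the case IDs, and the verdict labels are hypothetical, and the paper's actual metric may be defined differently.

```python
def flip_rate(verdicts_by_case):
    """Fraction of cases whose verdict is not identical across policy wordings.

    verdicts_by_case maps a case ID to the list of verdicts a judge returned
    for that case, one verdict per rewording of the same policy.
    """
    if not verdicts_by_case:
        raise ValueError("need at least one case")
    flipped = sum(
        1 for verdicts in verdicts_by_case.values() if len(set(verdicts)) > 1
    )
    return flipped / len(verdicts_by_case)


# Hypothetical verdicts from one judge under three wordings of one policy.
verdicts = {
    "case_1": ["unsafe", "unsafe", "unsafe"],  # invariant
    "case_2": ["safe", "unsafe", "safe"],      # flips
    "case_3": ["safe", "safe", "safe"],        # invariant
    "case_4": ["unsafe", "safe", "unsafe"],    # flips
}
rate = flip_rate(verdicts)
```

A perfectly policy-invariant judge would score 0.0 here. In this made-up data, two of the four cases flip.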

IMPACT Introduces a new metric to evaluate LLM safety judges, potentially improving the reliability of AI safety evaluations.
- LLM
- ASSEBench
- R-Judge
