Brief · PulseAugur

TOOL · arXiv cs.AI English(EN) · 6h

Safety Under Scaffolding: How Evaluation Conditions Shape Measured Safety

A new research paper, "Safety Under Scaffolding," reports that the measured safety of frontier AI models can shift significantly with the evaluation conditions and the scaffolding used. Some scaffolds, such as ReAct and multi-agent critics, preserved safety scores, while map-reduce delegation lowered them, though the drop was often attributable to formatting changes rather than disrupted reasoning. The paper also finds substantial model-by-scaffold heterogeneity and suggests that composite safety scores may not be reliable deployment criteria.

IMPACT Suggests that current AI safety benchmarks may not accurately predict real-world performance, pointing to a need for more robust, context-aware evaluation methods.

Opus
Llama 4
ReAct
David Gringras
Safety Under Scaffolding
map-reduce delegation