False Sense of Safety in Selective Signal Classification: Auditing Bound Tightness and Exchangeability for Risk Control
A new arXiv paper audits selective prediction methods used for risk control in AI systems. It finds that common practices such as naive thresholding can create a false sense of security: in many trials, error rates significantly exceeded the declared budget. Certified methods, such as Clopper-Pearson and betting-based upper confidence bounds, performed better, but still overran the budget under grouped deployment, where the exchangeability assumption they rely on no longer holds.
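To illustrate the gap between naive thresholding and a certified bound, here is a minimal Python sketch. It is not the paper's code. The function names (`binom_cdf`, `clopper_pearson_upper`, `audit_threshold`), the single fixed threshold, and the confidence level `delta` are assumptions made for illustration. The sketch compares the empirical error rate among accepted predictions with a one-sided Clopper-Pearson upper bound on that rate, computed by bisection on the binomial CDF.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with each term computed in log space."""
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 1.0 if k >= n else 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1)
                    - math.lgamma(n - i + 1) + i * log_p + (n - i) * log_q)
        total += math.exp(log_term)
    return min(total, 1.0)

def clopper_pearson_upper(k, n, delta):
    """One-sided (1 - delta) upper confidence bound on a binomial proportion.

    Solves P(Bin(n, p) <= k) = delta for p by bisection; the CDF is
    decreasing in p, so the root is bracketed by [k / n, 1].
    """
    if n == 0 or k >= n:
        return 1.0  # no accepted samples, or all errors: no useful bound
    lo, hi = k / n, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if binom_cdf(k, n, mid) > delta:
            lo = mid
        else:
            hi = mid
    return hi  # upper end of the bracket, so the bound errs on the safe side

def audit_threshold(confidences, errors, threshold, delta=0.05):
    """Hypothetical helper: audit one fixed acceptance threshold.

    `errors` holds 1 for a wrong prediction and 0 for a correct one.
    Returns (number accepted, empirical error rate, certified upper bound).
    """
    accepted = [e for c, e in zip(confidences, errors) if c >= threshold]
    n, k = len(accepted), sum(accepted)
    empirical = k / n if n else 0.0
    return n, empirical, clopper_pearson_upper(k, n, delta)

# Toy calibration set: 50 confident predictions with 2 errors among them.
confidences = [0.9] * 50 + [0.1] * 50
errors = [1, 1] + [0] * 48 + [1] * 50
n, empirical, upper = audit_threshold(confidences, errors, threshold=0.5)
print(n, empirical, upper)
```

On this toy data the empirical error rate among accepted predictions sits below a 5% budget while the certified upper bound sits above it, which is the kind of gap naive thresholding hides. The bound itself is only valid when calibration and deployment samples are exchangeable, so it would not by itself protect against the grouped-deployment overruns described above.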