New benchmark reveals high privacy risks in computer-use AI agents

A new evaluation harness called AgentCIBench assesses whether computer-use agents (CUAs) respect contextual integrity. These agents operate across personal applications such as email and calendars, which creates a privacy risk: sensitive information from one context can be exposed in another. AgentCIBench identifies three common failure modes: visual co-location, task-ambiguity overshare, and recipient misalignment. Testing 15 frontier agents revealed a high failure rate, with 11 agents leaking information in over 50% of scenarios, averaging a 67.9% leakage rate. The researchers are releasing AgentCIBench as a pre-deployment safety check to promote the development of safer CUAs.

IMPACT Highlights critical privacy vulnerabilities in AI agents that interact with personal applications, necessitating new safety checks before deployment.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. arXiv cs.AI · TIER_1 · English (EN) · Iryna Gurevych

    Capable but Careless: Do Computer-Use Agents Follow Contextual Integrity?

    Computer-use agents (CUAs) now act on a user's behalf across personal applications such as email, calendars, and to-do lists. This cross-application access is useful, but it also creates a privacy risk that has been largely overlooked: when an agent works in one context, it can p…