PulseAugur

Nautilus Compass detects LLM agent persona drift without model access

Researchers have developed Nautilus Compass, a system for detecting persona drift in large language model (LLM) agents running in production. The method is black-box: it works only at the prompt-text layer, using cosine similarity against behavioral anchor texts, computed over BGE-m3 embeddings, to identify deviations. Unlike white-box approaches that require model weights, Nautilus Compass works with closed APIs such as Claude and GPT-4, and it makes no LLM calls during indexing, which keeps it efficient. The system is reported to outperform existing baselines on specific benchmarks for both drift detection and retrieval while keeping reproduction cost low.
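The anchor-comparison idea can be illustrated with a minimal sketch. This is not the paper's implementation: the `embed` function is a toy bag-of-words stand-in for a real embedding model such as BGE-m3, and the scoring rule (one minus the best anchor similarity), the `threshold` value, and all function names are assumptions made for illustration only.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model such as BGE-m3 (assumption):
    returns lowercase word counts instead of a dense vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def drift_score(prompt_text: str, anchors: list[str]) -> float:
    """Hypothetical drift score: 1 minus the highest similarity between
    the prompt text and any behavioral anchor. Higher means more drift."""
    prompt_vec = embed(prompt_text)
    best = max(cosine(prompt_vec, embed(anchor)) for anchor in anchors)
    return 1.0 - best


def is_drifting(prompt_text: str, anchors: list[str], threshold: float = 0.8) -> bool:
    """Flag drift when the score exceeds an illustrative threshold (assumption)."""
    return drift_score(prompt_text, anchors) > threshold


# Example anchors a user might specify for a coding agent (hypothetical).
anchors = [
    "always write tests before code",
    "never use global variables",
]
on_persona = drift_score("always write tests before code", anchors)
off_persona = drift_score("deploy the database tonight", anchors)
```

Because only prompt text and precomputed anchor vectors are involved, a check like this needs no model weights and no LLM call, which is the property the summary attributes to the black-box approach. A real deployment would swap `embed` for an actual embedding model and tune the threshold on labeled sessions.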

Impact: Provides a cost-effective method for monitoring and maintaining LLM agent behavior in production, which is crucial for reliable AI systems.

Ranking rationale: Academic paper detailing a new method for LLM agent behavior analysis.

Read on arXiv cs.CL →

AI-generated summary · Google Gemini · from 1 source. How we write summaries →

Sources [1]

  1. arXiv cs.CL TIER_1 English(EN) · Chunxiao Wang ·

    Nautilus Compass: Black-box Persona Drift Detection for Production LLM Agents

    Production LLM coding agents drift over long sessions: they forget user-specified constraints, slip into mistakes the user already flagged, and confabulate prior agreements. White-box approaches such as persona vectors require model weights and so cannot be applied to closed APIs…