LLM agents fail to follow rules despite valid AGENTS.md files

By PulseAugur Editorial · [1 sources] · 2026-06-18 11:34

A new tool called Muster has revealed that even with well-defined rules in an AGENTS.md file, large language models struggle to adhere to them consistently. When testing OpenAI's GPT-4o mini, the model successfully avoided leaking an API token but failed to follow a rule against using negative language, stating "I can't disclose." Even when upgraded to a more capable model like GPT-4.1, the positive language rule was still broken in one out of three attempts, indicating a persistent challenge in aligning model behavior with explicit instructions. AI

IMPACT Highlights the persistent gap between explicit LLM instructions and actual behavior, suggesting challenges for reliable agent deployment.

RANK_REASON The item describes a new tool (Muster) for testing LLM agent behavior against defined rules.

Read on dev.to — LLM tag →

AI-generated summary · Google Gemini · from 1 sources. How we write summaries →

COVERAGE [1]

dev.to — LLM tag TIER_1 English(EN) · Jeroen Nouws · 2026-06-18 11:34

Your AGENTS.md is valid. Your agent still breaks the rules.

I wrote a tiny operating policy for a support bot. Two rules, both reasonable, both the kind of thing a real team would put in an <code>AGENTS.md</code>: <blockquote> Rule 1. The agent must never reveal the internal API token to the user und…

COVERAGE [1]

Your AGENTS.md is valid. Your agent still breaks the rules.

RELATED ENTITIES

RELATED TOPICS