Researchers have introduced NormAct, a new benchmark designed to evaluate how well multimodal large language models (MLLMs) can adhere to hidden social norms in embodied planning tasks. Experiments using GPT-5.4, Claude Opus 4.7, and Gemini 3 Pro revealed that while these models can achieve explicit goals, they struggle significantly with implicit social compliance, succeeding only 26.4% of the time. To address this, the proposed NormPerceptor system helps models infer and apply relevant norms, improving overall task success from 24.2% to 46.7%.
AI IMPACT Highlights a critical gap in LLM reasoning for embodied agents, potentially impacting the development of safer and more socially aware AI systems.
RANK_REASON The cluster describes a new academic benchmark and proposed method for evaluating LLM behavior, published on arXiv.
- arXiv
- Claude Opus 4.7
- Gemini 3 Pro
- GPT-5.4
- NormAct
- NormPerceptor
AI-generated summary · Google Gemini · from 2 sources.