MiroBench: Benchmarking Realism in Agentic Simulation of Real-world Discussions
Researchers have introduced MiroBench, a benchmark for evaluating how realistically LLM agents simulate real-world discussions, with a focus on Reddit threads. The benchmark compares generated discussions against real ones along four aspects: repetition and semantic uniformity, narrative content, toxicity and aggression, and structural complexity. Experiments with five models across five domains show that current simulators do not accurately reproduce the distributional patterns and interaction dynamics of actual Reddit conversations, and that prompt-based enhancements yield only minor improvements.
AI IMPACT: Highlights the gap between what current LLM agent simulations can do and the complexity of real-world human interactions, guiding future research on agent realism.