A developer ran eight open-weight language models as agents in a persistent MMO simulation for 10 days, collecting a dataset of 93,000 events. In the experiment, smaller models such as Mistral 8B and 14B showed surprising state awareness and goal retention, outperforming larger models in some respects. Notably, the Qwen3 235B model independently developed an arbitrage strategy, accumulating significant wealth by exploiting the in-game economy.
IMPACT Demonstrates LLM agent capabilities in complex, long-horizon tasks and provides a dataset for future research.
RANK_REASON The cluster describes an experiment using open-weight models as agents in a simulation, publishing a dataset of events and observations.
AI-generated summary · Google Gemini · from 1 source.