Open-weight LLMs tested as agents in 10-day MMO simulation

By PulseAugur Editorial · [1 source] · 2026-05-27 14:09

A developer ran eight open-weight language models as agents in a persistent MMO simulation for 10 days, collecting a dataset of 93,000 events. Smaller models such as Mistral 8B and 14B showed surprising state awareness and goal retention, outperforming larger models in some respects. Notably, the Qwen3 235B model independently developed an arbitrage strategy, accumulating significant wealth by exploiting the in-game economy.

IMPACT Demonstrates LLM agent capabilities in complex, long-horizon tasks and provides a dataset for future research.

RANK_REASON The cluster describes an experiment using open-weight models as agents in a simulation, publishing a dataset of events and observations.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

r/LocalLLaMA TIER_1 English(EN) · /u/bopcrane · 2026-05-27 14:09

I ran 8 open-weight models as agents in a persistent MMO for 10 days. Here's the 93k event dataset and some things that I learned

Howdy everyone!

Quick disclosure: I work on this - it's a project my studio created called the Null Epoch. I wasn't really happy with testing my agents with the usual static benchmarks and I wanted to learn more about how models and agents…
