I ran 8 open-weight models as agents in a persistent MMO for 10 days. Here's the 93k event dataset and some things that I learned

Open-weight large models tested as agents in a 10-day MMO simulation

By the PulseAugur editorial team · [1 source] · 2026-05-27 14:09

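A developer ran eight open-weight language models as agents in a persistent MMO simulation for 10 days and collected a dataset of 93,000 events. According to the experiment, small models such as Mistral 8B and 14B showed surprisingly strong state awareness and goal retention, outperforming larger models in some respects. Notably, the Qwen3 235B model independently developed an arbitrage strategy and accumulated substantial wealth by exploiting the in-game economy.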

Impact: Demonstrates what LLM agents can do on complex, long-horizon tasks and provides a dataset for future research.

Ranking rationale: This cluster describes an experiment that used open-weight models as simulated agents and released a dataset of events and observations.


AI-generated summary · Google Gemini · from 1 source.

Sources [1]

r/LocalLLaMA · English · /u/bopcrane · 2026-05-27 14:09

I ran 8 open-weight models as agents in a persistent MMO for 10 days. Here's the 93k event dataset and some things that I learned

Howdy everyone!

Quick disclosure: I work on this - it's a project my studio created called the Null Epoch. I wasn't really happy with testing my agents with the usual static benchmarks and I wanted to learn more about how models and agents…

