DexHoldem benchmark tests embodied AI in real-world Texas Hold'em

By PulseAugur Editorial Team · [1 source] · 2026-05-18 17:51

Researchers have introduced DexHoldem, a benchmark for evaluating embodied AI systems on real-world dexterous manipulation, using Texas Hold'em as the task. It comprises a ShadowHand for manipulation, a dataset of 1,470 demonstrations, and two benchmarks: one for primitive skill execution and one for agentic perception. Initial results differ by model: Opus 4.7 achieves the highest strict problem-level accuracy on perception, while GPT 5.5 leads in average field-wise accuracy. The results highlight how hard it remains to integrate perception with policy for closed-loop deployment.
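The two perception metrics named above can rank models differently, and a small sketch may help show why. The summary does not define either metric, so the following is only one plausible reading: strict problem-level accuracy counts a problem as correct only if every field matches, while average field-wise accuracy gives partial credit per field and then averages over problems. The field names (`hole_cards`, `pot`, `to_act`) and the sample values are hypothetical and do not come from the paper.

```python
from typing import Dict, List

Record = Dict[str, str]  # hypothetical: one perception answer, field name -> value


def strict_problem_accuracy(preds: List[Record], golds: List[Record]) -> float:
    """Fraction of problems in which every gold field is predicted exactly."""
    if not golds or len(preds) != len(golds):
        raise ValueError("preds and golds must be non-empty and the same length")
    correct = sum(
        all(pred.get(field) == value for field, value in gold.items())
        for pred, gold in zip(preds, golds)
    )
    return correct / len(golds)


def mean_field_accuracy(preds: List[Record], golds: List[Record]) -> float:
    """Per-problem fraction of matching fields, averaged over problems."""
    if not golds or len(preds) != len(golds):
        raise ValueError("preds and golds must be non-empty and the same length")
    per_problem = [
        sum(pred.get(field) == value for field, value in gold.items()) / len(gold)
        for pred, gold in zip(preds, golds)
    ]
    return sum(per_problem) / len(per_problem)


# Hypothetical data: the second prediction misreads the pot size only.
golds = [
    {"hole_cards": "As Kd", "pot": "120", "to_act": "robot"},
    {"hole_cards": "7h 7c", "pot": "40", "to_act": "opponent"},
]
preds = [
    {"hole_cards": "As Kd", "pot": "120", "to_act": "robot"},
    {"hole_cards": "7h 7c", "pot": "60", "to_act": "opponent"},
]

strict = strict_problem_accuracy(preds, golds)   # 1 of 2 problems fully correct
fieldwise = mean_field_accuracy(preds, golds)    # (3/3 + 2/3) / 2
```

Under this reading, a single wrong field zeroes out a problem for the strict metric but costs only a fraction of it for the field-wise one, so a model that is usually almost right can lead on field-wise accuracy while trailing on strict accuracy.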

Impact: Introduces a new physical benchmark for evaluating embodied AI, pushing the development of integrated perception and manipulation systems.

Ranking rationale: Publication of an academic paper introducing a new benchmark for embodied AI systems.

AI-generated summary · Google Gemini · based on 1 source.

Sources [1]

arXiv cs.AI TIER_1 English(EN) · Yi Ma · 2026-05-18 17:51

DexHoldem: Playing Texas Hold'em with Dexterous Embodied System

Evaluating embodied systems on real dexterous hardware requires more than isolated primitive skills: an agent must perceive a changing tabletop scene, choose a context-appropriate action, execute it with a dexterous hand, and leave the scene usable for later decisions. We introdu…
