DexHoldem: Playing Texas Hold'em with Dexterous Embodied System
Researchers have developed DexHoldem, a benchmark for evaluating embodied AI systems on real-world dexterous manipulation, using Texas Hold'em play as the task. The system includes a ShadowHand for manipulation, a dataset of 1,470 demonstrations, and benchmarks for both primitive skill execution and agentic perception. Initial tests show performance varying across models: Opus 4.7 scores highest on strict problem-level accuracy for perception, while GPT 5.5 leads in average field-wise accuracy. The results highlight the difficulty of integrating perception with policy for closed-loop deployment.
AI IMPACT: Introduces a new physical benchmark for evaluating embodied AI, pushing the development of integrated perception and manipulation systems.
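The two perception metrics named in the summary can rank models differently, which a small sketch makes concrete. The code below is illustrative only and is not taken from DexHoldem: it assumes that strict problem-level accuracy counts a problem as correct only when every field matches, and that average field-wise accuracy is the share of individual fields that match. The field names (`hole`, `board`, `pot`, `to_act`) and all data are hypothetical.

```python
from typing import Dict, List

Record = Dict[str, str]  # one perception problem: field name -> value


def strict_problem_accuracy(preds: List[Record], golds: List[Record]) -> float:
    """Fraction of problems in which every gold field is predicted exactly."""
    hits = sum(
        all(pred.get(field) == value for field, value in gold.items())
        for pred, gold in zip(preds, golds)
    )
    return hits / len(golds)


def average_fieldwise_accuracy(preds: List[Record], golds: List[Record]) -> float:
    """Fraction of individual (problem, field) pairs predicted exactly."""
    correct = 0
    total = 0
    for pred, gold in zip(preds, golds):
        for field, value in gold.items():
            correct += pred.get(field) == value
            total += 1
    return correct / total


# Hypothetical ground truth for three table states (assumed schema).
gold = [
    {"hole": "As Kd", "board": "2c 7h 9s", "pot": "120", "to_act": "P2"},
    {"hole": "Qh Qc", "board": "3d 3s Jc", "pot": "80", "to_act": "P1"},
    {"hole": "5s 6s", "board": "Th Jh Qh", "pot": "200", "to_act": "P3"},
]

# Model A: two problems fully right, one fully wrong.
model_a = [dict(gold[0]), dict(gold[1]),
           {"hole": "?", "board": "?", "pot": "?", "to_act": "?"}]

# Model B: exactly one wrong field (the pot) in every problem.
model_b = [{**g, "pot": "0"} for g in gold]

for name, preds in [("A", model_a), ("B", model_b)]:
    strict = strict_problem_accuracy(preds, gold)
    fieldwise = average_fieldwise_accuracy(preds, gold)
    print(f"model {name}: strict={strict:.2f} field-wise={fieldwise:.2f}")
# model A: strict=0.67 field-wise=0.67
# model B: strict=0.00 field-wise=0.75
```

Under these assumed definitions, model A wins on the strict metric and model B wins on the field-wise one, the same kind of split the summary reports between Opus 4.7 and GPT 5.5.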