This paper introduces a novel policy-based Reinforcement Learning (RL) method designed to improve AI agents' performance in the 20 Questions game. The proposed RL approach enables the agent to learn optimal question-selection strategies through interaction with users, overcoming the difficulty of deriving such policies manually. A key feature is the use of a reward network to estimate more informative rewards, making the system robust to noisy answers and independent of a predefined knowledge base of objects. Experimental results indicate that this RL method surpasses an existing entropy-based engineered system and performs competitively in noise-free simulations.
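The summary does not describe the paper's state representation, network architecture, or reward network. The following is therefore only a minimal sketch of the general policy-gradient idea (REINFORCE with a running-average baseline) applied to question selection. Everything in it is a hypothetical assumption rather than the authors' method: the toy answer table `ANSWERS`, the state-independent softmax policy, the fixed two-question episodes, and the sparse 0/1 terminal reward, which stands in for the paper's learned reward network (not modeled here).

```python
import math
import random

# Hypothetical toy setup (not from the paper): 4 objects, 4 yes/no questions.
# ANSWERS[obj][q] is the answer object `obj` gives to question `q`.
# Questions 0 and 1 jointly identify every object; 2 and 3 are uninformative.
ANSWERS = [
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [1, 0, 1, 1],
    [1, 1, 1, 1],
]
NUM_QUESTIONS = 4
QUESTIONS_PER_EPISODE = 2


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def episode_reward(asked, hidden):
    """Sparse terminal reward (an assumption standing in for a reward network):
    1.0 if the answers to the asked questions single out the hidden object."""
    signature = [ANSWERS[hidden][q] for q in asked]
    matches = sum(
        1 for obj in range(len(ANSWERS))
        if [ANSWERS[obj][q] for q in asked] == signature
    )
    return 1.0 if matches == 1 else 0.0


def train(episodes=2000, lr=0.1, seed=0):
    """REINFORCE with a running-average baseline on a softmax question policy."""
    rng = random.Random(seed)
    theta = [0.0] * NUM_QUESTIONS  # policy logits, one per question
    baseline = 0.0                 # running mean reward, reduces gradient variance
    for _ in range(episodes):
        probs = softmax(theta)
        # Each question is drawn independently from the policy, so repeats are possible.
        asked = rng.choices(range(NUM_QUESTIONS), weights=probs, k=QUESTIONS_PER_EPISODE)
        hidden = rng.randrange(len(ANSWERS))
        reward = episode_reward(asked, hidden)
        advantage = reward - baseline
        # d/d theta_k of sum_t log pi(q_t) equals count(k) - T * p_k for a softmax policy.
        for k in range(NUM_QUESTIONS):
            grad = asked.count(k) - QUESTIONS_PER_EPISODE * probs[k]
            theta[k] += lr * advantage * grad
        baseline += 0.05 * (reward - baseline)
    return softmax(theta)


if __name__ == "__main__":
    print(train())
```

Calling `train()` returns the learned question probabilities; in this toy setup they should concentrate on questions 0 and 1, the only pair that jointly identifies every object. A real 20 Questions agent would condition its policy on the dialogue history and handle noisy answers, which this sketch does not attempt.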
IMPACT This research demonstrates a new approach for training AI agents in deductive reasoning and strategy selection, potentially applicable to other interactive AI systems.
RANK_REASON The cluster contains a single academic paper detailing a novel research method.
AI-generated summary · Google Gemini · from 1 source.