Brief · PulseAugur

TOOL · arXiv cs.AI English(EN) · 15h

Test-Time Deep Thinking to Explore Implicit Rules

Researchers have developed a framework called Test-Time Exploration (TTExplore) to help AI agents navigate environments governed by implicit rules. These hidden constraints often trap agents in repetitive trial-and-error loops. TTExplore uses a "thinker" component to infer the rules from the interaction history and guide an "actor" agent. Because intermediate reasoning steps are hard to evaluate directly, the thinker is trained with a reinforcement learning pipeline that uses task-level scores as indirect rewards. Experiments show that TTExplore, powered by a specialized 7B model named Exp-Thinker, significantly improves agent performance on text-based embodied tasks.
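The thinker/actor split described above can be illustrated with a toy loop. This is a minimal sketch under stated assumptions, not the paper's implementation: `ToyEnv`, `thinker`, `actor`, `run_episode`, and the "drawer before key" rule are all hypothetical, and the thinker here is a hand-written heuristic rather than a trained model. The episode's final score is the only feedback the sketch exposes, which is the kind of task-level signal the brief says is used as an indirect reward.

```python
# Hypothetical toy setup; none of these names come from the paper.

class ToyEnv:
    """Text environment with one implicit rule: open the drawer before taking the key."""

    def __init__(self):
        self.drawer_open = False
        self.has_key = False

    def step(self, action):
        if action == "open drawer":
            self.drawer_open = True
            return "The drawer is open."
        if action == "take key" and self.drawer_open:
            self.has_key = True
            return "You take the key."
        # The environment never explains why an action failed.
        return "Nothing happens."

    def score(self):
        # Task-level score: 1.0 if the goal was reached, else 0.0.
        return 1.0 if self.has_key else 0.0


def thinker(history):
    """Guess which actions are currently blocked, from the interaction history.

    Heuristic stand-in for a learned thinker: an action that failed is treated
    as blocked until some other action succeeds and changes the state.
    """
    blocked = set()
    for action, observation in history:
        if observation == "Nothing happens.":
            blocked.add(action)
        else:
            blocked.clear()
    return blocked


def actor(preferences, blocked):
    """Pick the most preferred action that the thinker has not flagged."""
    for action in preferences:
        if action not in blocked:
            return action
    return preferences[0]


# The actor's fixed preference order (an assumption for this sketch).
PREFERENCES = ["take key", "open drawer"]


def run_episode(use_thinker, max_steps=5):
    env = ToyEnv()
    history = []
    for _ in range(max_steps):
        blocked = thinker(history) if use_thinker else set()
        action = actor(PREFERENCES, blocked)
        observation = env.step(action)
        history.append((action, observation))
        if env.has_key:
            break
    # Only the final task score is returned as feedback; the thinker's
    # intermediate guesses are never graded on their own.
    return env.score(), history


if __name__ == "__main__":
    print(run_episode(use_thinker=False))
    print(run_episode(use_thinker=True))
```

Without the thinker, the actor repeats its preferred action and never reaches the goal. With it, the failed attempt is flagged and the actor tries the alternative first.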

IMPACT This research could lead to more capable AI agents that can operate effectively in complex, real-world scenarios with unstated constraints.

Large Language Models
Test-Time Exploration
Exp-Thinker
TTExplore