Researchers have developed Doctor, a novel framework that combines sequence modeling with reinforced verification for controllable offline decision-making. This approach addresses the unreliability of target return signals, particularly in underrepresented data regions. Doctor employs a masked trajectory Transformer trained on both reconstruction and value learning objectives. At inference, it generates multiple candidate actions and selects the one whose verified value is closest to the requested target, improving controllability while remaining competitive on standard benchmarks.
AI IMPACT This research could lead to more reliable and controllable AI systems in domains requiring precise decision-making based on target returns.
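The inference-time selection step described above can be sketched in a few lines. This is a minimal illustration only, not the paper's implementation: `select_action`, `value_fn`, and the toy value table are hypothetical names, and the summary does not say how candidates are sampled or how values are verified, so a plain lookup stands in for both.

```python
from typing import Callable, Sequence, TypeVar

Action = TypeVar("Action")


def select_action(
    candidates: Sequence[Action],
    value_fn: Callable[[Action], float],
    target_return: float,
) -> Action:
    """Return the candidate whose estimated value is closest to target_return.

    `value_fn` is a hypothetical stand-in for a learned value estimate;
    ties are resolved in favor of the earliest candidate.
    """
    if not candidates:
        raise ValueError("need at least one candidate action")
    return min(candidates, key=lambda a: abs(value_fn(a) - target_return))


# Toy example: three candidate actions with made-up value estimates.
toy_values = {"left": 2.0, "stay": 5.0, "right": 9.0}
chosen = select_action(list(toy_values), toy_values.get, target_return=6.0)
```

Selecting by distance to the target, rather than by maximum value, is what lets the requested return act as a control signal: asking for a lower target picks a lower-value action instead of always the best one.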
AI-generated summary · Google Gemini · from 1 source.