Harnesses for Inference-Time Alignment over Execution Trajectories
Researchers have proposed a framework of "harnesses" to improve the performance of large language model agents at inference time. The approach aligns execution trajectories by separating a harness's functions into task decomposition and guided execution. The study shows how factors such as workflow granularity and retry budgets affect success rates, and it identifies failure modes including over-decomposition and hallucinated execution. The findings suggest that partial harnesses, which specify only the initial steps, can outperform fully structured workflows.
AI IMPACT: Introduces a novel method for enhancing LLM agent reliability and performance through structured execution guidance.
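
To make the vocabulary in the summary concrete, here is a minimal Python sketch of how a harness with a retry budget and a partial mode might be structured. It is an illustration under stated assumptions, not the paper's implementation: the names `Harness`, `run_harness`, and `Executor` are hypothetical, and the executor is a plain callable standing in for an LLM agent call that returns `None` when a step fails.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical stand-in for an LLM agent call: takes a prompt and
# returns its output, or None if the attempt failed.
Executor = Callable[[str], Optional[str]]


@dataclass
class Harness:
    """Illustrative harness; all fields are assumptions, not the paper's API."""
    steps: List[str]        # task decomposition: the prescribed steps
    retry_budget: int = 2   # guided execution: max attempts per step
    partial: bool = False   # True if `steps` covers only the start of the task


def run_harness(task: str, harness: Harness,
                executor: Executor) -> Tuple[List[str], bool]:
    """Run the prescribed steps in order and return (trajectory, success)."""

    def attempt(prompt: str) -> Optional[str]:
        # Retry the same prompt until it succeeds or the budget runs out.
        for _ in range(harness.retry_budget):
            result = executor(prompt)
            if result is not None:
                return result
        return None

    trajectory: List[str] = []
    for step in harness.steps:
        result = attempt(step)
        if result is None:
            return trajectory, False  # budget exhausted on this step
        trajectory.append(result)

    if harness.partial:
        # A partial harness fixes only the opening steps; the rest of the
        # task is handed back to the agent without further decomposition.
        result = attempt(task)
        if result is None:
            return trajectory, False
        trajectory.append(result)

    return trajectory, True
```

In this sketch, workflow granularity corresponds to how many entries `steps` contains, and a partial harness is simply one whose `steps` list stops early and leaves the remainder of `task` to the agent.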