GPT-3.5-Turbo drops from 90% accuracy to 50% when the answer sits in the middle of a 20k-token prompt instead of the sta
The paper "Lost in the Middle: How Language Models Use Long Contexts" documents a sharp drop in GPT-3.5-Turbo's accuracy when the answer is located in the middle of a long prompt, here a 20k-token context, rather than near the beginning or end. The drop is attributed to attention patterns in transformer models that favor information at the beginning or end of a prompt over the middle. The issue is not a retrieval error: the model's attention weights decay toward the center of the prompt, an effect attributed to training data limitations.
IMPACT Highlights a critical limitation of current LLMs on tasks that require retrieval from long documents: where a passage is placed in the prompt matters, so re-ranking strategies are needed rather than simply a larger context window.
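
One common form of the re-ranking idea can be sketched as follows. This is a minimal illustration, not the method from the paper: it assumes the passages have already been sorted most-relevant-first by some retriever, and the function name `reorder_for_edges` is hypothetical. It interleaves the passages so that the strongest ones land at the start and end of the prompt and the weakest ones fall in the middle, where the model is reported to attend least.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def reorder_for_edges(ranked: Sequence[T]) -> List[T]:
    """Reorder items given most-relevant-first so that relevance is
    highest at both ends of the result and lowest in the middle.

    Hypothetical helper for illustration; the input ranking is assumed
    to come from an upstream retriever or re-ranker.
    """
    front: List[T] = []  # ranks 1, 3, 5, ... fill the prompt from the start
    back: List[T] = []   # ranks 2, 4, 6, ... fill the prompt from the end
    for i, item in enumerate(ranked):
        if i % 2 == 0:
            front.append(item)
        else:
            back.append(item)
    # Reverse the back half so rank 2 ends up last, rank 4 second to last, etc.
    return front + back[::-1]


if __name__ == "__main__":
    passages = ["d1", "d2", "d3", "d4", "d5"]  # d1 is the most relevant
    print(reorder_for_edges(passages))
```

With five passages ranked d1 (best) through d5 (worst), the result is d1, d3, d5, d4, d2: the two best passages occupy the first and last positions and the weakest sits in the center. Whether this ordering recovers the lost accuracy for a given model and prompt length would need to be measured rather than assumed.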