Researchers at EleutherAI investigated how different few-shot description prompts affect GPT-3's performance on the SST benchmark. Their experiments revealed that smaller GPT-2 models performed poorly and inconsistently, and that performance did not increase strictly with model size. Surprisingly, the study found no correlation between different GPT models in which prompts yielded the best results, challenging the expectation that similar models would favor similar prompting strategies.
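One way to make "no correlation between models" concrete is to rank the prompts by accuracy for each model and compare the rankings. The sketch below uses a Spearman rank correlation, which is an assumption chosen for illustration; the summary does not say which statistic the study used. The accuracy values and the function names `ranks` and `spearman` are hypothetical placeholders, not results from the study.

```python
def ranks(values):
    """Return the 1-based rank of each value (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman(a, b):
    """Spearman rank correlation for two equal-length lists without ties."""
    n = len(a)
    ra, rb = ranks(a), ranks(b)
    d2 = sum((x - y) ** 2 for x, y in zip(ra, rb))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical per-prompt accuracies for two models (illustrative only,
# NOT figures from the study). Index i is the same prompt in both lists.
model_a_acc = [0.55, 0.60, 0.65, 0.70, 0.75]
model_b_acc = [0.82, 0.93, 0.85, 0.80, 0.90]

rho = spearman(model_a_acc, model_b_acc)
print(f"Spearman rho between prompt rankings: {rho:.2f}")
```

A value near 1 would mean both models prefer the same prompts, while a value near 0, as with these made-up numbers, would mean that a prompt's rank for one model says little about its rank for the other.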