Supervision versus Demonstration-Based In-Context Learning for Multiword Expression Classification
Researchers investigated the effectiveness of in-context learning for classifying Turkish idiomatic light verb constructions (LVCs). They compared a supervised BERTurk baseline against instruction-tuned large language models (LLMs) under zero-shot, one-shot, and few-shot prompting. The LLMs struggled with LVC recall in the zero-shot setting, but few-shot prompting with carefully constructed demonstrations improved performance: GPT-OSS-20B and Qwen 2.5-14B gave robust results that matched or exceeded the supervised baseline.
AI IMPACT: Shows that prompt design, and in particular the choice of demonstrations, strongly affects LLM performance on nuanced linguistic tasks, which in turn bears on how models are deployed for specialized NLP applications.
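The difference between the zero-shot, one-shot, and few-shot settings can be illustrated with a minimal prompt-building sketch. Everything in it is an assumption made for illustration and is not taken from the study: the instruction wording, the `LVC` / `NOT_LVC` label names, the `Demo` and `build_prompt` names, and the two Turkish demonstration sentences are all hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Demo:
    """One labeled demonstration (hypothetical format, not the study's)."""
    sentence: str
    label: str


# Hypothetical instruction text; the study's actual prompt is not reproduced here.
INSTRUCTION = (
    "Decide whether the verb phrase in the Turkish sentence is a light verb "
    "construction. Answer with LVC or NOT_LVC."
)


def build_prompt(sentence: str, demos: Sequence[Demo] = (), k: int = 0) -> str:
    """Build a k-shot prompt: k=0 is zero-shot, k=1 one-shot, k>1 few-shot."""
    if k < 0:
        raise ValueError("k must be non-negative")
    parts = [INSTRUCTION]
    # Use at most k demonstrations, in the order given.
    for demo in list(demos)[:k]:
        parts.append(f"Sentence: {demo.sentence}\nLabel: {demo.label}")
    # The query sentence comes last, with the label left open for the model.
    parts.append(f"Sentence: {sentence}\nLabel:")
    return "\n\n".join(parts)


# Invented demonstrations, for illustration only.
DEMOS = [
    Demo("Dün önemli bir karar verdi.", "LVC"),   # "karar vermek": to decide
    Demo("Kitabı arkadaşına verdi.", "NOT_LVC"),  # literal "vermek": to give
]

zero_shot = build_prompt("Bana çok yardım etti.")
one_shot = build_prompt("Bana çok yardım etti.", DEMOS, k=1)
few_shot = build_prompt("Bana çok yardım etti.", DEMOS, k=2)
```

The resulting string would be sent to whichever model is being evaluated. Only the number and choice of demonstrations changes between settings, which is the variable the summary above identifies as decisive.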