Researchers have developed SICI, a seven-dimensional index that measures the semantic-pragmatic complexity of text for LLM stance detection. The index predicts LLM accuracy better than existing methods and shows that LLM errors shift predictably as complexity increases, moving from over-attribution to abstention. The study found that common interventions such as prompting and retrieval do not fully overcome this high-complexity bottleneck across models including GPT-3.5, GPT-4o-mini, DeepSeek-V3, and GPT-4o.
AI IMPACT This research provides a new metric for evaluating LLM performance on complex tasks, potentially guiding future model development and fine-tuning strategies.
AI-generated summary · Google Gemini · from 2 sources.