A new evaluation framework assesses how well large language models (LLMs) analyze social media data. The framework comprises 470 curated questions and was applied to Twitter datasets for tasks such as sentiment analysis and hate speech detection. The study found that LLM performance degrades significantly as input scale increases, especially beyond 500 instances and on numerical tasks, which points to architectural limitations for quantitative analysis of large text collections.
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT Highlights critical architectural bottlenecks in current LLMs for quantitative analysis over large text collections.
RANK_REASON The cluster contains an academic paper detailing a new evaluation framework and benchmark results for LLMs.