The BD-LSC Dataset: Facilitating the Benchmarking of Models for Lexical Semantic Change Detection in Slang and Standard Usage
Researchers have introduced the BD-LSC and ST-WSD datasets to benchmark models on lexical semantic change detection, particularly for words with both slang and standard meanings. The datasets support the study of sense gain, loss, and stability over time. GPT-4o performed strongly in few-shot settings on metrics such as Exact Sense Match, but overall Macro-F1 scores indicate that identifying rare slang senses remains a significant challenge.
AI IMPACT: New datasets may improve LLM understanding of nuanced language, especially slang.
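One reason a model can look strong on match-style metrics yet weak on Macro-F1 is that Macro-F1 averages per-class F1 scores without weighting by class frequency, so a rare sense counts as much as a common one. The sketch below illustrates this with invented toy labels; the label names, data, and helper functions are hypothetical and are not taken from BD-LSC or ST-WSD, and plain accuracy stands in for a match-style metric only for illustration.

```python
def accuracy(gold, pred):
    """Fraction of items whose predicted label equals the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over every label seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class is never hit.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: nine standard-sense usages and one rare slang usage.
gold = ["standard"] * 9 + ["slang"]
# A model that always predicts the frequent sense misses the slang usage.
pred = ["standard"] * 10

print(f"accuracy = {accuracy(gold, pred):.3f}")
print(f"macro-F1 = {macro_f1(gold, pred):.3f}")
```

In this toy case the always-standard predictor gets 9 of 10 items right, yet its Macro-F1 falls below 0.5 because the slang class contributes an F1 of zero.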