The BD-LSC Dataset: Facilitating the Benchmarking of Models for Lexical Semantic Change Detection in Slang and Standard Usage

arXiv:2606.16560v1. Abstract: Automatic semantic change detection aims to identify how word meanings shift over time, offering insights into both linguistic and societal change. Despite recent progress in computational lexical semantic change (LSC), existing benchmarks and methods struggle to capture bi-directional semantic change, particularly cases where words simultaneously gain and lose senses. This problem is especially challenging for words that have both slang and standard meanings. To address these gaps, we introduce two complementary benchmark datasets. The Bi-Direct
The proliferation of language models and the increasing sophistication of NLP require more nuanced benchmarks to understand linguistic evolution, especially in dynamic areas like slang.
Improved detection of lexical semantic change helps track cultural shifts, societal trends, and the evolution of language, which is crucial for advanced AI understanding and adaptation.
The BD-LSC dataset provides a dedicated benchmark for bi-directional semantic change, where a word gains and loses senses at the same time, with a focus on words that carry both slang and standard meanings. This gives researchers a more targeted way to train and evaluate LSC models.
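To make the notion of bi-directional change concrete, the following is a minimal sketch that compares a word's sense inventory across two time periods and labels the result. It is an illustration only, not the paper's method or data format: the `SenseChange` class, the `compare_senses` function, and the sense labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SenseChange:
    """Senses a word gained and lost between two time periods (hypothetical type)."""
    gained: frozenset
    lost: frozenset

    @property
    def label(self) -> str:
        # A word is bi-directional when it both gains and loses senses.
        if self.gained and self.lost:
            return "bi-directional"
        if self.gained:
            return "gain only"
        if self.lost:
            return "loss only"
        return "stable"


def compare_senses(earlier, later) -> SenseChange:
    """Compare two sense inventories using plain set differences."""
    earlier, later = frozenset(earlier), frozenset(later)
    return SenseChange(gained=later - earlier, lost=earlier - later)


# Invented sense inventories for a single word, for illustration only.
change = compare_senses(
    earlier={"standard:sense_a", "standard:sense_b"},
    later={"standard:sense_a", "slang:sense_c"},
)
print(change.label, sorted(change.gained), sorted(change.lost))
```

In practice the sense inventories would have to be induced from corpora or taken from annotations, which is the hard part that an LSC benchmark is meant to evaluate; the set comparison here only shows what the target label means.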
- NLP researchers
- Social scientists
- AI ethicists
- AI model developers
- Models relying on static word embeddings
- Computational linguistics without robust LSC capabilities
More accurate, context-aware natural language processing models may emerge that better capture the subtleties of human communication.
This improved understanding could facilitate more effective cross-cultural communication tools and enhanced content moderation systems.
The ability to track linguistic evolution precisely could offer novel insights into predictive social analytics and early detection of emerging societal phenomena.
Read at arXiv cs.CL