
arXiv:2605.19575v2 Announce Type: replace Abstract: The article presents a data analysis of 286 multi-word expressions (MWEs) based on 16 lexical, grammatical and other criteria described in theoretical books and papers on the notion of idiomaticity. The MWEs were collected from the same theoretical sources, and a group of expert linguists annotated them with these categories. The distribution of categories shows that there are no absolutely idiomatic expressions. Lexical criteria seem to be the most influential; grammatical criteria are bound to certain conditions; presence of obsolete words an
This paper comes out of ongoing research in theoretical linguistics and bears on a niche area of natural language processing: the treatment of multi-word expressions.
This research is important for computational linguists and AI researchers focusing on natural language understanding, as it refines criteria for idiomaticity.
It provides a data-driven, expert-annotated framework for analyzing multi-word expressions, which could help future NLP models handle idiomatic language.
- Computational linguists
- NLP researchers
- Academic institutions
Improved theoretical understanding of idiomaticity in language.
Better performance in NLP tasks requiring nuanced language understanding, such as machine translation or sentiment analysis.
Potentially more natural handling of idiomatic expressions by advanced AI language models, in both understanding and generation.
This signal links to a primary source. Continuum Brief monitors and indexes it as part of the live intelligence stream; we do not republish source content.
Read at arXiv cs.CL