This is the public repository for our paper: Evaluating Lexical Proficiency in Neural Language Models, C. Ciaccio, A. Miaschi, F. Dell'Orletta (ACL 2025).
The repository contains the resources and code we developed to run the experiments for assessing the lexical proficiency of Italian neural language models. Specifically:
The data folder contains:
- 100-neos.csv → neologism dataset
- 100-nonce.csv → nonce words dataset
- ONLI-NEO → data extracted from the Osservatorio Neologico della Lingua Italiana (ONLI)
- it-dictionary-gz → data extracted from the Italian Wiktionary (Wikizionario)
- (train-test-val)_dataset.csv → the train, test, and validation splits used in our experiments
More resources related to the Wiktionary data format, the parser, and the ONLI scraper can be found in our Italian Wiktionary Parser repository.
The code folder contains "finetuningT5.py", the script we used to fine-tune all T5 models in a text-to-text multi-task learning setup (training + evaluation).
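In a text-to-text multi-task setup, every task is reduced to an (input, target) string pair, typically distinguished by a task prefix, so a single T5 model can be trained on all of them jointly. The sketch below illustrates this idea; the prefixes, field names, and example entry are illustrative assumptions, not the exact format used by finetuningT5.py.

```python
# Sketch: casting lexical tasks as text-to-text pairs for T5-style fine-tuning.
# Task prefixes and field names are illustrative assumptions, not the script's
# actual format.

def make_example(task: str, word: str, definition: str = "", usage: str = ""):
    """Build one (input, target) string pair for a given lexical task."""
    if task == "define":      # word -> definition
        return f"definisci: {word}", definition
    if task == "use":         # word -> usage example in context
        return f"usa in contesto: {word}", usage
    if task == "generate":    # definition -> word
        return f"genera parola: {definition}", word
    raise ValueError(f"unknown task: {task}")

# Mixing pairs from all tasks in one training set yields the multi-task setup:
entry = {"word": "petaloso", "definition": "pieno di petali",
         "usage": "Un fiore petaloso."}
batch = [make_example(t, **entry) for t in ("define", "use", "generate")]
```

Because every task shares the same string-in, string-out interface, the same seq2seq loss and decoding loop cover generation, definition, and contextual usage alike.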
The annotation folder contains the human-annotated novelty and adhesion scores for the nonce-words setting, for each model (in batches of 25).
If you use any of these resources in your work, we kindly ask you to cite our paper:
@inproceedings{ciaccio-etal-2025-evaluating,
title = "Evaluating Lexical Proficiency in Neural Language Models",
author = "Ciaccio, Cristiano and
Miaschi, Alessio and
Dell{'}Orletta, Felice",
editor = "Che, Wanxiang and
Nabende, Joyce and
Shutova, Ekaterina and
Pilehvar, Mohammad Taher",
booktitle = "Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = jul,
year = "2025",
address = "Vienna, Austria",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.acl-long.64/",
pages = "1267--1286",
ISBN = "979-8-89176-251-0",
abstract = "We present a novel evaluation framework designed to assess the lexical proficiency and linguistic creativity of Transformer-based Language Models (LMs). We validate the framework by analyzing the performance of a set of LMs of different sizes, in both mono- and multilingual configuration, across tasks involving the generation, definition, and contextual usage of lexicalized words, neologisms, and nonce words. To support these evaluations, we developed a novel dataset of lexical entries for the Italian language, including curated definitions and usage examples sourced from various online platforms. The results highlight the robustness and effectiveness of our framework in evaluating multiple dimensions of LMs' linguistic understanding and offer an insight, through the assessment of their linguistic creativity, on the lexical generalization abilities of LMs."
}