FLAN-T5-Definition Base

This model is a version of FLAN-T5 Base finetuned on a dataset of English definitions and usage examples.

It generates definitions of English words in context. Its input is a usage example followed by the instruction question "What is the definition of TARGET_WORD?"
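The input format described above can be sketched as follows. This is a minimal illustration, not the authors' reference code: the Hub identifier `ltg/flan-t5-definition-en-base`, the single-space separator between the usage example and the question, the spelling of the question, and the helper names `build_prompt` and `generate_definition` are all assumptions made for this example.

```python
# Prompt layout assumed here: "<usage example> What is the definition of <target>?"
PROMPT_TEMPLATE = "{example} What is the definition of {target}?"


def build_prompt(example: str, target: str) -> str:
    """Append the instruction question to a usage example."""
    return PROMPT_TEMPLATE.format(example=example.strip(), target=target)


def generate_definition(
    example: str,
    target: str,
    model_name: str = "ltg/flan-t5-definition-en-base",  # assumed Hub ID
    max_new_tokens: int = 32,
) -> str:
    """Generate a definition of `target` as used in `example`."""
    # Imported lazily so that build_prompt works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    inputs = tokenizer(
        build_prompt(example, target), return_tensors="pt", truncation=True
    )
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


if __name__ == "__main__":
    sentence = "He sat on the bank of the river."
    print(build_prompt(sentence, "bank"))
    # Downloads the model weights on first use:
    # print(generate_definition(sentence, "bank"))
```

Loading the model inside the function keeps the sketch short; for batch use, it would be more efficient to load the tokenizer and model once and reuse them.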

This project is a collaboration between the Dialogue Modelling Group at the University of Amsterdam and the Language Technology Group at the University of Oslo.

Model description

For details, see the paper "Interpretable Word Sense Representations via Definition Generation: The Case of Semantic Change Analysis" (ACL 2023) by Mario Giulianelli, Iris Luden, Raquel Fernandez and Andrey Kutuzov.

Intended uses & limitations

The model is intended for research purposes, as a source of contextualized dictionary-like lexical definitions.

The fine-tuning datasets were limited to English. Although the original FLAN-T5 is a multilingual model, we did not thoroughly evaluate its ability to generate definitions in languages other than English.

Generated definitions may reflect biases and stereotypes inherited from the underlying language model.

Training and evaluation data

Three datasets were used to fine-tune the model:

WordNet (Ishiwatari et al., NAACL 2019), also available on Hugging Face
Oxford dictionary or CHA (Gadetsky et al., ACL 2018)
English subset of CoDWoE (Mickus et al., SemEval 2022)

FLAN-T5-Definition Base achieves the following results on the WordNet test set:

BLEU: 10.38
ROUGE-L: 27.17
BERT-F1: 88.22

FLAN-T5-Definition Base achieves the following results on the Oxford dictionary test set:

BLEU: 7.18
ROUGE-L: 23.04
BERT-F1: 86.90
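To make the ROUGE-L figures above easier to interpret, here is a simplified sketch of the metric: an F1 score over the longest common subsequence (LCS) of reference and generated tokens. It assumes lowercased whitespace tokenization and is not the scorer used to produce the reported numbers; the function names are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, candidate: str) -> float:
    """ROUGE-L F1 on a 0-1 scale (multiply by 100 to compare with the card)."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of generated tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = "a financial institution that accepts deposits"
    generated = "an institution that accepts money"
    print(round(100 * rouge_l_f1(gold, generated), 2))
```

In this toy pair the LCS is "institution that accepts" (3 tokens), giving a precision of 3/5 and a recall of 3/6.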

Training procedure

FLAN-T5 Base was fine-tuned in a sequence-to-sequence mode on examples of contextualized dictionary definitions.

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 64
eval_batch_size: 64
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 15.0
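As an illustration of what the linear scheduler implies for the learning rate over the run, the sketch below computes the rate at a given optimizer step. The total of 41,100 steps (15 epochs of 2,740 steps) is read off the training results table; the absence of warmup is an assumption, since the card does not state a warmup setting, and `linear_lr` is a hypothetical helper rather than a library function.

```python
BASE_LR = 5e-5        # learning_rate from the list above
TOTAL_STEPS = 41100   # 15 epochs x 2740 steps per epoch (training results table)
WARMUP_STEPS = 0      # assumption: no warmup is mentioned in the card


def linear_lr(step, base_lr=BASE_LR, total_steps=TOTAL_STEPS,
              warmup_steps=WARMUP_STEPS):
    """Learning rate at `step` under linear warmup followed by linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    # Learning rate at the end of epochs 1, 5, 10 and 15.
    for epoch in (1, 5, 10, 15):
        step = epoch * 2740
        print(f"epoch {epoch:2d}  step {step:5d}  lr {linear_lr(step):.3e}")
```

Under these assumptions the rate falls to half its initial value at step 20,550 and reaches zero at the final step.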

Training results

Training Loss	Epoch	Step	Validation Loss	ROUGE-1	ROUGE-2	ROUGE-L	ROUGE-Lsum	Gen Len
2.5645	1.0	2740	2.2535	24.4437	6.4189	22.7949	22.7909	11.4969
2.3501	2.0	5480	2.1642	25.6642	7.289	23.8689	23.8749	11.7150
2.2516	3.0	8220	2.1116	26.4562	7.8955	24.6275	24.6376	11.7441
2.1806	4.0	10960	2.0737	27.0392	8.2393	25.1555	25.1641	11.7930
2.1233	5.0	13700	2.0460	27.2709	8.4244	25.3847	25.4003	11.9014
2.0765	6.0	16440	2.0236	27.5456	8.6096	25.6321	25.6462	11.8113
2.0371	7.0	19180	2.0047	27.7209	8.7277	25.7871	25.8084	11.6875
2.0036	8.0	21920	1.9918	28.0431	8.9863	26.1072	26.1198	11.5487
1.9752	9.0	24660	1.9788	28.1807	9.0219	26.1692	26.1886	11.7939
1.9513	10.0	27400	1.9702	28.3204	9.1572	26.2955	26.3029	11.5936
1.9309	11.0	30140	1.9640	28.4289	9.2845	26.4006	26.418	11.8371
1.9144	12.0	32880	1.9571	28.4504	9.3406	26.4273	26.4384	11.6201
1.9013	13.0	35620	1.9544	28.6319	9.3682	26.605	26.613	11.7067
1.8914	14.0	38360	1.9512	28.6435	9.3976	26.5839	26.5918	11.7307
1.8866	15.0	41100	1.9509	28.6111	9.3857	26.551	26.5648	11.7470

Framework versions

Transformers 4.24.0
PyTorch 1.11.0
Datasets 2.3.2
Tokenizers 0.12.1

Citation

@inproceedings{giulianelli-etal-2023-interpretable,
    title = "Interpretable Word Sense Representations via Definition Generation: The Case of Semantic Change Analysis",
    author = "Giulianelli, Mario  and
      Luden, Iris  and
      Fernandez, Raquel  and
      Kutuzov, Andrey",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.acl-long.176",
    doi = "10.18653/v1/2023.acl-long.176",
    pages = "3130--3148",
    abstract = "We propose using automatically generated natural language definitions of contextualised word usages as interpretable word and word sense representations. Given a collection of usage examples for a target word, and the corresponding data-driven usage clusters (i.e., word senses), a definition is generated for each usage with a specialised Flan-T5 language model, and the most prototypical definition in a usage cluster is chosen as the sense label. We demonstrate how the resulting sense labels can make existing approaches to semantic change analysis more interpretable, and how they can allow users {---} historical linguists, lexicographers, or social scientists {---} to explore and intuitively explain diachronic trajectories of word meaning. Semantic change analysis is only one of many possible applications of the {`}definitions as representations{'} paradigm. Beyond being human-readable, contextualised definitions also outperform token or usage sentence embeddings in word-in-context semantic similarity judgements, making them a new promising type of lexical representation for NLP.",
}