	---
	license: apache-2.0
	datasets:
	- mbruton/galician_srl
	language:
	- gl
	metrics:
	- seqeval
	library_name: transformers
	pipeline_tag: token-classification
	---

	# Model Card for GalBERT for Semantic Role Labeling (cased)

	This model is fine-tuned from [multilingual BERT](https://huggingface.co/bert-base-multilingual-cased) and is one of 24 models introduced as part of [this project](https://github.com/mbruton0426/GalicianSRL). Prior to this work, there were no published Galician datasets or models for SRL.

	## Model Details

	### Model Description

	GalBERT for Semantic Role Labeling (SRL) is a transformers model that leverages mBERT's extensive pretraining on 104 languages to achieve better SRL predictions for low-resource Galician. This model is cased: it distinguishes between english and English. It was fine-tuned on Galician with the following objectives:

	- Identify up to 13 verbal roots within a sentence.
	- Identify available arguments for each verbal root. Due to scarcity of data, this model focuses solely on the identification of arguments 0, 1, and 2.

	Labels are formatted as `r#:tag`, where `r#` links the token to a specific verbal root of index #, and `tag` identifies the token as the verbal root (`root`) or an individual argument (`arg0`/`arg1`/`arg2`).
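The label scheme described above can be unpacked with a few lines of Python. This is a minimal sketch, not code from the project: the names `parse_label`, `group_by_root`, and `SrlLabel` are hypothetical, and treating the leading `r` as optional is an assumption made so that the `0:arg1` spelling used in the results table also parses.

```python
import re
from typing import Dict, List, NamedTuple, Optional

# Assumption: labels look like "r0:root" or "r3:arg1", as described in the text.
# The leading "r" is optional here so the "0:arg1" spelling also parses.
LABEL_PATTERN = re.compile(r"^r?(\d+):(root|arg[0-2])$")


class SrlLabel(NamedTuple):
    root_index: int  # which verbal root the token is linked to
    tag: str         # "root", "arg0", "arg1" or "arg2"


def parse_label(label: str) -> Optional[SrlLabel]:
    """Split an 'r#:tag' label; return None for any other string."""
    match = LABEL_PATTERN.match(label)
    if match is None:
        return None
    return SrlLabel(root_index=int(match.group(1)), tag=match.group(2))


def group_by_root(tokens: List[str], labels: List[str]) -> Dict[int, Dict[str, List[str]]]:
    """Collect tokens per verbal root as {root_index: {tag: [tokens]}}."""
    frames: Dict[int, Dict[str, List[str]]] = {}
    for token, label in zip(tokens, labels):
        parsed = parse_label(label)
        if parsed is None:
            continue  # token carries no role for any root
        frames.setdefault(parsed.root_index, {}).setdefault(parsed.tag, []).append(token)
    return frames


# Hand-written illustration, NOT actual model output.
tokens = ["A", "nena", "come", "mazás"]
labels = ["r0:arg0", "r0:arg0", "r0:root", "r0:arg1"]
print(group_by_root(tokens, labels))
```

Grouping by root index first keeps the arguments of different verbs apart, which matters once a sentence contains more than one verbal root.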

	- Developed by: [Micaella Bruton](mailto:micaellabruton@gmail.com)
	- Model type: Transformers
	- Language(s) (NLP): Galician (gl)
	- License: Apache 2.0
	- Finetuned from model: [multilingual BERT](https://huggingface.co/bert-base-multilingual-cased)

	### Model Sources

	- Repository: [GalicianSRL](https://github.com/mbruton0426/GalicianSRL)
	- Paper: To be updated

	## Uses

	This model is intended to be used to develop and improve natural language processing tools for Galician.

	## Bias, Risks, and Limitations

	Galician is a low-resource language which, prior to this project, lacked a semantic role labeling dataset. As such, the dataset used to train this model is extremely limited and could benefit from the inclusion of additional sentences and manual validation by native speakers.

	## Training Details

	### Training Data

	This model was trained on the "train" portion of the [GalicianSRL Dataset](https://huggingface.co/datasets/mbruton/galician_srl) produced as part of this same project.

	#### Training Hyperparameters

	- Learning Rate: 2e-5
	- Batch Size: 16
	- Weight Decay: 0.01
	- Early Stopping: 10 epochs
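For readers who want to mirror these settings, the values can be collected in a small configuration object. The sketch below is illustrative only: `FineTuneConfig` and `should_stop` are hypothetical names, and reading "Early Stopping: 10 epochs" as a patience of 10 epochs without improvement is an assumption, since the card does not say how early stopping was configured.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class FineTuneConfig:
    learning_rate: float = 2e-5
    batch_size: int = 16
    weight_decay: float = 0.01
    # Assumption: "10 epochs" is interpreted as early-stopping patience.
    early_stopping_patience: int = 10


def should_stop(eval_scores: Sequence[float], patience: int) -> bool:
    """Return True once the best score is at least `patience` epochs old.

    `eval_scores` holds one validation score per finished epoch (higher is
    better). Ties do not count as an improvement.
    """
    if not eval_scores:
        return False
    best_epoch = max(range(len(eval_scores)), key=eval_scores.__getitem__)
    epochs_since_best = len(eval_scores) - 1 - best_epoch
    return epochs_since_best >= patience


config = FineTuneConfig()
history = [0.60] + [0.55] * 10  # best score at epoch 0, then ten worse epochs
print(should_stop(history, config.early_stopping_patience))
```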

	## Evaluation

	#### Testing Data

	This model was tested on the "test" portion of the [GalicianSRL Dataset](https://huggingface.co/datasets/mbruton/galician_srl) produced as part of this same project.

	#### Metrics

	[seqeval](https://huggingface.co/spaces/evaluate-metric/seqeval) is a Python framework for sequence labeling evaluation. It can evaluate the performance of chunking tasks such as named-entity recognition, part-of-speech tagging, and semantic role labeling.
	It reports scores both overall and per label type.

	Overall:
	- `accuracy`: the average [accuracy](https://huggingface.co/metrics/accuracy), on a scale between 0.0 and 1.0.
	- `precision`: the average [precision](https://huggingface.co/metrics/precision), on a scale between 0.0 and 1.0.
	- `recall`: the average [recall](https://huggingface.co/metrics/recall), on a scale between 0.0 and 1.0.
	- `f1`: the average [F1 score](https://huggingface.co/metrics/f1), which is the harmonic mean of the precision and recall. It also has a scale of 0.0 to 1.0.

	Per label type:
	- `precision`: the average [precision](https://huggingface.co/metrics/precision), on a scale between 0.0 and 1.0.
	- `recall`: the average [recall](https://huggingface.co/metrics/recall), on a scale between 0.0 and 1.0.
	- `f1`: the average [F1 score](https://huggingface.co/metrics/f1), on a scale between 0.0 and 1.0.
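The arithmetic behind these scores can be illustrated from raw counts of true positives, false positives, and false negatives. The sketch below is not seqeval's implementation, and the counts are invented for illustration. It only shows how precision, recall, and F1 relate, and why a macro average (every label weighted equally) can sit well below a micro average (every prediction weighted equally) when rare labels score poorly.

```python
from typing import Dict, Tuple

Counts = Tuple[int, int, int]  # (true positives, false positives, false negatives)


def prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 from raw counts; empty denominators yield 0.0."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def micro_macro_f1(counts: Dict[str, Counts]) -> Tuple[float, float]:
    """Micro F1 pools the counts first; macro F1 averages the per-label F1 scores."""
    total_tp = sum(c[0] for c in counts.values())
    total_fp = sum(c[1] for c in counts.values())
    total_fn = sum(c[2] for c in counts.values())
    micro = prf(total_tp, total_fp, total_fn)[2]
    macro = sum(prf(*c)[2] for c in counts.values()) / len(counts)
    return micro, macro


# Invented counts: one frequent label that is predicted well,
# one rare label that is never predicted correctly.
example_counts = {"frequent": (90, 10, 10), "rare": (0, 0, 5)}
micro_f1, macro_f1 = micro_macro_f1(example_counts)
print(f"micro F1 = {micro_f1:.3f}, macro F1 = {macro_f1:.3f}")
```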

	### Results

	| Label | Precision | Recall | f1-score | Support |
	| :----------: | :-------: | :----: | :------: | :-----: |
	| 0:arg0 | 0.72 | 0.77 | 0.74 | 485 |
	| 0:arg1 | 0.74 | 0.74 | 0.74 | 483 |
	| 0:arg2 | 0.66 | 0.76 | 0.71 | 264 |
	| 0:root | 0.92 | 0.91 | 0.92 | 948 |
	| 1:arg0 | 0.68 | 0.62 | 0.65 | 348 |
	| 1:arg1 | 0.69 | 0.63 | 0.66 | 443 |
	| 1:arg2 | 0.65 | 0.55 | 0.59 | 211 |
	| 1:root | 0.85 | 0.83 | 0.84 | 802 |
	| 2:arg0 | 0.59 | 0.56 | 0.57 | 240 |
	| 2:arg1 | 0.61 | 0.58 | 0.59 | 331 |
	| 2:arg2 | 0.56 | 0.55 | 0.56 | 156 |
	| 2:root | 0.79 | 0.70 | 0.74 | 579 |
	| 3:arg0 | 0.42 | 0.45 | 0.44 | 137 |
	| 3:arg1 | 0.54 | 0.55 | 0.55 | 216 |
	| 3:arg2 | 0.48 | 0.52 | 0.50 | 110 |
	| 3:root | 0.63 | 0.71 | 0.67 | 374 |
	| 4:arg0 | 0.42 | 0.40 | 0.41 | 70 |
	| 4:arg1 | 0.50 | 0.52 | 0.51 | 109 |
	| 4:arg2 | 0.46 | 0.50 | 0.48 | 66 |
	| 4:root | 0.50 | 0.72 | 0.59 | 206 |
	| 5:arg0 | 0.27 | 0.20 | 0.23 | 20 |
	| 5:arg1 | 0.35 | 0.51 | 0.41 | 57 |
	| 5:arg2 | 0.27 | 0.14 | 0.19 | 28 |
	| 5:root | 0.42 | 0.28 | 0.34 | 102 |
	| 6:arg0 | 0.50 | 0.08 | 0.13 | 13 |
	| 6:arg1 | 0.20 | 0.04 | 0.07 | 25 |
	| 6:arg2 | 0.00 | 0.00 | 0.00 | 8 |
	| 6:root | 0.25 | 0.21 | 0.23 | 42 |
	| 7:arg0 | 0.00 | 0.00 | 0.00 | 3 |
	| 7:arg1 | 0.00 | 0.00 | 0.00 | 8 |
	| 7:arg2 | 0.00 | 0.00 | 0.00 | 5 |
	| 7:root | 0.00 | 0.00 | 0.00 | 16 |
	| 8:arg0 | 0.00 | 0.00 | 0.00 | 1 |
	| 8:arg1 | 0.00 | 0.00 | 0.00 | 2 |
	| 8:arg2 | 0.00 | 0.00 | 0.00 | 1 |
	| 8:root | 0.00 | 0.00 | 0.00 | 7 |
	| 9:arg0 | 0.00 | 0.00 | 0.00 | 1 |
	| 9:arg1 | 0.00 | 0.00 | 0.00 | 2 |
	| 9:arg2 | 0.00 | 0.00 | 0.00 | 1 |
	| 9:root | 0.00 | 0.00 | 0.00 | 3 |
	| 10:arg1 | 0.00 | 0.00 | 0.00 | 1 |
	| 10:root | 0.00 | 0.00 | 0.00 | 2 |
	| micro avg | 0.69 | 0.68 | 0.69 | 6926 |
	| macro avg | 0.35 | 0.33 | 0.33 | 6926 |
	| weighted avg | 0.69 | 0.68 | 0.68 | 6926 |
	| tot root avg | 0.40 | 0.40 | 0.39 | 3081 |
	| tot A0 avg | 0.36 | 0.31 | 0.32 | 1318 |
	| tot A1 avg | 0.33 | 0.32 | 0.32 | 1677 |
	| tot A2 avg | 0.31 | 0.30 | 0.30 | 850 |
	| tot r0 avg | 0.76 | 0.80 | 0.78 | 2180 |
	| tot r1 avg | 0.72 | 0.66 | 0.69 | 1804 |
	| tot r2 avg | 0.64 | 0.60 | 0.62 | 1306 |
	| tot r3 avg | 0.52 | 0.56 | 0.54 | 837 |
	| tot r4 avg | 0.47 | 0.54 | 0.50 | 451 |
	| tot r5 avg | 0.33 | 0.28 | 0.29 | 207 |
	| tot r6 avg | 0.24 | 0.08 | 0.11 | 88 |
	| tot r7 avg | 0.00 | 0.00 | 0.00 | 32 |
	| tot r8 avg | 0.00 | 0.00 | 0.00 | 11 |
	| tot r9 avg | 0.00 | 0.00 | 0.00 | 7 |
	| tot r10 avg | 0.00 | 0.00 | 0.00 | 3 |
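The card does not say how the `tot ... avg` rows are computed. They appear consistent with an unweighted mean over the matching label rows, with the supports summed. The sketch below checks that reading for the `tot r0 avg` row using the four r0 rows copied from the table; `summarize` is a hypothetical helper, and the interpretation itself is an assumption.

```python
from statistics import mean
from typing import Dict, Tuple

Row = Tuple[float, float, float, int]  # (precision, recall, f1, support)

# Values copied from the four root-0 rows of the results table.
R0_ROWS: Dict[str, Row] = {
    "0:arg0": (0.72, 0.77, 0.74, 485),
    "0:arg1": (0.74, 0.74, 0.74, 483),
    "0:arg2": (0.66, 0.76, 0.71, 264),
    "0:root": (0.92, 0.91, 0.92, 948),
}


def summarize(rows: Dict[str, Row]) -> Row:
    """Unweighted mean of precision, recall and F1, plus the summed support."""
    precisions, recalls, f1s, supports = zip(*rows.values())
    return mean(precisions), mean(recalls), mean(f1s), sum(supports)


precision, recall, f1, support = summarize(R0_ROWS)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f} support={support}")
```

The three means land within rounding distance of the reported 0.76, 0.80, and 0.78, and the summed support matches the reported 2180.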

	## Citation

	BibTeX:

	```
	@mastersthesis{bruton-galician-srl-23,
	author = {Bruton, Micaella},
	title = {BERTie Bott's Every Flavor Labels: A Tasty Guide to Developing a Semantic Role Labeling Model for Galician},
	school = {Uppsala University},
	year = {2023},
	type = {Master's thesis},
	}
	```
