---
license: mit
datasets:
- davmel/ka_homonym_disambiguation
language:
- ka
pipeline_tag: fill-mask
widget:
- text: "აიღეთ ხელში <mask>, იმუშავეთ მიწაზე"
---
<iframe
src="https://davmel-georgian-homonym-disambiguation.hf.space"
frameborder="0"
width="850"
height="450"
></iframe>
This model determines which meaning of the Georgian homonym "ბარი" is intended at the position marked by the mask token.
It is a Transformer masked-language model fine-tuned on 4,800 hand-classified sentences, and it reaches 95% accuracy on a held-out test set of 1,200 hand-classified sentences.
The original 6,000 sentences were split 80/20 into training and test data. The dataset is available on the Hugging Face Hub: <a href="https://huggingface.co/datasets/davmel/ka_homonym_disambiguation">davmel/ka_homonym_disambiguation</a>.
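
The split arithmetic (6,000 × 0.8 = 4,800 training sentences, 6,000 × 0.2 = 1,200 test sentences) can be illustrated in plain Python. This is only a sketch: the shuffling, the seed, and the helper name `split_80_20` are assumptions, not the procedure actually used to build the dataset.

```python
import random

def split_80_20(sentences, seed=42):
    """Hypothetical helper: shuffle a copy, then cut it into 80% train / 20% test."""
    shuffled = list(sentences)              # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)   # assumed: a seeded random shuffle
    cut = len(shuffled) * 4 // 5            # integer arithmetic: 6000 * 4 // 5 = 4800
    return shuffled[:cut], shuffled[cut:]

# Placeholder strings standing in for the 6,000 labeled sentences.
sentences = [f"sentence {i}" for i in range(6000)]
train, test = split_80_20(sentences)
print(len(train), len(test))  # 4800 1200
```

Integer division avoids any floating-point rounding surprises when computing the cut point.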
<h1>Methodology</h1>
For training, I replaced the homonym in each sentence with a synonym that matches its intended definition. For example, "ბარი" was replaced with "დაბლობი" (lowland) wherever it referred to low-lying land.
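
The substitution step can be sketched as follows, assuming each sentence carries one of three sense labels and contains the homonym in its dictionary form "ბარი". The label names, the `SYNONYMS` table, and the `to_training_text` helper are hypothetical; the synonyms themselves are the ones listed in the usage example. Inflected forms (Georgian adds case endings such as "-ით") are deliberately left untouched by this sketch.

```python
import re

# Hypothetical sense labels mapped to the synonyms used in the usage example.
SYNONYMS = {
    "shovel": "თოხი",
    "lowland": "დაბლობი",
    "cafe": "კაფე",
}

HOMONYM = "ბარი"

def to_training_text(sentence: str, sense: str) -> str:
    """Replace each standalone occurrence of the homonym with the synonym for `sense`.

    The lookarounds require that no word character (Georgian letters count as
    word characters in Python's Unicode regexes) touches the match, so
    inflected forms like "ბარით" are not rewritten.
    """
    pattern = rf"(?<!\w){re.escape(HOMONYM)}(?!\w)"
    return re.sub(pattern, SYNONYMS[sense], sentence)

print(to_training_text("აიღეთ ხელში ბარი, იმუშავეთ მიწაზე", "shovel"))
# აიღეთ ხელში თოხი, იმუშავეთ მიწაზე
```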
The model predicts "თო" when it interprets the homonym as "Shovel", "დაბ" when it interprets it as "Lowland", and "კაფე" when it interprets it as "Cafe". These predictions correspond to the synonyms "თოხი", "დაბლობი", and "კაფე", respectively.
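
For a single mask, the fill-mask pipeline returns a list of candidate dictionaries sorted by score, each with keys such as `score` and `token_str`. The top candidate is not guaranteed to be one of the three sense tokens, so a minimal sketch of a more defensive interpretation picks the highest-scoring *known* token and returns `None` otherwise. The `interpret` helper is hypothetical, and the sample scores below are invented placeholder values, not real model output.

```python
# Mapping from predicted token to English meaning, as in the usage example.
ANSWER_TO_ENGLISH = {"თო": "Shovel", "დაბ": "Lowland", "კაფე": "Cafe"}

def interpret(predictions):
    """Return the English meaning of the highest-scoring known token, or None.

    `predictions` has the shape a fill-mask pipeline returns for one mask:
    a list of dicts with at least "score" and "token_str" keys.
    """
    known = [p for p in predictions if p["token_str"] in ANSWER_TO_ENGLISH]
    if not known:
        return None
    best = max(known, key=lambda p: p["score"])
    return ANSWER_TO_ENGLISH[best["token_str"]]

# Illustrative input only: invented scores, not real model output.
fake_predictions = [
    {"score": 0.40, "token_str": "ის"},
    {"score": 0.35, "token_str": "თო"},
    {"score": 0.05, "token_str": "დაბ"},
]
print(interpret(fake_predictions))  # Shovel
```

As an alternative, the `transformers` fill-mask pipeline accepts a `targets` argument that restricts scoring to a given list of tokens, which avoids the filtering step entirely.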
The model was fine-tuned from the pre-trained checkpoint https://huggingface.co/Davit6174/georgian-distilbert-mlm.
<h1>Usage example</h1>
```python
from transformers import pipeline, AutoModelForMaskedLM, AutoTokenizer

model = AutoModelForMaskedLM.from_pretrained('davmel/ka_homonym_disambiguation_FM')
tokenizer = AutoTokenizer.from_pretrained('davmel/ka_homonym_disambiguation_FM')
pipe = pipeline('fill-mask', model=model, tokenizer=tokenizer)

# Map each predicted token to its Georgian synonym and to its English meaning.
answer = {'თო': 'თოხი', 'დაბ': 'დაბლობი', 'კაფე': 'კაფე'}
answer_to_english = {'თო': 'Shovel', 'დაბ': 'Lowland', 'კაფე': 'Cafe'}

# The sentence must contain exactly one mask token (with several masks the pipeline
# returns one list of predictions per mask). tokenizer.mask_token avoids hard-coding it.
sentence = f'აიღეთ ხელში {tokenizer.mask_token}, იმუშავეთ მიწაზე'
result = pipe(sentence)

# Predictions are sorted by score, so result[0] holds the most likely token.
top_token = result[0]['token_str']
print("The homonym is used as:", answer_to_english.get(top_token, f"unknown ({top_token})"))
# Same result in Georgian: 'The homonym "ბარი" is used as'
print("ომონიმი \"ბარი\" გამოყენებულია როგორც", answer.get(top_token, top_token))
```