davmel committed on
Commit ea6344b
1 Parent(s): 82976bc

Update README.md

Files changed (1)
  1. README.md +24 -2
README.md CHANGED
@@ -20,9 +20,31 @@ It shows 95% accuracy on a test set comprising 1200 hand-classified sentences.

The original 6,000 sentences were split into 80% training data (4,800 sentences) and 20% test data (1,200 sentences). <a href="https://huggingface.co/datasets/davmel/ka_homonym_disambiguation">link to dataset</a>
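The README does not say how the 80/20 split was drawn (plain random, stratified by sense, or with which seed). A minimal sketch, assuming a plain seeded random shuffle, follows; `split_80_20` and the seed value are hypothetical, and placeholder strings stand in for the real sentences.

```python
import random


def split_80_20(sentences, seed=42):
    """Shuffle a copy of the sentences and split it 80/20 into train/test lists."""
    items = list(sentences)
    random.Random(seed).shuffle(items)  # seeded so the split is reproducible
    cut = len(items) * 8 // 10          # integer arithmetic avoids float rounding
    return items[:cut], items[cut:]


# Placeholder data: 6,000 dummy strings instead of the real dataset.
sentences = [f"sentence {i}" for i in range(6000)]
train, test = split_80_20(sentences)
print(len(train), len(test))  # 4800 1200
```

With the Hugging Face `datasets` library, `Dataset.train_test_split(test_size=0.2, seed=...)` performs the same kind of split directly on a loaded dataset.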

<h1>Methodology</h1>

I masked the homonym in each sentence and replaced it with a synonym corresponding to the sense in which it is used. For example, where "ბარი" referred to a field, I replaced it with "დაბლობი" (lowland).
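The masking step can be sketched as follows. This is an illustration under stated assumptions, not the author's preprocessing code: the sense names and `make_example` are hypothetical, only the exact form "ბარი" is matched (inflected forms such as "ბარში" are not), and the full synonym is substituted, whereas the model's labels are shortened tokens such as "თო" and "დაბ".

```python
import re

# Hypothetical sense names; the synonyms are the ones the README uses for "ბარი".
SENSE_TO_SYNONYM = {"shovel": "თოხი", "lowland": "დაბლობი", "cafe": "კაფე"}
MASK = "[MASK]"
# Match "ბარი" only as a whole word, so words such as "ბარიერი" (barrier) are left alone.
HOMONYM_RE = re.compile(r"(?<!\w)ბარი(?!\w)")


def make_example(sentence: str, sense: str) -> dict:
    """Return the masked input and the synonym-substituted target for one labelled sentence."""
    if len(HOMONYM_RE.findall(sentence)) != 1:
        raise ValueError("expected exactly one standalone occurrence of 'ბარი'")
    synonym = SENSE_TO_SYNONYM[sense]
    return {
        "masked": HOMONYM_RE.sub(MASK, sentence),    # model input
        "target": HOMONYM_RE.sub(synonym, sentence),  # sentence with the sense-specific synonym
    }


example = make_example("აიღეთ ხელში ბარი, იმუშავეთ მიწაზე", "shovel")
print(example["masked"])  # აიღეთ ხელში [MASK], იმუშავეთ მიწაზე
print(example["target"])  # აიღეთ ხელში თოხი, იმუშავეთ მიწაზე
```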

The model predicts "თო" when it interprets the homonym as "shovel", "დაბ" when it interprets it as "lowland", and "კაფე" when it interprets it as "cafe". These tokens correspond to the synonyms თოხი, დაბლობი, and კაფე, respectively.
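Because a fill-mask model scores its whole vocabulary, the top-ranked token is not guaranteed to be one of the three labels. A minimal sketch that keeps only the label tokens and returns the best-scoring one follows; it assumes predictions shaped like the `transformers` fill-mask pipeline output (a list of dicts with `token_str` and `score` keys), and `pick_sense`, the sample predictions, and their scores are made up for illustration.

```python
# Label tokens from the README, mapped to the senses they stand for.
LABEL_TO_SENSE = {"თო": "Shovel", "დაბ": "Lowland", "კაფე": "Cafe"}


def pick_sense(predictions):
    """Return (sense, score) for the best-scoring label token, or None if no label appears.

    `predictions` is a list of dicts with at least "token_str" and "score" keys.
    """
    candidates = [p for p in predictions if p["token_str"] in LABEL_TO_SENSE]
    if not candidates:
        return None
    best = max(candidates, key=lambda p: p["score"])
    return LABEL_TO_SENSE[best["token_str"]], best["score"]


# Hypothetical pipeline output: the top token "ნიჩაბი" is not one of the labels.
preds = [
    {"token_str": "ნიჩაბი", "score": 0.40},
    {"token_str": "თო", "score": 0.35},
    {"token_str": "კაფე", "score": 0.05},
]
print(pick_sense(preds))  # ('Shovel', 0.35)
```

The pipeline's `targets` argument offers a similar restriction at inference time, e.g. `pipe(sentence, targets=["თო", "დაბ", "კაფე"])`, assuming each label is a single token in the model's vocabulary.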

My fine-tuned model is based on the pre-trained Georgian DistilBERT masked-language model available at https://huggingface.co/Davit6174/georgian-distilbert-mlm

<h1>Usage example</h1>

```python
from transformers import pipeline, AutoModelForMaskedLM, AutoTokenizer

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub.
model = AutoModelForMaskedLM.from_pretrained('davmel/ka_homonym_disambiguation_FM')
tokenizer = AutoTokenizer.from_pretrained('davmel/ka_homonym_disambiguation_FM')

pipe = pipeline('fill-mask', model=model, tokenizer=tokenizer)

# Map each predicted label token to the full Georgian synonym and to its English sense.
answer = {'თო': "თოხი", 'დაბ': 'დაბლობი', 'კაფე': "კაფე"}
answer_to_english = {"თო": "Shovel", "დაბ": "Lowland", "კაფე": "Cafe"}

# Make sure the sentence contains exactly one [MASK] token; otherwise the pipeline
# returns one list of predictions per mask instead of a single list of dictionaries.
# English: "Take the [MASK] in hand, work the land."
sentence = 'აიღეთ ხელში [MASK], იმუშავეთ მიწაზე'

result = pipe(sentence)
top_token = result[0]['token_str']  # highest-scoring prediction

# .get() falls back to the raw token instead of raising KeyError on an unexpected prediction.
print("The homonym is used as:", answer_to_english.get(top_token, top_token))
# Georgian: 'The homonym "ბარი" is used as'
print("ომონიმი \"ბარი\" გამოყენებულია როგორც", answer.get(top_token, top_token))
```
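The reported 95% accuracy on the 1,200-sentence test set could be measured with a loop like the one below. This is a hypothetical sketch, not the author's evaluation script: it assumes the test data as (masked sentence, English sense) pairs and, like the usage example, scores only the top prediction. A stub stands in for the real pipeline so the snippet runs offline; `accuracy`, `fake_pipe`, and the second sentence are made up.

```python
from typing import Callable, Iterable, Tuple

# Same label-to-sense mapping as answer_to_english in the usage example.
LABEL_TO_SENSE = {"თო": "Shovel", "დაბ": "Lowland", "კაფე": "Cafe"}


def accuracy(fill_mask: Callable[[str], list], examples: Iterable[Tuple[str, str]]) -> float:
    """Share of (masked sentence, gold sense) pairs whose top prediction maps to the gold sense."""
    correct = total = 0
    for sentence, gold in examples:
        top_token = fill_mask(sentence)[0]["token_str"]  # highest-scoring prediction
        correct += LABEL_TO_SENSE.get(top_token) == gold
        total += 1
    return correct / total if total else 0.0


# Hypothetical stand-in for the real pipeline that always predicts "თო".
def fake_pipe(sentence: str) -> list:
    return [{"token_str": "თო", "score": 0.9}]


pairs = [
    ("აიღეთ ხელში [MASK], იმუშავეთ მიწაზე", "Shovel"),
    ("მდინარის პირას [MASK] გადაჭიმულა", "Lowland"),
]
print(accuracy(fake_pipe, pairs))  # 0.5
```

Passing the real `pipe` from the usage example, i.e. `accuracy(pipe, test_pairs)`, would evaluate the actual model; `test_pairs` is a hypothetical name for the prepared test set.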