Leonard Püttmann committed
Commit 7c19dd7 · verified · 1 Parent(s): c954467

Update README.md

Files changed (1)
  1. README.md +31 -0
README.md CHANGED
@@ -40,6 +40,37 @@ response = generate_response(text_to_translate)
  print(response)
  ```

+ Because this model was trained on sentence pairs, it is best to split longer text into individual sentences before translating, ideally with spaCy:
+ ```python
+ from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
+ import spacy
+ # First, install spaCy and the English language model if you haven't already
+ # !pip install spacy
+ # !python -m spacy download en_core_web_sm
+
+ nlp = spacy.load("en_core_web_sm")
+
+ tokenizer = AutoTokenizer.from_pretrained("LeonardPuettmann/mt0-Quadrifoglio-mt-en-it")
+ model = AutoModelForSeq2SeqLM.from_pretrained("LeonardPuettmann/mt0-Quadrifoglio-mt-en-it")
+
+ def generate_response(input_text):
+     input_ids = tokenizer("translate Italian to English: " + input_text, return_tensors="pt").input_ids
+     output = model.generate(input_ids, max_new_tokens=256)
+     return tokenizer.decode(output[0], skip_special_tokens=True)
+
+ text = "How are you doing? Today is a beautiful day. I hope you are doing fine."
+ doc = nlp(text)
+ sentences = [sent.text for sent in doc.sents]
+
+ sentence_translations = []
+ # Translate each sentence on its own, then join the results
+ for sentence in sentences:
+     sentence_translations.append(generate_response(sentence))
+
+ full_translation = " ".join(sentence_translations)
+ print(full_translation)
+ ```
+
  ## Evaluation
  Done on the Opus 100 test set.
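
The split, translate, and join pattern added in this commit can also be sketched as a small reusable helper. This is a minimal sketch and not part of the README: `naive_split` and `translate_by_sentence` are hypothetical names, the regex splitter is only a rough stand-in for spaCy's `doc.sents`, and `str.upper` stands in for the model call so the example runs without downloading anything.

```python
import re
from typing import Callable, List


def naive_split(text: str) -> List[str]:
    """Split after '.', '!' or '?' followed by whitespace.

    Hypothetical stand-in for spaCy's doc.sents; it will mis-split
    abbreviations such as "e.g. this", which spaCy handles better.
    """
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def translate_by_sentence(
    text: str,
    translate: Callable[[str], str],
    split: Callable[[str], List[str]] = naive_split,
) -> str:
    """Translate each sentence separately and join the results with spaces."""
    return " ".join(translate(sentence) for sentence in split(text))


# Stub translator for demonstration only (assumption: in real use you would
# pass the README's generate_response function instead of str.upper).
demo = translate_by_sentence(
    "How are you doing? Today is a beautiful day. I hope you are doing fine.",
    translate=str.upper,
)
print(demo)
```

With the model loaded as in the README, the call would presumably look like `translate_by_sentence(text, translate=generate_response, split=lambda t: [s.text for s in nlp(t).sents])`. Passing the splitter and translator as arguments keeps the joining logic testable without the model.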