maayanorner/hebrew-summarization-llm

maayanorner committed 3 days ago

Commit 75c12c2 • 1 parent: 073c23a

Update README.md

Files changed (1)

README.md +6 -1

@@ -5,6 +5,7 @@ Based on DictaLM2.0; fine-tuned for text summarization.
 Known Issues:
 - The model is bloated (disk size).
 - While the results look pretty good, the model was not evaluated.
+- Short inputs (i.e., "articles" of one line) will yield a contextless "summary".
 # Data:
@@ -56,7 +57,11 @@ model = AutoModelForCausalLM.from_pretrained(
 model.to('cuda')
 tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
-text = 'טקסט לסיכום'
+text = '''
+לפעמים, מתחשק לחזור אחורה בזמן. אתם יודעים, לימים הטובים ההם. לימים של דייב ושל סוגר שטחים, של קין ושל פקמן, של אלאדין ושל מלך האריות, של הוגו ושל וורמס, של טטריס ושל כל שאר הקלאסיקות שאהבנו כל כך...
+כאן, ב"מסע אל העבר", אספנו את כל אותם משחקים ישנים, ואנו מציעים לכם אותם, יחד עם תאורים, תמונות, קטגוריות, צ'יטים, פתרונות ועוד - כדי שגם אתם תוכלו לחזור חזרה בזמן - ולהנות מהנוסטלגיה.
+'''.strip()
 summarize(text, max_new_tokens=512, tokenizer=tokenizer, model=model)
 ```
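
The new known issue (one-line inputs yield a contextless "summary") suggests checking input length before calling `summarize`. A minimal sketch of such a check follows. The helper name `is_probably_too_short` and both thresholds are hypothetical. They are not part of this repository and would need tuning on real articles.

```python
def is_probably_too_short(text: str, min_lines: int = 2, min_words: int = 20) -> bool:
    """Heuristic guard (assumed thresholds, not from the model card).

    Returns True when the input has fewer than `min_lines` non-empty lines
    or fewer than `min_words` whitespace-separated words.
    """
    non_empty_lines = [line for line in text.strip().splitlines() if line.strip()]
    words = text.split()
    return len(non_empty_lines) < min_lines or len(words) < min_words


# The old one-line placeholder from the README would be flagged.
if is_probably_too_short('טקסט לסיכום'):
    print('Input looks too short; the summary may lack context.')
```

A caller could run this check first and skip or warn instead of passing a one-line "article" to the model.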