PetraAI committed on
Commit
9f69fb2
1 Parent(s): f5677ee

Update README.md

README.md CHANGED
 
---
license: apache-2.0
datasets:
- PetraAI/PetraAI
- emotion
- bigcode/the-stack-v2
- microsoft/orca-math-word-problems-200k
- HuggingFaceTB/cosmopedia
- fka/awesome-chatgpt-prompts
language:
- ar
- en
task_categories:
- text-classification
- token-classification
- table-question-answering
- question-answering
- zero-shot-classification
- translation
- summarization
- conversational
- text-generation
- text2text-generation
- fill-mask
- sentence-similarity
metrics:
- accuracy
- f1
- bertscore
- bleu
- bleurt
- brier_score
- code_eval
- character
tags:
- chemistry
- biology
- finance
- legal
- music
- code
- art
- climate
- medical
- text-classification
- emotion
- endpoints-template
pretty_name: Zalmati
library_name: transformers
---

# Zalmati

## Overview

Zalmati is a multilingual language model trained on the large and diverse PetraAI dataset. It handles a wide range of natural language processing tasks, including text classification, emotion analysis, question answering, translation, summarization, and text generation, across domains such as chemistry, biology, finance, law, and medicine. With support for both Arabic and English, Zalmati brings modern NLP capabilities to users in the Arabic-speaking world.

## Model Architecture

Zalmati is based on the transformer architecture and was pretrained with a masked language modeling objective on the 1M-10M sample range of the PetraAI dataset. It builds on recent advances in large language models and transfer learning and is intended to deliver strong performance on standard NLP benchmarks.
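The masked language modeling objective mentioned above corrupts a fraction of the input tokens and trains the model to recover the originals. A minimal sketch in plain Python (an illustration of the idea only, not the actual training pipeline):

```python
import random

MASK_TOKEN = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15, seed=1):
    """Randomly replace a fraction of tokens with MASK_TOKEN.

    Returns the corrupted sequence plus (position, original token) pairs,
    which serve as the prediction targets during pretraining.
    """
    rng = random.Random(seed)
    corrupted, targets = [], []
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            corrupted.append(MASK_TOKEN)
            targets.append((i, tok))
        else:
            corrupted.append(tok)
    return corrupted, targets

corrupted, targets = mask_tokens("the quick brown fox jumps over the lazy dog".split())
```

During training the model only sees `corrupted` and is scored on how well it predicts the original token at each masked position.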

## Intended Use

Zalmati can be used for a multitude of language understanding and generation tasks across different domains. Some example use cases:

- Text classification for topics, emotions, etc.
- Text summarization for legal/financial documents
- Question answering for knowledge bases
- Code generation and translation
- Sentiment analysis for Arabic social media
- Creative writing and story generation

The model outputs should be reviewed and filtered as needed based on the specific use case.
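One simple form such filtering can take is a keyword blocklist applied to generated text before it reaches users. The sketch below is hypothetical (the blocklist contents and function name are illustrative, not part of Zalmati); a real deployment would typically use a dedicated moderation model or service instead:

```python
from typing import Optional

# Hypothetical blocklist for illustration only.
BLOCKLIST = frozenset({"badword"})

def filter_output(text: str, blocklist=BLOCKLIST) -> Optional[str]:
    """Return the text unchanged if it passes the blocklist check, else None."""
    lowered = text.lower()
    if any(term in lowered for term in blocklist):
        return None  # caller should discard or regenerate this output
    return text
```

Returning `None` lets the caller decide whether to discard the output, regenerate, or escalate to human review.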

## Limitations and Risks

Like all language models, Zalmati may reflect biases present in its training data. It should not be used for any high-stakes decision making without careful testing and monitoring. The model may also make factual mistakes or generate offensive/unsafe content that requires filtering.

For development/research purposes only. Not for clinical use. Please review the license for terms.

## How to Use

You can use Zalmati through the Hugging Face Transformers library:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load the tokenizer and model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("PetraAI/Zalmati")
model = AutoModelForSeq2SeqLM.from_pretrained("PetraAI/Zalmati")

input_text = "Translate the following Arabic text to English: السلام عليكم"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids

# Generate and decode the translation
outputs = model.generate(input_ids)
translated = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(translated)  # e.g. "Peace be upon you"
```
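The card's metadata lists accuracy and F1 among its metrics. For a single-label classification task (e.g. emotion classification), both can be computed from predictions and gold labels with a few lines of plain Python, independent of the model itself:

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Compute accuracy and binary F1 from parallel label sequences."""
    assert len(y_true) == len(y_pred), "label sequences must align"
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return correct / len(y_true), f1
```

In practice you would likely use a library implementation (e.g. scikit-learn or the `evaluate` package); this sketch just makes the definitions concrete.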
 
## Citation

```bibtex
@article{PetraAI2022ZalmatiModel,
  title={Zalmati: A Powerful Multilingual Language Model for Arabic and English},
  author={First Last and First Last},
  journal={arXiv},
  year={2022}
}
```