language:
- en
- fr
---

### bloomz-3b-dpo-chat Model Card

**Model Overview**

The bloomz-3b-dpo-chat model is a conversational model fine-tuned with Direct Preference Optimization (DPO) from the base bloomz-3b-sft-chat model. It aims to provide high-quality conversational abilities in both English and French, leveraging the pre-trained strengths of its SFT (Supervised Fine-Tuning) predecessor.

**Parent Model: [bloomz-3b-sft-chat](https://huggingface.co/cmarkea/bloomz-3b-sft-chat)**

---

**Model Description**

The bloomz-3b-dpo-chat model builds upon the solid foundation of bloomz-3b-sft-chat, which is notable for its chatbot-specific pre-training and efficient tokenization strategy. DPO fine-tuning enhances the model's ability to generate responses that humans prefer in conversational contexts.
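
At a high level, DPO optimizes the policy directly on preference pairs (a chosen and a rejected response), penalized against a frozen reference model. A minimal sketch of the per-pair loss, with illustrative argument names and a `beta` value that is not taken from this model's actual training configuration:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    # How much more the policy (relative to the reference model)
    # prefers the chosen response over the rejected one.
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    # -log sigmoid(beta * margin): small when the policy ranks the
    # chosen response well above the rejected one.
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

dpo_loss(0.0, 0.0, 0.0, 0.0)  # math.log(2) ~ 0.693: no preference learned yet
```

Minimizing this loss pushes the policy's log-probability ratio toward the human-preferred response while the reference term keeps it from drifting too far from the SFT model.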

**Multilingual Capabilities**

The model was initially trained on both French and English data, ensuring high efficiency and performance in these languages. Because of the DPO process and a change of data type (from float16 to bfloat16), its multilingual capabilities may not be as robust as those of its SFT predecessor, but further fine-tuning can help restore performance in other languages.

**Model Applications**

This model is suitable for chatbot applications, customer service automation, and other conversational AI systems where bilingual (French and English) support is essential.

**Dataset**

The training dataset for the bloomz-3b-dpo-chat model consists of interactions between individuals and third parties, balanced equally between French and English. A total of 0.9 billion tokens were used, with translations produced via the Google Translate API to maintain balance and quality.

**Evaluation**

The model was evaluated with the PoLL (Pool of LLMs) technique: performance on 100 French questions, with scores aggregated from six evaluations (two per evaluator). The evaluators were GPT-4o, Gemini-1.5-pro, and Claude-3.5-Sonnet.
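
The aggregation step can be sketched as follows; only the two-scores-per-judge scheme comes from the setup above, the score values are hypothetical:

```python
from statistics import mean

def poll_score(scores_by_judge):
    """Aggregate PoLL judgments: every judge scores each answer twice,
    and the final score is the mean over all individual judgments."""
    return mean(s for judge in scores_by_judge for s in judge)

# Three judges, two scores each (hypothetical values).
poll_score([[4.0, 4.5], [3.5, 4.0], [4.0, 4.0]])  # -> 4.0
```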

**Performance scores (on a scale of 5):**

| Model                                 | Score |
|--------------------------------------:|:------|
| gpt-4o                                | 4.13  |
| mistralai/Mixtral-8x7B-Instruct-v0.1  | 3.71  |
| gpt-3.5-turbo                         | 3.66  |
| cmarkea/bloomz-7b1-mt-sft-chat        | 1.69  |
| cmarkea/bloomz-3b-dpo-chat            | 1.68  |
| cmarkea/bloomz-3b-sft-chat            | 1.51  |
| croissantllm/CroissantLLMChat-v0.1    | 1.19  |
| cmarkea/bloomz-560m-sft-chat          | 1.04  |
| OpenLLM-France/Claire-Mistral-7B-0.1  | 0.38  |
63 |
+
The bloomz-3b-dpo-chat model demonstrates improved performance over its SFT counterpart, particularly in zero-shot contexts, making it a competitive choice for
|
64 |
+
production environments.
|
65 |
+
|
66 |
+
|
67 |
+
**Usage**

To use the bloomz-3b-dpo-chat model, format chatbot prompts as follows:

```
</s>[human prompt 1]<s>[bot answer 1]</s>[human prompt 2]<s>
```
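
A small helper can assemble this format from a turn history; the function name is illustrative, only the `</s>`/`<s>` convention comes from the card:

```python
def build_prompt(turns):
    """Build a bloomz-*-chat prompt from alternating [human, bot, human, ...]
    turns; the last turn must be a human prompt.

    Human turns are prefixed with </s>, bot answers with <s>; the trailing
    <s> cues the model to generate the next bot answer.
    """
    prompt = ""
    for i, turn in enumerate(turns):
        prompt += ("</s>" if i % 2 == 0 else "<s>") + turn
    return prompt + "<s>"

build_prompt(["Hello!", "Hi, how can I help?", "What is DPO?"])
# -> '</s>Hello!<s>Hi, how can I help?</s>What is DPO?<s>'
```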

Example of loading the model with Hugging Face's `pipeline`:

```python
from transformers import pipeline

model = pipeline("text-generation", "cmarkea/bloomz-3b-dpo-chat")
result = model("</s>C'est quoi le deep learning ?<s>", max_new_tokens=512)

result
[{'generated_text': "</s>C'est quoi le deep learning ?<s>L'apprentissage
en profondeur est un sous-ensemble de l'apprentissage automatique qui
utilise des réseaux de neurones artificiels pour apprendre à partir de
données. Ces réseaux sont conçus pour reconnaître des modèles dans les
données et peuvent être utilisés pour des tâches telles que la reconnaissance
d'images, le traitement du langage naturel et la reconnaissance vocale."}]
```
89 |
+
|
90 |
+
|
91 |
+
**Citation**
|
92 |
+
|
93 |
+
```bibtex
|
94 |
+
@online{DeBloomzChat,
|
95 |
+
AUTHOR = {Cyrile Delestre},
|
96 |
+
URL = {https://huggingface.co/cmarkea/bloomz-3b-dpo-chat},
|
97 |
+
YEAR = {2024},
|
98 |
+
KEYWORDS = {NLP ; Transformers ; LLM ; Bloomz},
|
99 |
+
}
|
100 |
+
```
|