language:
- en
- fr
---

### bloomz-3b-dpo-chat Model Card

**Model Overview**

The bloomz-3b-dpo-chat model is a conversational model fine-tuned with Direct Preference Optimization (DPO) from the base bloomz-3b-sft-chat model. It aims to provide high-quality conversational abilities in both English and French, leveraging the pre-trained strengths of its SFT (Supervised Fine-Tuning) predecessor.

**Parent Model: [bloomz-3b-sft-chat](https://huggingface.co/cmarkea/bloomz-3b-sft-chat)**

---

**Model Description**

The bloomz-3b-dpo-chat model builds upon the solid foundation of bloomz-3b-sft-chat, which is notable for its chatbot-specific pre-training and efficient tokenization strategy. DPO fine-tuning enhances the model's ability to generate responses that humans prefer in conversational contexts.
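
At a high level, DPO optimizes the policy directly on preference pairs (a chosen and a rejected response), penalized against a frozen reference model. A minimal sketch of the per-pair loss, with illustrative argument names and a `beta` value that is not taken from this model's actual training configuration:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    # How much more the policy (relative to the reference model)
    # prefers the chosen response over the rejected one.
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    # -log sigmoid(beta * margin): small when the policy ranks the
    # chosen response well above the rejected one.
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

dpo_loss(0.0, 0.0, 0.0, 0.0)  # math.log(2) ~ 0.693: no preference learned yet
```

Minimizing this loss pushes the policy's log-probability ratio toward the human-preferred response while the reference term keeps it from drifting too far from the SFT model.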

**Multilingual Capabilities**

The model was initially trained on both French and English data, ensuring high efficiency and performance in these languages. Because of the DPO process and a change of data type (from float16 to bfloat16), its multilingual capabilities may not be as robust as those of its SFT predecessor, but further fine-tuning can help restore performance in other languages.

**Model Applications**

This model is suitable for chatbot applications, customer service automation, and other conversational AI systems where bilingual (French and English) support is essential.

**Dataset**

The training dataset for the bloomz-3b-dpo-chat model consists of interactions between individuals and third parties, balanced equally between French and English. A total of 0.9 billion tokens were used, with translations produced via the Google Translate API to maintain balance and quality.

**Evaluation**

The model was evaluated with the PoLL (Pool of LLMs) technique: performance on 100 French questions, with scores aggregated from six evaluations (two per evaluator). The evaluators were GPT-4o, Gemini-1.5-pro, and Claude-3.5-Sonnet.
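
The aggregation step can be sketched as follows; only the two-scores-per-judge scheme comes from the setup above, the score values are hypothetical:

```python
from statistics import mean

def poll_score(scores_by_judge):
    """Aggregate PoLL judgments: every judge scores each answer twice,
    and the final score is the mean over all individual judgments."""
    return mean(s for judge in scores_by_judge for s in judge)

# Three judges, two scores each (hypothetical values).
poll_score([[4.0, 4.5], [3.5, 4.0], [4.0, 4.0]])  # -> 4.0
```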

**Performance scores (on a scale of 5):**

| Model                                 | Score |
|--------------------------------------:|:------|
| gpt-4o                                | 4.13  |
| mistralai/Mixtral-8x7B-Instruct-v0.1  | 3.71  |
| gpt-3.5-turbo                         | 3.66  |
| cmarkea/bloomz-7b1-mt-sft-chat        | 1.69  |
| cmarkea/bloomz-3b-dpo-chat            | 1.68  |
| cmarkea/bloomz-3b-sft-chat            | 1.51  |
| croissantllm/CroissantLLMChat-v0.1    | 1.19  |
| cmarkea/bloomz-560m-sft-chat          | 1.04  |
| OpenLLM-France/Claire-Mistral-7B-0.1  | 0.38  |
63 |
+
The bloomz-3b-dpo-chat model demonstrates improved performance over its SFT counterpart, particularly in zero-shot contexts, making it a competitive choice for
|
64 |
+
production environments.
|
65 |
+
|
66 |
+
|
67 |
+
**Usage**

To use the bloomz-3b-dpo-chat model, format chatbot prompts as follows:

```
</s>[human prompt 1]<s>[bot answer 1]</s>[human prompt 2]<s>
```
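
A small helper can assemble this format from a turn history; the function name is illustrative, only the `</s>`/`<s>` convention comes from the card:

```python
def build_prompt(turns):
    """Build a bloomz-*-chat prompt from alternating [human, bot, human, ...]
    turns; the last turn must be a human prompt.

    Human turns are prefixed with </s>, bot answers with <s>; the trailing
    <s> cues the model to generate the next bot answer.
    """
    prompt = ""
    for i, turn in enumerate(turns):
        prompt += ("</s>" if i % 2 == 0 else "<s>") + turn
    return prompt + "<s>"

build_prompt(["Hello!", "Hi, how can I help?", "What is DPO?"])
# -> '</s>Hello!<s>Hi, how can I help?</s>What is DPO?<s>'
```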

Example of loading the model with Hugging Face's `pipeline`:

```python
from transformers import pipeline

model = pipeline("text-generation", "cmarkea/bloomz-3b-dpo-chat")
result = model("</s>C'est quoi le deep learning ?<s>", max_new_tokens=512)

result
[{'generated_text': "</s>C'est quoi le deep learning ?<s>L'apprentissage
en profondeur est un sous-ensemble de l'apprentissage automatique qui
utilise des réseaux de neurones artificiels pour apprendre à partir de
données. Ces réseaux sont conçus pour reconnaître des modèles dans les
données et peuvent être utilisés pour des tâches telles que la reconnaissance
d'images, le traitement du langage naturel et la reconnaissance vocale."}]
```
89 |
+
|
90 |
+
|
91 |
+
**Citation**
|
92 |
+
|
93 |
+
```bibtex
|
94 |
+
@online{DeBloomzChat,
|
95 |
+
AUTHOR = {Cyrile Delestre},
|
96 |
+
URL = {https://huggingface.co/cmarkea/bloomz-3b-dpo-chat},
|
97 |
+
YEAR = {2024},
|
98 |
+
KEYWORDS = {NLP ; Transformers ; LLM ; Bloomz},
|
99 |
+
}
|
100 |
+
```
|