---
license: bigscience-bloom-rail-1.0
datasets:
- Anthropic/hh-rlhf
language:
- en
- fr
---

### bloomz-3b-dpo-chat Model Card

**Model Overview**

The bloomz-3b-dpo-chat is a conversational model fine-tuned using Direct Preference Optimization (DPO) from the base bloomz-3b-sft-chat model. This model aims to provide high-quality conversational abilities in both English and French, leveraging the pre-trained strengths of its SFT (Supervised Fine-Tuning) predecessor.

**Parent Model: [bloomz-3b-sft-chat](https://huggingface.co/cmarkea/bloomz-3b-sft-chat)**

---

**Model Description**

The bloomz-3b-dpo-chat model builds upon the solid foundation of bloomz-3b-sft-chat, which is notable for its chatbot-specific pre-training and efficient tokenization strategy. The DPO fine-tuning process enhances the model's ability to generate responses that humans prefer in conversational contexts.
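
This card does not publish the exact DPO training recipe. Purely as orientation, the sketch below shows how such a preference-optimization step could be set up with the TRL library on the Anthropic/hh-rlhf dataset listed above; the trl version, hyperparameters, and dataset preprocessing are illustrative assumptions, not the authors' code.

```python
# Illustrative sketch only -- not the authors' training code.
# Assumes a recent trl release (DPOConfig / DPOTrainer, tokenizer passed as
# processing_class) plus the datasets and transformers packages.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "cmarkea/bloomz-3b-sft-chat"
model = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Anthropic/hh-rlhf stores whole "chosen"/"rejected" dialogues; converting
# them into prompt/chosen/rejected pairs in the model's </s>/<s> format is
# omitted here for brevity.
preference_data = load_dataset("Anthropic/hh-rlhf", split="train")

args = DPOConfig(
    output_dir="bloomz-3b-dpo-chat",
    beta=0.1,                        # hypothetical preference-loss strength
    per_device_train_batch_size=2,   # hypothetical batch size
    bf16=True,                       # the card mentions bfloat16 weights
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=preference_data,
    processing_class=tokenizer,
)
trainer.train()
```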

**Multilingual Capabilities**

The model was initially trained on both French and English datasets, ensuring high efficiency and performance in these languages. Due to the DPO process and a potential data-type change (from float16 to bfloat16), the model's multilingual capabilities might not be as robust as those of its SFT predecessor, but further fine-tuning can help restore performance in other languages.

**Model Applications**

This model is suitable for chatbot applications, customer service automation, and other conversational AI systems where bilingual (French and English) support is essential.

**Dataset**

The training dataset for the bloomz-3b-dpo-chat model consists of interactions between individuals and third parties, balanced equally between French and English. A total of 0.9 billion tokens were used, with translations produced by the Google Translate API to maintain balance and quality.
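
The bilingual, translated preference corpus itself is not distributed with this card. As a rough illustration of the data structure, the public English source dataset can be inspected as follows:

```python
# Inspect the public English source dataset; the translated French half used
# for training is not published with this card.
from datasets import load_dataset

hh = load_dataset("Anthropic/hh-rlhf", split="train")
print(hh)                       # columns: "chosen" and "rejected" dialogues
print(hh[0]["chosen"][:200])    # preferred continuation of the first dialogue
print(hh[0]["rejected"][:200])  # dispreferred continuation of the same prompt
```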

**Evaluation**

Evaluation of the model was conducted using the PoLL (Pool of LLM) technique, assessing performance on 100 French questions with scores aggregated from six evaluations (two per evaluator). The evaluators were GPT-4o, Gemini-1.5-pro, and Claude3.5-sonnet.
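
As a reading aid for the table below, each reported number reduces the individual judge marks to a single value; the sketch below assumes a plain average, which is an illustrative assumption rather than a detail stated in this card.

```python
# Hypothetical illustration of the PoLL aggregation: each of the three judges
# scores every answer twice on a 0-5 scale, and a model's reported score is
# taken here as the mean over all judge passes and questions.
from statistics import mean

def poll_score(marks: list[float]) -> float:
    """marks: the 3 judges x 2 passes x 100 questions = 600 individual marks."""
    return mean(marks)
```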

**Performance Scores (on a scale of 5):**

| Model                                 | Score |
|--------------------------------------:|:------|
| gpt-4o                                | 4.13  |
| mistralai/Mixtral-8x7B-Instruct-v0.1  | 3.71  |
| gpt-3.5-turbo                         | 3.66  |
| cmarkea/bloomz-7b1-mt-sft-chat        | 1.69  |
| cmarkea/bloomz-3b-dpo-chat            | 1.68  |
| cmarkea/bloomz-3b-sft-chat            | 1.51  |
| croissantllm/CroissantLLMChat-v0.1    | 1.19  |
| cmarkea/bloomz-560m-sft-chat          | 1.04  |
| OpenLLM-France/Claire-Mistral-7B-0.1  | 0.38  |

The bloomz-3b-dpo-chat model demonstrates improved performance over its SFT counterpart, particularly in zero-shot contexts, making it a competitive choice for production environments.

**Usage**

To utilize the bloomz-3b-dpo-chat model, format the prompt for chatbot interactions as follows:

```
</s>[human prompt 1]<s>[bot answer 1]</s>[human prompt 2]<s>
```
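
For convenience, a small helper (illustrative, not part of the model's tooling) can assemble this prompt string from a list of previous turns:

```python
# Illustrative helper (not shipped with the model): builds the documented
# "</s>[human]<s>[bot]</s>[human]<s>" prompt layout from a turn history.
def build_prompt(history, new_user_message):
    """history: list of (user_message, bot_answer) pairs already exchanged."""
    prompt = ""
    for user_msg, bot_answer in history:
        prompt += f"</s>{user_msg}<s>{bot_answer}"
    prompt += f"</s>{new_user_message}<s>"
    return prompt

# Single-turn example, equivalent to the pipeline call shown below:
print(build_prompt([], "C'est quoi le deep learning ?"))
# </s>C'est quoi le deep learning ?<s>
```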

Example code to load the model using HuggingFace's `pipeline`:
```python
from transformers import pipeline

# Load the text-generation pipeline with the DPO-tuned checkpoint
model = pipeline("text-generation", "cmarkea/bloomz-3b-dpo-chat")

# The prompt follows the "</s>[human prompt]<s>" format described above
result = model("</s>C'est quoi le deep learning ?<s>", max_new_tokens=512)

result
[{'generated_text': "</s>C'est quoi le deep learning ?<s>L'apprentissage
en profondeur est un sous-ensemble de l'apprentissage automatique qui
utilise des réseaux de neurones artificiels pour apprendre à partir de
données. Ces réseaux sont conçus pour reconnaître des modèles dans les
données et peuvent être utilisés pour des tâches telles que la reconnaissance
d'images, le traitement du langage naturel et la reconnaissance vocale."}]
```
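
On memory-constrained hardware, the pipeline can also load the weights in bfloat16 (the dtype mentioned above). A small variant, assuming a GPU, a transformers version that accepts `torch_dtype`, and the accelerate package for `device_map`:

```python
import torch
from transformers import pipeline

# Load the weights in bfloat16 and let accelerate place them automatically
model = pipeline(
    "text-generation",
    "cmarkea/bloomz-3b-dpo-chat",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
result = model("</s>C'est quoi le deep learning ?<s>", max_new_tokens=512)
```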

**Citation**

```bibtex
@online{DeBloomzChat,
  AUTHOR = {Cyrile Delestre},
  URL = {https://huggingface.co/cmarkea/bloomz-3b-dpo-chat},
  YEAR = {2024},
  KEYWORDS = {NLP ; Transformers ; LLM ; Bloomz},
}
```