|
--- |
|
language: |
|
- en |
|
tags: |
|
- falcon3 |
|
--- |
|
|
|
|
|
# Table of Contents |
|
|
|
0. [TL;DR](#tldr)
|
1. [Model Details](#model-details) |
|
2. [Usage](#usage) |
|
3. [Training Details](#training-details) |
|
4. [Evaluation](#evaluation)

5. [Citation](#citation)
|
|
|
|
|
# TL;DR |
|
The Falcon3 family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B parameters.
|
|
|
This repository contains Falcon3-7B-Instruct, the best instruct LLM under 8B parameters at the time of release.
|
|
|
# Model Details |
|
|
|
## Model Description |
|
|
|
- **Developed by:** [https://www.tii.ae](https://www.tii.ae) |
|
- **Model type:** Causal decoder-only |
|
- **Architecture:** Transformer-based
|
- **Language(s) (NLP):** Mainly English |
|
- **License:** TII Falcon-LLM License 2.0 |
|
|
|
<br> |
|
|
|
# Usage |
|
|
|
Find below an example of how to use the model with `transformers` (make sure to have the latest version of `transformers`, or one built from source):
|
|
|
<details> |
|
<summary> Click to expand </summary> |
|
|
|
```python |
|
from transformers import AutoModelForCausalLM, AutoTokenizer
|
|
|
model_name = "tiiuae/Falcon3-7B-Instruct" |
|
|
|
model = AutoModelForCausalLM.from_pretrained( |
|
model_name, |
|
torch_dtype="auto", |
|
device_map="auto" |
|
) |
|
tokenizer = AutoTokenizer.from_pretrained(model_name) |
|
|
|
prompt = "How many hours in one day?" |
|
messages = [ |
|
{"role": "system", "content": "You are a helpful friendly assistant Falcon3 from TII, try to follow instructions as much as possible."}, |
|
{"role": "user", "content": prompt} |
|
] |
|
text = tokenizer.apply_chat_template( |
|
messages, |
|
tokenize=False, |
|
add_generation_prompt=True |
|
) |
|
model_inputs = tokenizer([text], return_tensors="pt").to(model.device) |
|
|
|
generated_ids = model.generate( |
|
**model_inputs, |
|
max_new_tokens=1024 |
|
) |
|
generated_ids = [ |
|
output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids) |
|
] |
|
|
|
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0] |
|
print(response) |
|
``` |
|
|
|
</details> |
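
Alternatively, the high-level `pipeline` API can be used for quick experimentation. The snippet below is a minimal sketch assuming a recent `transformers` version that supports passing chat messages directly to a text-generation pipeline; the generation settings are illustrative only.

<details>

<summary> Click to expand </summary>

```python
from transformers import pipeline

# Minimal sketch: with recent transformers versions, the text-generation
# pipeline applies the chat template internally when given a list of messages.
pipe = pipeline(
    "text-generation",
    model="tiiuae/Falcon3-7B-Instruct",
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful friendly assistant Falcon3 from TII, try to follow instructions as much as possible."},
    {"role": "user", "content": "How many hours in one day?"},
]

outputs = pipe(messages, max_new_tokens=256)
# The pipeline returns the full conversation; the last message is the model's reply.
print(outputs[0]["generated_text"][-1]["content"])
```

</details>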
|
|
|
|
|
# Training Details |
|
Based on `tiiuae/Falcon3-7B-Base`, the post-training stage comprises supervised finetuning followed by human preference alignment (DPO).
|
|
|
## Supervised finetuning |
|
### Training Data |
|
1.2 million diverse, high-quality samples from Tulu-3, Open-Hermes, Numina, and Apigen.
|
|
|
| Data type                            | Ratio |
|--------------------------------------|-------|
| Conversations                        | 32%   |
| STEM                                 | 32%   |
| Code                                 | 12%   |
| Safety                               | 9.1%  |
| Multilingual                         | 8.3%  |
| Function call                        | 3.3%  |
| NLP (summarization, generation, QA)  | 3.2%  |
|
|
|
#### Training Hyperparameters |
|
|
|
<style type="text/css"> |
|
.tg {border-collapse:collapse;border-spacing:0;} |
|
.tg td{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px; |
|
overflow:hidden;padding:10px 5px;word-break:normal;} |
|
.tg th{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px; |
|
font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;} |
|
.tg .tg-c3ow{border-color:inherit;text-align:center;vertical-align:top} |
|
.tg .tg-7btt{border-color:inherit;font-weight:bold;text-align:center;vertical-align:top} |
|
.tg .tg-0pky{border-color:inherit;text-align:left;vertical-align:top} |
|
.tg .tg-ihkz{border-color:inherit;text-align:center;vertical-align:top} |
|
.tg .tg-pcvp{border-color:inherit;text-align:left;vertical-align:top} |
|
.tg .tg-j2vi{border-color:inherit;font-weight:bold;text-align:center;vertical-align:top} |
|
.tg .tg-amwm{border-color:inherit;text-align:left;vertical-align:top} |
|
.tg .tg-0lax{border-color:inherit;font-weight:bold;text-align:center;vertical-align:top} |
|
</style> |
|
<table class="tg"><thead> |
|
<tr> |
|
<th class="tg-7btt" rowspan="3">AdamW</th> |
|
<th class="tg-c3ow">β1</th> |
|
<th class="tg-0pky">0.9</th> |
|
</tr> |
|
<tr> |
|
<th class="tg-ihkz">β2</th> |
|
<th class="tg-pcvp">0.999</th> |
|
</tr> |
|
<tr> |
|
<th class="tg-c3ow">weight decay</th> |
|
<th class="tg-0pky">0.01</th> |
|
</tr></thead> |
|
<tbody> |
|
<tr> |
|
<td class="tg-j2vi" rowspan="4">Learning rate</td> |
|
<td class="tg-ihkz">type</td> |
|
<td class="tg-pcvp">linear decay</td> |
|
</tr> |
|
<tr> |
|
<td class="tg-c3ow">init lr</td> |
|
<td class="tg-0pky">5e-6</td> |
|
</tr> |
|
<tr> |
|
<td class="tg-ihkz">final lr</td> |
|
<td class="tg-pcvp">0</td> |
|
</tr> |
|
<tr> |
|
<td class="tg-c3ow">warm rate</td> |
|
<td class="tg-0pky">0.03</td> |
|
</tr> |
|
<tr> |
|
<td class="tg-j2vi">Batch size</td> |
|
<td class="tg-ihkz"></td> |
|
<td class="tg-pcvp">64</td> |
|
</tr> |
|
<tr> |
|
<td class="tg-amwm">Epochs</td> |
|
<td class="tg-0lax"></td> |
|
<td class="tg-0lax">2</td> |
|
</tr> |
|
</tbody> |
|
</table> |
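
For reference, the hyperparameters above map naturally onto standard `transformers` training arguments. The snippet below is a minimal illustrative sketch, not the actual training code: the output path, per-device batch size split, and precision setting are assumptions.

```python
from transformers import TrainingArguments

# Hypothetical mapping of the SFT hyperparameters listed above onto
# transformers.TrainingArguments; the output path and the per-device/global
# batch split are illustrative assumptions, not the exact internal setup.
sft_args = TrainingArguments(
    output_dir="falcon3-7b-sft",       # hypothetical output path
    num_train_epochs=2,
    per_device_train_batch_size=8,     # assumes 8 devices for a global batch size of 64
    learning_rate=5e-6,
    lr_scheduler_type="linear",        # linear decay to a final LR of 0
    warmup_ratio=0.03,
    weight_decay=0.01,
    adam_beta1=0.9,
    adam_beta2=0.999,
    bf16=True,                         # assumption
)
```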
|
|
|
## Human preference alignment - DPO |
|
|
|
### Training Data |
|
TODO
|
|
|
#### Training Hyperparameters |
|
TODO
|
|
|
|
|
# Evaluation |
|
We report in the following table our internal pipeline benchmarks: |
|
|
|
|
|
<table border="1" style="width: 100%; text-align: center; border-collapse: collapse;"> |
|
<colgroup> |
|
<col style="width: 10%;"> |
|
<col style="width: 10%;"> |
|
<col style="width: 7%;"> |
|
<col style="width: 7%;"> |
|
<col style="width: 7%;"> |
|
<col style="background-color: rgba(80, 15, 213, 0.5); width: 7%;"> |
|
</colgroup> |
|
<thead> |
|
<tr> |
|
<th>Category</th> |
|
<th>Benchmark</th> |
|
<th>Llama-3.1-8B-Instruct</th> |
|
<th>Qwen2-7B-Instruct</th> |
|
<th>Qwen2.5-7B-Instruct</th> |
|
<th>Falcon3-7B-Instruct</th> |
|
</tr> |
|
</thead> |
|
<tbody> |
|
<tr> |
|
<td rowspan="3">General</td> |
|
<td>MMLU (5-shot)</td> |
|
<td>-</td> |
|
<td>-</td> |
|
<td>-</td> |
|
<td>-</td> |
|
</tr> |
|
<tr> |
|
<td>MMLU-PRO (5-shot)</td> |
|
<td>-</td> |
|
<td>-</td> |
|
<td>-</td> |
|
<td>-</td> |
|
</tr> |
|
<tr> |
|
<td>IFEval</td> |
|
<td>-</td> |
|
<td>-</td> |
|
<td>-</td> |
|
<td>-</td> |
|
</tr> |
|
<tr> |
|
<td rowspan="2">Math</td> |
|
<td>GSM8K (5-shot)</td> |
|
<td>-</td> |
|
<td>-</td> |
|
<td>-</td> |
|
<td>-</td> |
|
</tr> |
|
<tr> |
|
      <td>MATH (4-shot)</td>
|
<td>-</td> |
|
<td>-</td> |
|
<td>-</td> |
|
<td>-</td> |
|
</tr> |
|
<tr> |
|
<td rowspan="4">Reasoning</td> |
|
<td>Arc Challenge (25-shot)</td> |
|
<td>-</td> |
|
<td>-</td> |
|
<td>-</td> |
|
<td>-</td> |
|
</tr> |
|
<tr> |
|
<td>GPQA (0-shot)</td> |
|
<td>-</td> |
|
<td>-</td> |
|
<td>-</td> |
|
<td>-</td> |
|
</tr> |
|
<tr> |
|
<td>MUSR (0-shot)</td> |
|
<td>-</td> |
|
<td>-</td> |
|
<td>-</td> |
|
<td>-</td> |
|
</tr> |
|
<tr> |
|
<td>BBH (3-shot)</td> |
|
<td>-</td> |
|
<td>-</td> |
|
<td>-</td> |
|
<td>-</td> |
|
</tr> |
|
<tr> |
|
<td rowspan="4">CommonSense Understanding</td> |
|
<td>PIQA (0-shot)</td> |
|
<td>-</td> |
|
<td>-</td> |
|
<td>-</td> |
|
<td>-</td> |
|
</tr> |
|
<tr> |
|
<td>SciQ (0-shot)</td> |
|
<td>-</td> |
|
<td>-</td> |
|
<td>-</td> |
|
<td>-</td> |
|
</tr> |
|
<tr> |
|
<td>Winogrande (0-shot)</td> |
|
<td>-</td> |
|
<td>-</td> |
|
<td>-</td> |
|
<td>-</td> |
|
</tr> |
|
<tr> |
|
<td>OpenbookQA (0-shot)</td> |
|
<td>-</td> |
|
<td>-</td> |
|
<td>-</td> |
|
<td>-</td> |
|
</tr> |
|
</tbody> |
|
</table> |
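
A rough public approximation of these benchmarks (scores will not exactly match our internal pipeline) can be obtained with the `lm-evaluation-harness` package. The snippet below is a sketch; task names, few-shot counts, and dtype are illustrative assumptions.

```python
from lm_eval.evaluator import simple_evaluate

# Sketch of an approximate reproduction with the public lm-evaluation-harness.
# Task names, few-shot counts, and dtype are illustrative assumptions; scores
# may differ from the internal pipeline reported above.
results = simple_evaluate(
    model="hf",
    model_args="pretrained=tiiuae/Falcon3-7B-Instruct,dtype=bfloat16",
    tasks=["mmlu", "gsm8k"],
    num_fewshot=5,
    batch_size=8,
)
print(results["results"])
```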
|
|
|
|
|
# Citation |
|
If the Falcon3 family of models was helpful to your work, feel free to cite us.
|
|
|
``` |
|
@misc{Falcon3, |
|
title = {Falcon 3 family of Open Foundation Models}, |
|
author = {TII Team}, |
|
month = {December}, |
|
year = {2024} |
|
} |
|
``` |
|
|