---
language:
- en
tags:
- falcon3
---
# Table of Contents
0. [TL;DR](#tldr)
1. [Model Details](#model-details)
2. [Usage](#usage)
3. [Training Details](#training-details)
4. [Evaluation](#evaluation)
5. [Citation](#citation)
# TL;DR
The Falcon 3 family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B parameters.
This repository contains Falcon3-7B-Instruct, the best instruct LLM under 8B parameters at the time of release.
# Model Details
## Model Description
- **Developed by:** [https://www.tii.ae](https://www.tii.ae)
- **Model type:** Causal decoder-only
- **Architecture:** Transformer-based
- **Language(s) (NLP):** Mainly English
- **License:** TII Falcon-LLM License 2.0
<br>
# Usage
Find below an example of how to use the model with `transformers` (make sure you have the latest version of `transformers`, or one built from source):
<details>
<summary> Click to expand </summary>
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "tiiuae/Falcon3-7B-Instruct"

# Load the model and tokenizer; device_map="auto" places the weights on the available GPU(s).
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "How many hours in one day?"
messages = [
    {"role": "system", "content": "You are a helpful friendly assistant Falcon3 from TII, try to follow instructions as much as possible."},
    {"role": "user", "content": prompt}
]

# Render the conversation with the model's chat template and append the generation prompt.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=1024
)

# Strip the prompt tokens so only the newly generated answer is decoded.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
</details>
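The same checkpoint can also be driven through the higher-level `pipeline` API. The sketch below is illustrative only: the system prompt and generation settings (`max_new_tokens=256`) are assumptions rather than tuned recommendations, and it assumes a recent `transformers` release that accepts chat-style message lists in the text-generation pipeline.
```python
from transformers import pipeline

# Build a chat-capable text-generation pipeline around the same checkpoint.
generator = pipeline(
    "text-generation",
    model="tiiuae/Falcon3-7B-Instruct",
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful friendly assistant Falcon3 from TII."},
    {"role": "user", "content": "How many hours in one day?"},
]

# Recent transformers versions apply the chat template automatically for message lists.
outputs = generator(messages, max_new_tokens=256)  # max_new_tokens is an illustrative choice
print(outputs[0]["generated_text"][-1]["content"])  # last message is the assistant's reply
```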
# Training Details
Based on `tiiuae/Falcon3-7B-Base`, the post-training stage comprises supervised finetuning followed by human preference alignment (DPO).
## Supervised finetuning
### Training Data
1.2 million diverse, high-quality samples drawn from Tulu-3, Open-Hermes, Numina, and Apigen.
| Data type                            | Ratio |
|--------------------------------------|-------|
| Conversations                        | 32%   |
| STEM                                 | 32%   |
| Code                                 | 12%   |
| Safety                               | 9.1%  |
| Multilingual                         | 8.3%  |
| Function calling                     | 3.3%  |
| NLP (summarization, generation, QA)  | 3.2%  |
#### Training Hyperparameters
| Hyperparameter | | Value |
|----------------|--------------|--------------|
| AdamW          | β1           | 0.9          |
|                | β2           | 0.999        |
|                | weight decay | 0.01         |
| Learning rate  | schedule     | linear decay |
|                | init lr      | 5e-6         |
|                | final lr     | 0            |
|                | warmup ratio | 0.03         |
| Batch size     |              | 64           |
| Epochs         |              | 2            |
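To make the table concrete, here is a minimal sketch of how these hyperparameters could map onto a `trl` `SFTConfig`/`SFTTrainer` setup. This is not the actual training script: the dataset placeholder, the per-device batch size and gradient-accumulation split (8 × 8 for an effective batch of 64 on a single device), and the output directory are all assumptions, and it assumes a recent `trl` release.
```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer

model_name = "tiiuae/Falcon3-7B-Base"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Hypothetical placeholder for the SFT mixture described in the table above.
dataset = load_dataset("json", data_files="sft_mixture.jsonl", split="train")

# Values taken from the hyperparameter table; the batch split is an assumption.
config = SFTConfig(
    output_dir="falcon3-7b-sft",
    num_train_epochs=2,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=8,  # 8 x 8 = 64 effective batch size on one device
    learning_rate=5e-6,
    lr_scheduler_type="linear",
    warmup_ratio=0.03,
    adam_beta1=0.9,
    adam_beta2=0.999,
    weight_decay=0.01,
)

trainer = SFTTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,  # `tokenizer=` in older trl releases
)
trainer.train()
```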
## Human preference alignment - DPO
### Training Data
TODO
#### Training Hyperparameters
TODO
# Evaluation
We report our internal pipeline benchmark results in the following table:
<table border="1" style="width: 100%; text-align: center; border-collapse: collapse;">
<colgroup>
<col style="width: 10%;">
<col style="width: 10%;">
<col style="width: 7%;">
<col style="width: 7%;">
<col style="width: 7%;">
<col style="background-color: rgba(80, 15, 213, 0.5); width: 7%;">
</colgroup>
<thead>
<tr>
<th>Category</th>
<th>Benchmark</th>
<th>Llama-3.1-8B-Instruct</th>
<th>Qwen2-7B-Instruct</th>
<th>Qwen2.5-7B-Instruct</th>
<th>Falcon3-7B-Instruct</th>
</tr>
</thead>
<tbody>
<tr>
<td rowspan="3">General</td>
<td>MMLU (5-shot)</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
</tr>
<tr>
<td>MMLU-PRO (5-shot)</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
</tr>
<tr>
<td>IFEval</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
</tr>
<tr>
<td rowspan="2">Math</td>
<td>GSM8K (5-shot)</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
</tr>
<tr>
<td>MATH(4-shot)</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
</tr>
<tr>
<td rowspan="4">Reasoning</td>
<td>Arc Challenge (25-shot)</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
</tr>
<tr>
<td>GPQA (0-shot)</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
</tr>
<tr>
<td>MUSR (0-shot)</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
</tr>
<tr>
<td>BBH (3-shot)</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
</tr>
<tr>
<td rowspan="4">CommonSense Understanding</td>
<td>PIQA (0-shot)</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
</tr>
<tr>
<td>SciQ (0-shot)</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
</tr>
<tr>
<td>Winogrande (0-shot)</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
</tr>
<tr>
<td>OpenbookQA (0-shot)</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
</tr>
</tbody>
</table>
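The scores above come from our internal evaluation pipeline. As a rough, external way to reproduce a subset of these benchmarks, the sketch below uses EleutherAI's `lm-evaluation-harness`; it is not the internal pipeline behind the table, and the task selection, dtype, and batch size are assumptions.
```python
import lm_eval

# Illustrative evaluation of two benchmarks from the table above
# (MMLU and GSM8K are both reported 5-shot).
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=tiiuae/Falcon3-7B-Instruct,dtype=bfloat16",
    tasks=["mmlu", "gsm8k"],
    num_fewshot=5,
    batch_size=8,
)
print(results["results"])
```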
# Citation
If the Falcon3 family of models was helpful to your work, feel free to cite it.
```bibtex
@misc{Falcon3,
  title  = {Falcon 3 family of Open Foundation Models},
  author = {TII Team},
  month  = {December},
  year   = {2024}
}
```