---
language:
- en
tags:
- falcon3
---


#  Table of Contents

0. [TL;DR](#tldr)
1. [Model Details](#model-details)
2. [Usage](#usage)
3. [Training Details](#training-details)
4. [Evaluation](#evaluation)


# TL;DR
The Falcon 3 family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B parameters.

This repository contains Falcon3-7B-Instruct, the best instruct LLM under 8B parameters at the time of release.

# Model Details

## Model Description

- **Developed by:** [https://www.tii.ae](https://www.tii.ae)
- **Model type:** Causal decoder-only
- **Architecture:** Transformer-based
- **Language(s) (NLP):** Mainly English
- **License:** TII Falcon-LLM License 2.0

<br>

# Usage

The example below shows how to use the model with `transformers`. Make sure you have the latest release of `transformers` installed, or a build from source:

<details>
<summary> Click to expand </summary>

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "tiiuae/Falcon3-7B-Instruct"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "How many hours in one day?"
messages = [
    {"role": "system", "content": "You are a helpful friendly assistant Falcon3 from TII, try to follow instructions as much as possible."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=1024
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```

</details>
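
The list comprehension near the end of the example above drops the prompt tokens from each generated sequence, since `generate` returns the prompt followed by the continuation for decoder-only models. The sketch below isolates that slicing step on plain Python lists so it can be run without downloading the model. The helper name `strip_prompt` and the token ids are made up for illustration.

```python
def strip_prompt(input_ids, output_ids):
    """Return only the newly generated tokens for each sequence in a batch.

    Assumes every output sequence starts with its full prompt, as in the
    decoder-only generation example above.
    """
    return [out[len(inp):] for inp, out in zip(input_ids, output_ids)]


# Made-up token ids, for illustration only.
prompts = [[101, 7, 8], [101, 9]]
outputs = [[101, 7, 8, 42, 43], [101, 9, 55]]

new_tokens = strip_prompt(prompts, outputs)
print(new_tokens)  # [[42, 43], [55]]
```

Without this step, decoding the raw output of `generate` would repeat the system and user messages in front of the model's answer.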


# Training Details
Starting from `tiiuae/Falcon3-7B-Base`, post-training consists of supervised finetuning (SFT) followed by human preference alignment with DPO.

## Supervised finetuning
### Training Data
1.2 million diverse, high-quality samples drawn from Tulu-3, Open-Hermes, Numina, and Apigen.

| Data type                            | Ratio |
|--------------------------------------|-------|
| Conversations                        | 32%   |
| STEM                                 | 32%   |
| Code                                 | 12%   |
| Safety                               | 9.1%  |
| Multilingual                         | 8.3%  |
| Function call                        | 3.3%  |
| NLP (summarization, generation, QA)  | 3.2%  |
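
To give a sense of scale, the ratios above can be converted into approximate sample counts. The sketch below assumes the percentages apply to the full 1.2 million samples. The listed ratios add up to 99.9% (presumably due to rounding), so the counts are rough estimates rather than reported figures.

```python
TOTAL_SAMPLES = 1_200_000  # "1.2 million" from the text above

# Ratios from the table, in percent.
RATIOS_PERCENT = {
    "Conversations": 32.0,
    "STEM": 32.0,
    "Code": 12.0,
    "Safety": 9.1,
    "Multilingual": 8.3,
    "Function call": 3.3,
    "NLP (summarization, generation, QA)": 3.2,
}


def approximate_counts(total, ratios_percent):
    """Convert percentage shares into rounded sample counts."""
    return {name: round(total * pct / 100) for name, pct in ratios_percent.items()}


counts = approximate_counts(TOTAL_SAMPLES, RATIOS_PERCENT)
for name, count in counts.items():
    print(f"{name:<40}{count:>10,}")
```

Under that assumption, conversations and STEM each account for roughly 384,000 samples, and function-call data for roughly 39,600.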

#### Training Hyperparameters

<style type="text/css">
.tg  {border-collapse:collapse;border-spacing:0;}
.tg td{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
  overflow:hidden;padding:10px 5px;word-break:normal;}
.tg th{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
  font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;}
.tg .tg-c3ow{border-color:inherit;text-align:center;vertical-align:top}
.tg .tg-7btt{border-color:inherit;font-weight:bold;text-align:center;vertical-align:top}
.tg .tg-0pky{border-color:inherit;text-align:left;vertical-align:top}
.tg .tg-ihkz{border-color:inherit;text-align:center;vertical-align:top}
.tg .tg-pcvp{border-color:inherit;text-align:left;vertical-align:top}
.tg .tg-j2vi{border-color:inherit;font-weight:bold;text-align:center;vertical-align:top}
.tg .tg-amwm{border-color:inherit;text-align:left;vertical-align:top}
.tg .tg-0lax{border-color:inherit;font-weight:bold;text-align:center;vertical-align:top}
</style>
<table class="tg"><thead>
  <tr>
    <th class="tg-7btt" rowspan="3">AdamW</th>
    <th class="tg-c3ow">β1</th>
    <th class="tg-0pky">0.9</th>
  </tr>
  <tr>
    <th class="tg-ihkz">β2</th>
    <th class="tg-pcvp">0.999</th>
  </tr>
  <tr>
    <th class="tg-c3ow">weight decay</th>
    <th class="tg-0pky">0.01</th>
  </tr></thead>
<tbody>
  <tr>
    <td class="tg-j2vi" rowspan="4">Learning rate</td>
    <td class="tg-ihkz">type</td>
    <td class="tg-pcvp">linear decay</td>
  </tr>
  <tr>
    <td class="tg-c3ow">init lr</td>
    <td class="tg-0pky">5e-6</td>
  </tr>
  <tr>
    <td class="tg-ihkz">final lr</td>
    <td class="tg-pcvp">0</td>
  </tr>
  <tr>
    <td class="tg-c3ow">warmup ratio</td>
    <td class="tg-0pky">0.03</td>
  </tr>
  <tr>
    <td class="tg-j2vi">Batch size</td>
    <td class="tg-ihkz"></td>
    <td class="tg-pcvp">64</td>
  </tr>
  <tr>
    <td class="tg-amwm">Epochs</td>
    <td class="tg-0lax"></td>
    <td class="tg-0lax">2</td>
  </tr>
</tbody>
</table>
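
The learning-rate settings in the table can be read as a schedule over training steps. The sketch below is one plausible interpretation, not the training code: it assumes the 0.03 warmup value is the fraction of total steps spent in a linear warmup, that each optimizer step consumes one batch of 64 samples without sequence packing, and that the 1.2 million SFT samples are seen once per epoch.

```python
import math

NUM_SAMPLES = 1_200_000   # SFT samples, from the Training Data section
BATCH_SIZE = 64           # from the table
EPOCHS = 2                # from the table
INIT_LR = 5e-6            # from the table
FINAL_LR = 0.0            # from the table
WARMUP_FRACTION = 0.03    # assumption: the warmup value is a fraction of total steps

# Assumption: one optimizer step per batch of 64 samples, no sequence packing.
steps_per_epoch = math.ceil(NUM_SAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = round(total_steps * WARMUP_FRACTION)


def learning_rate(step):
    """Linear warmup to INIT_LR, then linear decay to FINAL_LR."""
    if step < warmup_steps:
        return INIT_LR * step / warmup_steps
    remaining = (total_steps - step) / (total_steps - warmup_steps)
    return FINAL_LR + (INIT_LR - FINAL_LR) * max(0.0, remaining)


print(f"total steps: {total_steps}, warmup steps: {warmup_steps}")
for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(f"step {step:>6}: lr = {learning_rate(step):.3e}")
```

Under these assumptions the run has 37,500 optimizer steps, of which the first 1,125 are warmup. The real step count would differ if samples were packed or gradient accumulation changed the effective batch size.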

## Human preference alignment - DPO
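
For readers unfamiliar with DPO (Direct Preference Optimization), the sketch below computes the standard DPO loss for a single preference pair from sequence log-probabilities under the trained policy and a frozen reference model. It illustrates the general objective only: the function name, the example log-probabilities, and `beta=0.1` are placeholders, not values reported for this model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) response pair.

    Inputs are summed token log-probabilities of each response.
    beta=0.1 is a placeholder, not a value reported for this model.
    """
    # Implicit rewards: how much more likely the policy makes each response
    # compared with the reference model.
    chosen_reward = policy_chosen_logp - ref_chosen_logp
    rejected_reward = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_reward - rejected_reward)
    # loss = -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Policy identical to the reference: margin is 0, so the loss is log(2).
baseline = dpo_loss(-10.0, -12.0, -10.0, -12.0)
# Policy favors the chosen response more than the reference does: lower loss.
improved = dpo_loss(-8.0, -14.0, -10.0, -12.0)
print(baseline, improved)
```

The loss falls as the policy raises the likelihood of the chosen response relative to the rejected one, measured against the reference model.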

### Training Data
To be added.

#### Training Hyperparameters
To be added.


# Evaluation
The following table reports results from our internal evaluation pipeline:


<table border="1" style="width: 100%; text-align: center; border-collapse: collapse;">
    <colgroup>
        <col style="width: 10%;">
        <col style="width: 10%;">
        <col style="width: 7%;">
        <col style="width: 7%;">
        <col style="width: 7%;">
        <col style="background-color: rgba(80, 15, 213, 0.5); width: 7%;">
    </colgroup>
    <thead>
        <tr>
            <th>Category</th>
            <th>Benchmark</th>
            <th>Llama-3.1-8B-Instruct</th>
            <th>Qwen2-7B-Instruct</th>
            <th>Qwen2.5-7B-Instruct</th>
            <th>Falcon3-7B-Instruct</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td rowspan="3">General</td>
            <td>MMLU (5-shot)</td>
            <td>-</td>
            <td>-</td>
            <td>-</td>
            <td>-</td>
        </tr>
        <tr>
            <td>MMLU-PRO (5-shot)</td>
            <td>-</td>
            <td>-</td>
            <td>-</td>
            <td>-</td>
        </tr>
        <tr>
            <td>IFEval</td>
            <td>-</td>
            <td>-</td>
            <td>-</td>
            <td>-</td>
        </tr>
        <tr>
            <td rowspan="2">Math</td>
            <td>GSM8K (5-shot)</td>
            <td>-</td>
            <td>-</td>
            <td>-</td>
            <td>-</td>
        </tr>
        <tr>
            <td>MATH (4-shot)</td>
            <td>-</td>
            <td>-</td>
            <td>-</td>
            <td>-</td>
        </tr>
        <tr>
            <td rowspan="4">Reasoning</td>
            <td>ARC Challenge (25-shot)</td>
            <td>-</td>
            <td>-</td>
            <td>-</td>
            <td>-</td>
        </tr>
        <tr>
            <td>GPQA (0-shot)</td>
            <td>-</td>
            <td>-</td>
            <td>-</td>
            <td>-</td>
        </tr>
        <tr>
            <td>MuSR (0-shot)</td>
            <td>-</td>
            <td>-</td>
            <td>-</td>
            <td>-</td>
        </tr>
        <tr>
            <td>BBH (3-shot)</td>
            <td>-</td>
            <td>-</td>
            <td>-</td>
            <td>-</td>
        </tr>
        <tr>
            <td rowspan="4">Commonsense Understanding</td>
            <td>PIQA (0-shot)</td>
            <td>-</td>
            <td>-</td>
            <td>-</td>
            <td>-</td>
        </tr>
        <tr>
            <td>SciQ (0-shot)</td>
            <td>-</td>
            <td>-</td>
            <td>-</td>
            <td>-</td>
        </tr>
        <tr>
            <td>Winogrande (0-shot)</td>
            <td>-</td>
            <td>-</td>
            <td>-</td>
            <td>-</td>
        </tr>
        <tr>
            <td>OpenbookQA (0-shot)</td>
            <td>-</td>
            <td>-</td>
            <td>-</td>
            <td>-</td>
        </tr>
    </tbody>
</table>


# Citation
If the Falcon3 series was helpful to your work, feel free to cite us.

```
@misc{Falcon3,
    title = {Falcon 3 family of Open Foundation Models},
    author = {TII Team},
    month = {December},
    year = {2024}
}
```