|
--- |
|
language: |
|
- en |
|
tags: |
|
- falcon3 |
|
--- |
|
|
|
|
|
# Table of Contents |
|
|
|
0. [TL;DR](#tldr)
|
1. [Model Details](#model-details) |
|
2. [Usage](#usage) |
|
3. [Training Details](#training-details) |
|
4. [Evaluation](#evaluation)

5. [Citation](#citation)
|
|
|
|
|
# TL;DR |
|
The Falcon3 family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B parameters.
|
|
|
This repository contains Falcon3-7B-Instruct, the best instruct LLM under 8B parameters at the time of release.
|
|
|
# Model Details |
|
|
|
## Model Description |
|
|
|
- **Developed by:** [https://www.tii.ae](https://www.tii.ae) |
|
- **Model type:** Causal decoder-only |
|
- **Architecture:** Transformer-based
|
- **Language(s) (NLP):** Mainly English |
|
- **License:** TII Falcon-LLM License 2.0 |
|
|
|
<br> |
|
|
|
# Usage |
|
|
|
Find below an example of how to use the model with `transformers` (make sure to have the latest version of `transformers`, or one built from source):
|
|
|
<details> |
|
<summary> Click to expand </summary> |
|
|
|
```python |
|
from transformers import AutoModelForCausalLM, AutoTokenizer
|
|
|
model_name = "tiiuae/Falcon3-7B-Instruct" |
|
|
|
model = AutoModelForCausalLM.from_pretrained( |
|
model_name, |
|
torch_dtype="auto", |
|
device_map="auto" |
|
) |
|
tokenizer = AutoTokenizer.from_pretrained(model_name) |
|
|
|
prompt = "How many hours in one day?" |
|
messages = [ |
|
{"role": "system", "content": "You are a helpful friendly assistant Falcon3 from TII, try to follow instructions as much as possible."}, |
|
{"role": "user", "content": prompt} |
|
] |
|
text = tokenizer.apply_chat_template( |
|
messages, |
|
tokenize=False, |
|
add_generation_prompt=True |
|
) |
|
model_inputs = tokenizer([text], return_tensors="pt").to(model.device) |
|
|
|
generated_ids = model.generate( |
|
**model_inputs, |
|
max_new_tokens=1024 |
|
) |
|
generated_ids = [ |
|
output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids) |
|
] |
|
|
|
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0] |
|
print(response) |
|
``` |
|
|
|
</details> |
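
Alternatively, the high-level `pipeline` API can be used for quick experimentation. The snippet below is a minimal sketch assuming a recent `transformers` version that supports passing chat messages directly to a text-generation pipeline; the generation settings are illustrative only.

<details>

<summary> Click to expand </summary>

```python
from transformers import pipeline

# Minimal sketch: with recent transformers versions, the text-generation
# pipeline applies the chat template internally when given a list of messages.
pipe = pipeline(
    "text-generation",
    model="tiiuae/Falcon3-7B-Instruct",
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful friendly assistant Falcon3 from TII, try to follow instructions as much as possible."},
    {"role": "user", "content": "How many hours in one day?"},
]

outputs = pipe(messages, max_new_tokens=256)
# The pipeline returns the full conversation; the last message is the model's reply.
print(outputs[0]["generated_text"][-1]["content"])
```

</details>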
|
|
|
|
|
# Training Details |
|
Based on `tiiuae/Falcon3-7B-Base`, the post-training stage comprises supervised finetuning followed by human preference alignment (DPO).
|
|
|
## Supervised finetuning |
|
### Training Data |
|
1.2 million diverse, high-quality samples from Tulu-3, Open-Hermes, Numina, and Apigen.
|
|
|
| Data type                            | Ratio |
|--------------------------------------|-------|
| Conversations                        | 32%   |
| STEM                                 | 32%   |
| Code                                 | 12%   |
| Safety                               | 9.1%  |
| Multilingual                         | 8.3%  |
| Function call                        | 3.3%  |
| NLP (summarization, generation, QA)  | 3.2%  |
|
|
|
#### Training Hyperparameters |
|
|
|
<style type="text/css"> |
|
.tg {border-collapse:collapse;border-spacing:0;} |
|
.tg td{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px; |
|
overflow:hidden;padding:10px 5px;word-break:normal;} |
|
.tg th{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px; |
|
font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;} |
|
.tg .tg-c3ow{border-color:inherit;text-align:center;vertical-align:top} |
|
.tg .tg-7btt{border-color:inherit;font-weight:bold;text-align:center;vertical-align:top} |
|
.tg .tg-0pky{border-color:inherit;text-align:left;vertical-align:top} |
|
.tg .tg-ihkz{border-color:inherit;text-align:center;vertical-align:top} |
|
.tg .tg-pcvp{border-color:inherit;text-align:left;vertical-align:top} |
|
.tg .tg-j2vi{border-color:inherit;font-weight:bold;text-align:center;vertical-align:top} |
|
.tg .tg-amwm{border-color:inherit;text-align:left;vertical-align:top} |
|
.tg .tg-0lax{border-color:inherit;font-weight:bold;text-align:center;vertical-align:top} |
|
</style> |
|
<table class="tg"><thead> |
|
<tr> |
|
<th class="tg-7btt" rowspan="3">AdamW</th> |
|
<th class="tg-c3ow">β1</th> |
|
<th class="tg-0pky">0.9</th> |
|
</tr> |
|
<tr> |
|
<th class="tg-ihkz">β2</th> |
|
<th class="tg-pcvp">0.999</th> |
|
</tr> |
|
<tr> |
|
<th class="tg-c3ow">weight decay</th> |
|
<th class="tg-0pky">0.01</th> |
|
</tr></thead> |
|
<tbody> |
|
<tr> |
|
<td class="tg-j2vi" rowspan="4">Learning rate</td> |
|
<td class="tg-ihkz">type</td> |
|
<td class="tg-pcvp">linear decay</td> |
|
</tr> |
|
<tr> |
|
<td class="tg-c3ow">init lr</td> |
|
<td class="tg-0pky">5e-6</td> |
|
</tr> |
|
<tr> |
|
<td class="tg-ihkz">final lr</td> |
|
<td class="tg-pcvp">0</td> |
|
</tr> |
|
<tr> |
|
<td class="tg-c3ow">warm rate</td> |
|
<td class="tg-0pky">0.03</td> |
|
</tr> |
|
<tr> |
|
<td class="tg-j2vi">Batch size</td> |
|
<td class="tg-ihkz"></td> |
|
<td class="tg-pcvp">64</td> |
|
</tr> |
|
<tr> |
|
<td class="tg-amwm">Epochs</td> |
|
<td class="tg-0lax"></td> |
|
<td class="tg-0lax">2</td> |
|
</tr> |
|
</tbody> |
|
</table> |
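
For reference, the hyperparameters above map naturally onto standard `transformers` training arguments. The snippet below is a minimal illustrative sketch, not the actual training code: the output path, per-device batch size split, and precision setting are assumptions.

```python
from transformers import TrainingArguments

# Hypothetical mapping of the SFT hyperparameters listed above onto
# transformers.TrainingArguments; the output path and the per-device/global
# batch split are illustrative assumptions, not the exact internal setup.
sft_args = TrainingArguments(
    output_dir="falcon3-7b-sft",       # hypothetical output path
    num_train_epochs=2,
    per_device_train_batch_size=8,     # assumes 8 devices for a global batch size of 64
    learning_rate=5e-6,
    lr_scheduler_type="linear",        # linear decay to a final LR of 0
    warmup_ratio=0.03,
    weight_decay=0.01,
    adam_beta1=0.9,
    adam_beta2=0.999,
    bf16=True,                         # assumption
)
```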
|
|
|
## Human preference alignment - DPO |
|
|
|
### Training Data |
|
TODO
|
|
|
#### Training Hyperparameters |
|
TODO
|
|
|
|
|
# Evaluation |
|
We report in the following table our internal pipeline benchmarks: |
|
|
|
|
|
<table border="1" style="width: 100%; text-align: center; border-collapse: collapse;"> |
|
<colgroup> |
|
<col style="width: 10%;"> |
|
<col style="width: 10%;"> |
|
<col style="width: 7%;"> |
|
<col style="width: 7%;"> |
|
<col style="width: 7%;"> |
|
<col style="background-color: rgba(80, 15, 213, 0.5); width: 7%;"> |
|
</colgroup> |
|
<thead> |
|
<tr> |
|
<th>Category</th> |
|
<th>Benchmark</th> |
|
<th>Llama-3.1-8B-Instruct</th> |
|
<th>Qwen2-7B-Instruct</th> |
|
<th>Qwen2.5-7B-Instruct</th> |
|
<th>Falcon3-7B-Instruct</th> |
|
</tr> |
|
</thead> |
|
<tbody> |
|
<tr> |
|
<td rowspan="3">General</td> |
|
<td>MMLU (5-shot)</td> |
|
<td>-</td> |
|
<td>-</td> |
|
<td>-</td> |
|
<td>-</td> |
|
</tr> |
|
<tr> |
|
<td>MMLU-PRO (5-shot)</td> |
|
<td>-</td> |
|
<td>-</td> |
|
<td>-</td> |
|
<td>-</td> |
|
</tr> |
|
<tr> |
|
<td>IFEval</td> |
|
<td>-</td> |
|
<td>-</td> |
|
<td>-</td> |
|
<td>-</td> |
|
</tr> |
|
<tr> |
|
<td rowspan="2">Math</td> |
|
<td>GSM8K (5-shot)</td> |
|
<td>-</td> |
|
<td>-</td> |
|
<td>-</td> |
|
<td>-</td> |
|
</tr> |
|
<tr> |
|
      <td>MATH (4-shot)</td>
|
<td>-</td> |
|
<td>-</td> |
|
<td>-</td> |
|
<td>-</td> |
|
</tr> |
|
<tr> |
|
<td rowspan="4">Reasoning</td> |
|
<td>Arc Challenge (25-shot)</td> |
|
<td>-</td> |
|
<td>-</td> |
|
<td>-</td> |
|
<td>-</td> |
|
</tr> |
|
<tr> |
|
<td>GPQA (0-shot)</td> |
|
<td>-</td> |
|
<td>-</td> |
|
<td>-</td> |
|
<td>-</td> |
|
</tr> |
|
<tr> |
|
<td>MUSR (0-shot)</td> |
|
<td>-</td> |
|
<td>-</td> |
|
<td>-</td> |
|
<td>-</td> |
|
</tr> |
|
<tr> |
|
<td>BBH (3-shot)</td> |
|
<td>-</td> |
|
<td>-</td> |
|
<td>-</td> |
|
<td>-</td> |
|
</tr> |
|
<tr> |
|
<td rowspan="4">CommonSense Understanding</td> |
|
<td>PIQA (0-shot)</td> |
|
<td>-</td> |
|
<td>-</td> |
|
<td>-</td> |
|
<td>-</td> |
|
</tr> |
|
<tr> |
|
<td>SciQ (0-shot)</td> |
|
<td>-</td> |
|
<td>-</td> |
|
<td>-</td> |
|
<td>-</td> |
|
</tr> |
|
<tr> |
|
<td>Winogrande (0-shot)</td> |
|
<td>-</td> |
|
<td>-</td> |
|
<td>-</td> |
|
<td>-</td> |
|
</tr> |
|
<tr> |
|
<td>OpenbookQA (0-shot)</td> |
|
<td>-</td> |
|
<td>-</td> |
|
<td>-</td> |
|
<td>-</td> |
|
</tr> |
|
</tbody> |
|
</table> |
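
A rough public approximation of these benchmarks (scores will not exactly match our internal pipeline) can be obtained with the `lm-evaluation-harness` package. The snippet below is a sketch; task names, few-shot counts, and dtype are illustrative assumptions.

```python
from lm_eval.evaluator import simple_evaluate

# Sketch of an approximate reproduction with the public lm-evaluation-harness.
# Task names, few-shot counts, and dtype are illustrative assumptions; scores
# may differ from the internal pipeline reported above.
results = simple_evaluate(
    model="hf",
    model_args="pretrained=tiiuae/Falcon3-7B-Instruct,dtype=bfloat16",
    tasks=["mmlu", "gsm8k"],
    num_fewshot=5,
    batch_size=8,
)
print(results["results"])
```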
|
|
|
|
|
# Citation |
|
If the Falcon3 family of models was helpful to your work, feel free to cite us.
|
|
|
``` |
|
@misc{Falcon3, |
|
title = {Falcon 3 family of Open Foundation Models}, |
|
author = {TII Team}, |
|
month = {December}, |
|
year = {2024} |
|
} |
|
``` |
|
|