metadata

language:
  - en
tags:
  - falcon3

TL;DR
Model Details
Usage
Training Details
Evaluation

TL;DR

The Falcon 3 family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B parameters.

This repository contains Falcon3-7B-Instruct, the best Instruct LLM under 8B parameters at the time of release.

Model Details

Model Description

Developed by: https://www.tii.ae
Model type: Causal decoder-only
Architecture: Transformer-based
Language(s) (NLP): Mainly English
License: TII Falcon-LLM License 2.0

Usage

The example below shows how to use the model with transformers (make sure you have the latest release of transformers, or a build from source):

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "tiiuae/Falcon3-7B-Instruct"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "How many hours in one day?"
messages = [
    {"role": "system", "content": "You are a helpful friendly assistant Falcon3 from TII, try to follow instructions as much as possible."},
    {"role": "user", "content": prompt}
]
# Render the chat messages into a single prompt string using the model's chat template
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate up to 1024 new tokens
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=1024
)
# Drop the prompt tokens so that only the newly generated tokens are decoded
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
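
The slicing step near the end of the example can be easy to misread. For a decoder-only model, generate() returns the prompt tokens followed by the continuation, so the prompt prefix has to be cut off before decoding. The sketch below reproduces that step with plain Python lists; the token IDs and the helper name strip_prompt are made up for illustration and do not come from the Falcon3 tokenizer.

```python
# Toy token IDs standing in for real tokenizer output (hypothetical values).
prompt_ids = [[101, 7, 8, 9]]            # one prompt of 4 tokens
full_ids = [[101, 7, 8, 9, 42, 43, 2]]   # prompt followed by 3 generated tokens


def strip_prompt(prompt_batch, output_batch):
    """Drop the leading prompt tokens from each generated sequence."""
    return [out[len(inp):] for inp, out in zip(prompt_batch, output_batch)]


new_tokens = strip_prompt(prompt_ids, full_ids)
print(new_tokens)  # [[42, 43, 2]]
```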

Training Details

Based on tiiuae/Falcon3-7B-Base, the post-training stage comprises supervised finetuning followed by human preference alignment (DPO).
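
For readers unfamiliar with DPO, the following is a minimal sketch of the standard per-example DPO objective, written in plain Python for clarity. It is not the training code used for this model. The function name dpo_loss, the argument names, and the default beta of 0.1 are assumptions chosen for illustration; the DPO settings for Falcon3 are not given in this card.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair.

    Each argument is the summed log-probability of a full response under
    either the policy being trained or the frozen reference model.
    beta=0.1 is an illustrative value, not a documented Falcon3 setting.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    x = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) computed in a numerically stable way
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


# When the policy equals the reference, the loss is log(2).
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
```

The loss falls below log(2) as the policy raises the likelihood of the chosen response relative to the rejected one, measured against the reference model.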

Supervised finetuning

Training Data

1.2 million diverse, high-quality samples from Tulu-3, Open-Hermes, Numina, and Apigen.

Data type	ratio
Conversations	32%
STEM	32%
Code	12%
Safety	9.1%
Multilingual	8.3%
Function call	3.3%
NLP (summarization, generation, QA)	3.2%
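
To give a sense of scale, the ratios above can be turned into approximate sample counts. This is a rough sketch that treats "1.2 million" as exactly 1,200,000 samples, which is an assumption; the true counts are not stated in this card. The listed ratios sum to 99.9%, presumably because of rounding.

```python
TOTAL_SAMPLES = 1_200_000  # "1.2 million" from the text, treated as exact (assumption)

mix_percent = {
    "Conversations": 32.0,
    "STEM": 32.0,
    "Code": 12.0,
    "Safety": 9.1,
    "Multilingual": 8.3,
    "Function call": 3.3,
    "NLP (summarization, generation, QA)": 3.2,
}


def approx_counts(total, percents):
    """Convert percentage shares into rounded sample counts."""
    return {name: round(total * pct / 100) for name, pct in percents.items()}


counts = approx_counts(TOTAL_SAMPLES, mix_percent)
for name, n in counts.items():
    print(f"{name:<40}{n:>10,}")
```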

Training Hyperparameters

AdamW	β1	0.9
	β2	0.999
	weight decay	0.01
Learning rate	type	linear decay
	init lr	5e-6
	final lr	0
	warmup rate	0.03
Batch size		64
Epochs		2

Human preference alignment - DPO

Training Data

TODO

Training Hyperparameters

TODO

Evaluation

The following table reports results from our internal evaluation pipeline:

Category	Benchmark	Llama-3.1-8B-Instruct	Qwen2-7B-Instruct	Qwen2.5-7B-Instruct	Falcon3-7B-Instruct
General	MMLU (5-shot)	-	-	-	-
	MMLU-PRO (5-shot)	-	-	-	-
	IFEval	-	-	-	-
Math	GSM8K (5-shot)	-	-	-	-
	MATH (4-shot)	-	-	-	-
Reasoning	Arc Challenge (25-shot)	-	-	-	-
	GPQA (0-shot)	-	-	-	-
	MUSR (0-shot)	-	-	-	-
	BBH (3-shot)	-	-	-	-
CommonSense Understanding	PIQA (0-shot)	-	-	-	-
	SciQ (0-shot)	-	-	-	-
	Winogrande (0-shot)	-	-	-	-
	OpenbookQA (0-shot)	-	-	-	-

Citation

If the Falcon3 series was helpful to your work, feel free to cite us.

@misc{Falcon3,
    title = {Falcon 3 family of Open Foundation Models},
    author = {TII Team},
    month = {December},
    year = {2024}
}
