metadata

language:
  - en
tags:
  - falcon3

TL;DR
Model Details
Usage
Training Details
Evaluation

TL;DR

Model Details

Model Description

Developed by: Technology Innovation Institute (https://www.tii.ae)
Model type: Causal decoder-only
Architecture: Transformer-based
Language(s) (NLP): Mainly English
License: TII Falcon-LLM License 2.0

Usage

The example scripts below show how to use the model with transformers. Make sure you have the latest release of transformers installed, or a build from source.

Using the PyTorch model with 🤗 transformers

Running the model on a CPU

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon3-7B-Base")
model = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon3-7B-Base")

input_text = "Question: How many hours in one day? Answer: "
input_ids = tokenizer(input_text, return_tensors="pt").input_ids

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
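
The script above prints the prompt together with the continuation and relies on the default generation length. As a minimal sketch, the helper below caps the number of new tokens and returns only the generated text. The function name `complete` and the default of 32 new tokens are illustrative assumptions, not part of the model card; `max_new_tokens` and `skip_special_tokens` are standard transformers arguments.

```python
def complete(model, tokenizer, prompt, max_new_tokens=32):
    """Generate a continuation and return only the newly generated text.

    `complete` is a hypothetical helper; `model` and `tokenizer` are the
    objects loaded with `from_pretrained` in the script above.
    """
    inputs = tokenizer(prompt, return_tensors="pt")
    # Number of prompt tokens, used to slice them off the output.
    prompt_len = len(inputs["input_ids"][0])
    outputs = model.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=max_new_tokens,  # cap on generated tokens (assumed value)
    )
    # generate() returns prompt + continuation; keep the continuation only.
    new_tokens = outputs[0][prompt_len:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

With the objects from the CPU example, `print(complete(model, tokenizer, input_text))` would print just the answer text. For the GPU variants, the tokenized inputs would additionally need to be moved to the model's device.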

Running the model on a GPU

# pip install accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon3-7B-Base")
model = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon3-7B-Base", device_map="auto")

input_text = "Question: How many hours in one day? Answer: "
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))

Running the model on a GPU using `torch.compile`

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon3-7B-Base")
model = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon3-7B-Base", torch_dtype=torch.bfloat16).to(0)

model = torch.compile(model)

input_text = "Question: How many hours in one day? Answer: "
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
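
The script above loads the weights in `bfloat16`, which stores each parameter in 2 bytes instead of the 4 bytes used by `float32`. A rough back-of-the-envelope sketch of the weight memory follows. The parameter count of 7 billion is an assumption read off the model name, not an exact figure, and the estimate ignores activations and the KV cache.

```python
def weight_memory_gib(n_params: int, bytes_per_param: int) -> float:
    """Approximate memory needed to hold the weights alone, in GiB."""
    return n_params * bytes_per_param / 2**30


# Assumption: roughly 7e9 parameters, inferred from the name "Falcon3-7B".
APPROX_PARAMS = 7_000_000_000
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2}

for dtype, nbytes in BYTES_PER_PARAM.items():
    print(f"{dtype}: ~{weight_memory_gib(APPROX_PARAMS, nbytes):.1f} GiB")
```

Under these assumptions, `bfloat16` halves the weight footprint relative to `float32`, which is often what decides whether the model fits on a single GPU.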

Training Details

Training Data

Training Procedure

Training Hyperparameters

Hyperparameter	Value	Comment
Precision	`bfloat16`
Optimizer	AdamW
Max learning rate		Following a WSD (warmup-stable-decay) learning rate schedule
Weight decay
Batch size
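
The table names a WSD (warmup-stable-decay) learning rate schedule without giving its values. As an illustrative sketch only, the function below shows the general shape of such a schedule: a ramp up to the maximum learning rate, a constant plateau, then a decay. The linear ramps, the function name `wsd_lr`, and every number in the usage example are assumptions; the model card does not state the actual phase lengths, decay shape, or learning rates.

```python
def wsd_lr(step, max_lr, warmup_steps, stable_steps, decay_steps, min_lr=0.0):
    """Learning rate at `step` for a warmup-stable-decay schedule.

    Assumes linear warmup from 0 and linear decay to `min_lr`; other decay
    shapes are also used in practice.
    """
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to max_lr.
        return max_lr * step / warmup_steps
    if step < warmup_steps + stable_steps:
        # Stable: hold the maximum learning rate.
        return max_lr
    if decay_steps <= 0:
        return min_lr
    # Decay: move linearly from max_lr to min_lr, then stay at min_lr.
    progress = min(1.0, (step - warmup_steps - stable_steps) / decay_steps)
    return max_lr + (min_lr - max_lr) * progress


# Hypothetical values, chosen only to show the three phases.
for s in (0, 5, 10, 60, 110, 120, 130):
    print(s, wsd_lr(s, max_lr=1.0, warmup_steps=10, stable_steps=100, decay_steps=20))
```

One practical appeal of this shape is that the long constant phase does not depend on the total step count, so a run can be extended or branched before the decay phase begins.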

Evaluation

Category	Benchmark	Llama3.1-8B	Qwen2-7B	Qwen2.5-7B	Falcon3-7B-Base	Gemma2-9B	Yi1.5-9B	Mistral-NeMo-12B	Falcon3-10B-Base
General	MMLU (5-shot)	65.2	70.4	74.2	67.5	0	69.6	68.8	73.1
	MMLU-PRO (5-shot)	32.7	42.1	43.5	39.2	0	39.3	34.7	42.5
	IFEval	12.0	30.6	33.9	34.3	0	29.1	16.1	36.4
Math	GSM8K (5-shot)	49.4	77.9	82.9	76.2	69.1	63.8	55.3	81.4
Math	MATH (4-shot)	4.1	17.5	15.5	18.0	0	9.2	4.9	22.9
Reasoning	ARC Challenge (25-shot)	53.4	57.4	59.0	59.6	63.7	58.2	60.6	62.6
	GPQA (0-shot)	31.0	31.9	33.0	35.5	0	36.6	28.8	34.1
	MUSR (0-shot)	38.0	44.1	44.2	47.3	0	43.3	39.2	44.2
	BBH (3-shot)	46.5	53.3	54.0	51.0	0	51.3	50.2	59.7
CommonSense Understanding	PIQA (0-shot)	80.3	79.8	78.7	77.7	81.4	79.8	81.4	79.1
	SciQ (0-shot)	96.3	95.9	96.6	95.3	97.2	95.8	96.4	96.0
	Winogrande (0-shot)	74.0	72.1	72.9	71.0	74.2	72.7	73.2	73.6
	OpenbookQA (0-shot)	33.4	35.2	33.6	31.4	34.0	35.4	36.4	34.0
