metadata

language:
  - en
tags:
  - falcon3

TL;DR
Model Details
Usage
Training Details
Evaluation

TL;DR

Model Details

Model Description

Developed by: https://www.tii.ae
Model type: Causal decoder-only
Architecture: Transformer-based
Language(s) (NLP): Mainly English
License: TII Falcon-LLM License 2.0

Usage

The example scripts below show how to use the model with transformers. Make sure you have the latest transformers release installed, or build it from source.

Using the PyTorch model with 🤗 transformers

Running the model on a CPU

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon3-7B-Base")
model = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon3-7B-Base")

input_text = "Question: How many hours in one day? Answer: "
input_ids = tokenizer(input_text, return_tensors="pt").input_ids

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))

Running the model on a GPU

# pip install accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon3-7B-Base")
model = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon3-7B-Base", device_map="auto")

input_text = "Question: How many hours in one day? Answer: "
# Move the inputs to wherever device_map="auto" placed the model
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(model.device)

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))

Running the model on a GPU using `torch.compile`

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon3-7B-Base")
model = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon3-7B-Base", torch_dtype=torch.bfloat16).to(0)

model = torch.compile(model)

input_text = "Question: How many hours in one day? Answer: "
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
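
As a rough guide to choosing between the CPU and GPU examples above, the memory needed just to hold the weights can be estimated from the parameter count and the dtype. The sketch below assumes a nominal 7 billion parameters (taken from the model name, not an exact count) and ignores activations, the KV cache, and framework overhead. `weight_memory_gib` is a hypothetical helper, not part of transformers.

```python
# Bytes used per parameter for a few common dtypes.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}


def weight_memory_gib(num_params, dtype):
    """Approximate memory (GiB) needed to store the weights alone."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


# Assumption: 7e9 is the nominal size implied by "7B", not the exact count.
NOMINAL_PARAMS = 7e9

print(f"bfloat16: {weight_memory_gib(NOMINAL_PARAMS, 'bfloat16'):.1f} GiB")
print(f"float32:  {weight_memory_gib(NOMINAL_PARAMS, 'float32'):.1f} GiB")
```

Under these assumptions, loading with `torch_dtype=torch.bfloat16` roughly halves the weight footprint compared with the float32 case.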

Training Details

Training Data

Training Procedure

Training Hyperparameters

Hyperparameter	Value	Comment
Precision	`bfloat16`
Optimizer	AdamW
Max learning rate		Following a WSD (warmup-stable-decay) learning rate schedule
Weight decay
Batch size
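
The table mentions a WSD (warmup-stable-decay) learning rate schedule without giving its values. The sketch below illustrates the general shape of such a schedule in plain Python. The function name `wsd_lr`, the linear warmup, the linear decay, and every number in the usage lines are assumptions chosen for illustration; they are not the values or the exact decay form used to train this model.

```python
def wsd_lr(step, max_lr, warmup_steps, stable_steps, decay_steps, min_lr=0.0):
    """Learning rate at `step` under a warmup-stable-decay schedule.

    Assumed shape: linear warmup to max_lr, a constant plateau at max_lr,
    then a linear decay down to min_lr.
    """
    if step < warmup_steps:
        # Warmup phase: ramp up linearly, reaching max_lr on the last warmup step.
        return max_lr * (step + 1) / warmup_steps
    if step < warmup_steps + stable_steps:
        # Stable phase: hold the peak learning rate.
        return max_lr
    decay_step = step - warmup_steps - stable_steps
    if decay_step >= decay_steps:
        # After the decay phase: stay at the floor.
        return min_lr
    # Decay phase: interpolate linearly from max_lr to min_lr.
    frac = decay_step / decay_steps
    return max_lr + (min_lr - max_lr) * frac


# Hypothetical settings, for illustration only.
for step in (0, 9, 60, 135, 200):
    print(step, wsd_lr(step, max_lr=1e-3, warmup_steps=10,
                       stable_steps=100, decay_steps=50))
```

Compared with a cosine schedule, the long constant phase means the total number of training steps does not have to be fixed before training starts; only the decay phase needs to be scheduled.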

Evaluation

Metrics	Llama3.1-8B	Falcon3-7B-Base
MUSR	not reported	18.70
BBH	not reported	32.68
MMLU_PRO	not reported	32.43
IF_EVAL	not reported	34.27
GPQA	not reported	13.97
MATH	not reported	18.02
AVG	not reported	24.85
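
To show how an aggregate like the AVG row can be checked, the sketch below computes the unweighted mean of the six Falcon3-7B-Base scores listed in the table. Treating AVG as a plain arithmetic mean is an assumption. With the rounded values shown, the mean comes out slightly above the reported 24.85, which may reflect unrounded per-task scores or a different aggregation.

```python
from statistics import mean

# Falcon3-7B-Base scores copied from the table above.
scores = {
    "MUSR": 18.70,
    "BBH": 32.68,
    "MMLU_PRO": 32.43,
    "IF_EVAL": 34.27,
    "GPQA": 13.97,
    "MATH": 18.02,
}

# Assumption: every benchmark carries equal weight.
unweighted_mean = mean(scores.values())
print(f"{unweighted_mean:.2f}")  # prints 25.01
```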

Category	Benchmark	Llama3.1-8B	Qwen2-7B	Qwen2.5-7B	Falcon3-7B-Base	Yi1.5-9B	Mistral-NeMo-12B	Falcon3-10B-Base
General	MMLU (5-shot)	65.2	70.4	74.2	67.5	69.6	68.8	73.1
	MMLU-PRO (5-shot)	32.7	42.1	43.5	39.2	39.3	34.7	42.5
	IFEval	12.0	30.6	33.9	34.3	29.1	16.1	36.4
