---
language:
- en
tags:
- falcon3
---
Table of Contents
- TL;DR
- Model Details
- Usage
- Training Details
- Evaluation
TL;DR
Model Details
Model Description
- Developed by: Technology Innovation Institute (https://www.tii.ae)
- Model type: Causal decoder-only
- Architecture: Transformer-based
- Language(s) (NLP): Mainly English
- License: TII Falcon-LLM License 2.0
Usage
Below are some example scripts showing how to use the model with 🤗 transformers.
Make sure you have the latest transformers release, or a version built from source.
Using the PyTorch model with 🤗 transformers
Running the model on a CPU
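from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model weights from the Hugging Face Hub (no device_map, so the model stays on CPU)
tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon3-7B-Base")
model = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon3-7B-Base")

input_text = "Question: How many hours in one day? Answer: "
# Passing the full encoding also supplies the attention mask to generate()
inputs = tokenizer(input_text, return_tensors="pt")
# max_new_tokens limits how many tokens are generated after the prompt
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))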
Running the model on a GPU
# pip install accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon3-7B-Base")
# device_map="auto" (requires accelerate) places the weights on the available GPU(s)
model = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon3-7B-Base", device_map="auto")

input_text = "Question: How many hours in one day? Answer: "
# Move the input tensors to the model's device
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Running the model on a GPU using torch.compile
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon3-7B-Base")
# Load the weights in bfloat16 and move them to the first GPU (cuda:0)
model = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon3-7B-Base", torch_dtype=torch.bfloat16).to(0)
# Compile the model; the first generate() call is slower because compilation happens then
model = torch.compile(model)

input_text = "Question: How many hours in one day? Answer: "
inputs = tokenizer(input_text, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Training Details
Training Data
Training Procedure
Training Hyperparameters
Hyperparameter | Value | Comment |
---|---|---|
Precision | bfloat16 | |
Optimizer | AdamW | |
Max learning rate | | Following a WSD (warmup-stable-decay) learning rate schedule |
Weight decay | | |
Batch size | | |
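The table names a WSD (warmup-stable-decay) learning rate schedule but does not give its values. As a rough illustration only, here is a minimal Python sketch of a generic WSD schedule: a linear warmup to the peak rate, a constant plateau, then a decay toward a floor. The linear decay shape, the function name `wsd_lr`, and every step count and rate below are assumptions for illustration, not Falcon3's actual training settings.

```python
def wsd_lr(step, max_lr, warmup_steps, stable_steps, decay_steps, min_lr=0.0):
    """Learning rate at `step` under a generic warmup-stable-decay schedule.

    Assumption: both warmup and decay are linear; real WSD variants may use
    other decay shapes (e.g. exponential or 1-sqrt).
    """
    if warmup_steps <= 0 or decay_steps <= 0:
        raise ValueError("warmup_steps and decay_steps must be positive")
    # Warmup phase: ramp linearly from 0 to max_lr.
    if step < warmup_steps:
        return max_lr * step / warmup_steps
    # Stable phase: hold the peak learning rate constant.
    if step < warmup_steps + stable_steps:
        return max_lr
    # Decay phase: interpolate linearly from max_lr to min_lr, then stay at min_lr.
    progress = min((step - warmup_steps - stable_steps) / decay_steps, 1.0)
    return max_lr + (min_lr - max_lr) * progress


if __name__ == "__main__":
    # Hypothetical values chosen for illustration only.
    for s in (0, 50, 100, 899, 950, 1000):
        print(s, wsd_lr(s, max_lr=1e-3, warmup_steps=100, stable_steps=800, decay_steps=100))
```

A schedule like this keeps the rate at its peak for most of training and only decays at the end, which is what distinguishes WSD from, for example, a cosine schedule that starts decaying right after warmup.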
Evaluation
Metrics | Llama3.1-8B | Falcon3-7B-Base |
---|---|---|
MUSR | | 18.70 |
BBH | | 32.68 |
MMLU_PRO | | 32.43 |
IF_EVAL | | 34.27 |
GPQA | | 13.97 |
MATH | | 18.02 |
AVG | | 24.85 |
Category | Benchmark | Llama3.1-8B | Qwen2-7B | Qwen2.5-7B | Falcon3-7B-Base | Gemma2-9B | Yi1.5-9B | Mistral-NeMo-12B | Falcon3-10B-Base |
---|---|---|---|---|---|---|---|---|---|
General | MMLU (5-shot) | 65.2 | 70.4 | 74.2 | 67.5 | 0 | 69.6 | 68.8 | 73.1 |
| | MMLU-PRO (5-shot) | 32.7 | 42.1 | 43.5 | 39.2 | 0 | 39.3 | 34.7 | 42.5 |
| | IFEval | 12.0 | 30.6 | 33.9 | 34.3 | 0 | 29.1 | 16.1 | 36.4 |