---
language:
- en
tags:
- falcon3
---

# Table of Contents

0. [TL;DR](#TL;DR)
1. [Model Details](#model-details)
2. [Usage](#usage)
3. [Training Details](#training-details)
4. [Evaluation](#evaluation)

# TL;DR

# Model Details

## Model Description

- **Developed by:** [https://www.tii.ae](https://www.tii.ae)
- **Model type:** Causal decoder-only
- **Architecture:** Transformer-based
- **Language(s) (NLP):** Mainly English
- **License:** TII Falcon-LLM License 2.0
# Usage

The example scripts below show how to use the model with `transformers`. Make sure you have the latest release of `transformers` installed, or a build from source.

## Using the PyTorch model with 🤗 transformers

### Running the model on a CPU


```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon3-7B-Base")
model = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon3-7B-Base")

input_text = "Question: How many hours in one day? Answer: "
input_ids = tokenizer(input_text, return_tensors="pt").input_ids

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
```

### Running the model on a GPU


```python
# pip install accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon3-7B-Base")
model = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon3-7B-Base", device_map="auto")

input_text = "Question: How many hours in one day? Answer: "
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
```

### Running the model on a GPU using `torch.compile`


```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon3-7B-Base")
model = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon3-7B-Base", torch_dtype=torch.bfloat16).to(0)

model = torch.compile(model)

input_text = "Question: How many hours in one day? Answer: "
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
```
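
The snippets above call `model.generate(input_ids)` with default settings and decode the whole output sequence, so the printed text repeats the prompt and the default length limit may cut the answer short. The sketch below is one way to set the length explicitly and return only the continuation. `generate_answer` is a hypothetical helper name, not part of `transformers`, and the default of 32 new tokens is an arbitrary assumption rather than a recommended value.

```python
def generate_answer(model, tokenizer, prompt, max_new_tokens=32):
    """Return only the text generated after `prompt`.

    Hypothetical helper for illustration; not part of transformers.
    """
    # Tokenize the prompt and move the IDs to the device the model lives on.
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)

    # Cap the number of newly generated tokens explicitly.
    outputs = model.generate(input_ids, max_new_tokens=max_new_tokens)

    # outputs[0] holds the prompt tokens followed by the new tokens;
    # slice off the prompt so only the continuation is decoded.
    new_tokens = outputs[0][input_ids.shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

With a `model` and `tokenizer` loaded as in any of the examples above, `generate_answer(model, tokenizer, "Question: How many hours in one day? Answer: ")` returns the continuation as a string. Because the helper reads `model.device`, the input does not need a hard-coded `"cuda"` target.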

# Training Details

## Training Data

## Training Procedure

### Training Hyperparameters

| **Hyperparameter** | **Value**  | **Comment**                               |
|--------------------|------------|-------------------------------------------|
| Precision          | `bfloat16` |                                           |
| Optimizer          | AdamW      |                                           |
| Max learning rate  |            | Following a WSD (warmup-stable-decay) learning rate schedule |
| Weight decay       |            |                                           |
| Batch size         |            |                                           |

# Evaluation

| **Metrics** | **Llama3.1-8B** | **Falcon3-7B-Base** |
|-------------|-----------------|---------------------|
| MUSR        |                 | 18.70               |
| BBH         |                 | 32.68               |
| MMLU_PRO    |                 | 32.43               |
| IF_EVAL     |                 | 34.27               |
| GPQA        |                 | 13.97               |
| MATH        |                 | 18.02               |
| AVG         |                 | 24.85               |

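| **Category** | **Benchmark**     | **Llama3.1-8B** | **Qwen2-7B** | **Qwen2.5-7B** | **Falcon3-7B-Base** | **Yi1.5-9B** | **Mistral-NeMo-12B** | **Falcon3-10B-Base** |
|--------------|-------------------|-----------------|--------------|----------------|---------------------|--------------|----------------------|----------------------|
| General      | MMLU (5-shot)     | 65.2            | 70.4         | 74.2           | 67.5                | 69.6         | 68.8                 | 73.1                 |
|              | MMLU-PRO (5-shot) | 32.7            | 42.1         | 43.5           | 39.2                | 39.3         | 34.7                 | 42.5                 |
|              | IFEval            | 12.0            | 30.6         | 33.9           | 34.3                | 29.1         | 16.1                 | 36.4                 |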

# Citation