---
language:
- en
tags:
- falcon3
---

# Table of Contents

0. [TL;DR](#TL;DR)
1. [Model Details](#model-details)
2. [Usage](#usage)
3. [Training Details](#training-details)
4. [Evaluation](#evaluation)

# TL;DR

# Model Details

## Model Description

- **Developed by:** [https://www.tii.ae](https://www.tii.ae)
- **Model type:** Causal decoder-only
- **Architecture:** Transformer-based
- **Language(s) (NLP):** Mainly English
- **License:** TII Falcon-LLM License 2.0

# Usage

Find below some example scripts showing how to use the model with `transformers` (make sure you have the latest release of `transformers`, or a build from source):

## Using the PyTorch model with 🤗 transformers

### Running the model on a CPU

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and the model weights (on CPU by default)
tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon3-7B-Base")
model = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon3-7B-Base")

input_text = "Question: How many hours in one day? Answer: "
input_ids = tokenizer(input_text, return_tensors="pt").input_ids

# Generate a continuation and decode it back to text
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
```

### Running the model on a GPU

```python
# pip install accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon3-7B-Base")
# device_map="auto" lets accelerate place the weights on the available device(s)
model = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon3-7B-Base", device_map="auto")

input_text = "Question: How many hours in one day? Answer: "
# Move the inputs to the device that holds the model's first parameters
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(model.device)

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
```

### Running the model on a GPU using `torch.compile`

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon3-7B-Base")
# Load the weights in bfloat16 and move the model to the first GPU
model = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon3-7B-Base", torch_dtype=torch.bfloat16).to(0)

# Compile the model; the first call is slower while compilation runs
model = torch.compile(model)

input_text = "Question: How many hours in one day? Answer: "
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
```

# Training Details

## Training Data

## Training Procedure

### Training Hyperparameters

| **Hyperparameter** | **Value**  | **Comment**                               |
|--------------------|------------|-------------------------------------------|
| Precision          | `bfloat16` |                                           |
| Optimizer          | AdamW      |                                           |
| Max learning rate  |            | Following a WSD (warmup-stable-decay) learning rate schedule |
| Weight decay       |            |                                           |
| Batch size         |            |                                           |

# Evaluation
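
As a note on the WSD (warmup-stable-decay) schedule named in the Training Hyperparameters table: the card does not give the maximum learning rate, the phase lengths, or the shape of the decay, so the following is only a minimal sketch of the general idea. The function name `wsd_lr`, all of its parameters, the linear warmup, and the linear decay are assumptions made for illustration and are not taken from the Falcon3 training setup.

```python
def wsd_lr(step, max_lr, warmup_steps, stable_steps, decay_steps, min_lr=0.0):
    """Illustrative warmup-stable-decay learning rate for a given step.

    All arguments are hypothetical; the model card does not specify them.
    """
    if step < warmup_steps:
        # Warmup phase: ramp linearly from 0 up to max_lr (assumed shape)
        return max_lr * step / warmup_steps
    if step < warmup_steps + stable_steps:
        # Stable phase: hold the learning rate constant at max_lr
        return max_lr
    if decay_steps <= 0:
        # No decay phase configured: drop straight to the floor
        return min_lr
    # Decay phase: move linearly from max_lr to min_lr (assumed shape),
    # then stay at min_lr once the decay window has passed
    progress = min((step - warmup_steps - stable_steps) / decay_steps, 1.0)
    return max_lr + (min_lr - max_lr) * progress


# Placeholder values, chosen only to show the three phases
schedule = [wsd_lr(s, max_lr=1.0, warmup_steps=10, stable_steps=20, decay_steps=10)
            for s in range(45)]
```

A function of this form could be wrapped in `torch.optim.lr_scheduler.LambdaLR` by returning the ratio to the optimizer's base learning rate, though whether the original training code did so is not stated in this card.
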

| Metrics  | Llama3.1-8B | Falcon3-7B-Base |
|----------|-------------|-----------------|
| MUSR     |             | 18.70           |
| BBH      |             | 32.68           |
| MMLU_PRO |             | 32.43           |
| IF_EVAL  |             | 34.27           |
| GPQA     |             | 13.97           |
| MATH     |             | 18.02           |
| AVG      |             | 24.85           |

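
| Category | Benchmark         | Llama3.1-8B | Qwen2-7B | Qwen2.5-7B | Falcon3-7B-Base | Gemma2-9B | Yi1.5-9B | Mistral-NeMo-12B | Falcon3-10B-Base |
|----------|-------------------|-------------|----------|------------|-----------------|-----------|----------|------------------|------------------|
| General  | MMLU (5-shot)     | 65.2        | 70.4     | 74.2       | 67.5            | 0         | 69.6     | 68.8             | 73.1             |
|          | MMLU-PRO (5-shot) | 32.7        | 42.1     | 43.5       | 39.2            | 0         | 39.3     | 34.7             | 42.5             |
|          | IFEval            | 12.0        | 30.6     | 33.9       | 34.3            | 0         | 29.1     | 16.1             | 36.4             |
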
# Citation