Triangle104 committed 1436cc3 (verified, parent: ca79ac7): Update README.md
This model was converted to GGUF format from [`tiiuae/Falcon3-10B-Instruct`](https://huggingface.co/tiiuae/Falcon3-10B-Instruct) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
Refer to the [original model card](https://huggingface.co/tiiuae/Falcon3-10B-Instruct) for more details on the model.
 
---
## Model details

The Falcon3 family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B parameters.

This repository contains Falcon3-10B-Instruct. It achieves state-of-the-art results (at the time of release) on reasoning, language understanding, instruction following, code, and mathematics tasks. Falcon3-10B-Instruct supports 4 languages (English, French, Spanish, Portuguese) and a context length of up to 32K.

Details:
- Transformer-based causal decoder-only architecture
- 40 decoder blocks
- Grouped Query Attention (GQA) for faster inference: 12 query heads and 4 key-value heads
- Wider head dimension: 256
- High RoPE value to support long context understanding: 1000042
- Uses SwiGLU and RMSNorm
- 32K context length
- 131K vocab size
- Depth up-scaled from Falcon3-7B-Base with 2 teratokens of data comprising web, code, STEM, high-quality and multilingual data, using 1024 H100 GPU chips
- Post-trained on 1.2 million samples of STEM, conversational, code, safety and function-call data
- Supports EN, FR, ES, PT
- Developed by Technology Innovation Institute
- License: TII Falcon-LLM License 2.0
- Model release date: December 2024

---
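As an illustration of the GQA configuration listed above (12 query heads, 4 key-value heads, head dimension 256), each key/value head is shared by 3 query heads. A minimal NumPy shape sketch follows; the sequence length and the `hidden = heads * head_dim` relation are assumptions for illustration, not taken from the card:

```python
import numpy as np

# Attention geometry from the model details above.
n_q_heads, n_kv_heads, head_dim = 12, 4, 256
hidden = n_q_heads * head_dim  # assumed relation; gives 3072

# Each KV head serves a group of query heads.
group = n_q_heads // n_kv_heads  # 3 query heads per KV head

seq = 5  # arbitrary toy sequence length
q = np.random.randn(seq, n_q_heads, head_dim)
k = np.random.randn(seq, n_kv_heads, head_dim)

# Broadcast each KV head across its group of query heads,
# so attention scores can be formed per query head.
k_expanded = np.repeat(k, group, axis=1)  # (seq, 12, 256)

scores = np.einsum("qhd,khd->hqk", q, k_expanded) / np.sqrt(head_dim)
print(scores.shape)  # (12, 5, 5): one (seq x seq) score matrix per query head
```

The point of GQA is visible in the shapes: the KV cache stores only 4 heads instead of 12, cutting memory traffic at inference time while queries remain full-width.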
## Use with llama.cpp
Install llama.cpp through brew (works on Mac and Linux)
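The brew-based setup mentioned above is typically followed by invoking the llama.cpp CLI against a GGUF file on the Hugging Face Hub. A sketch is below; the repo name and quant filename are placeholder assumptions and should be replaced with the GGUF file actually published in this repository:

```shell
# Install llama.cpp (works on Mac and Linux)
brew install llama.cpp

# Run inference directly from the Hub.
# NOTE: --hf-repo and --hf-file values are assumed placeholders;
# substitute the actual repo and quantized filename.
llama-cli --hf-repo Triangle104/Falcon3-10B-Instruct-GGUF \
  --hf-file falcon3-10b-instruct-q4_k_m.gguf \
  -p "Explain grouped query attention in one sentence."
```

`llama-server` accepts the same `--hf-repo`/`--hf-file` flags if an OpenAI-compatible HTTP endpoint is preferred over the one-shot CLI.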