mapama247 committed
Commit 91108cd
1 Parent(s): 2c7c2f8

update readme with model details, intended use, hw and sw

Files changed (1): README.md (+132, -3)
README.md CHANGED
@@ -1,3 +1,132 @@
- ---
- license: apache-2.0
- ---
---
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
language:
- bg
- ca
- code
- cs
- cy
- da
- de
- el
- en
- es
- et
- eu
- fi
- fr
- ga
- gl
- hr
- hu
- it
- lt
- lv
- mt
- nl
- nn
- no
- oc
- pl
- pt
- ro
- ru
- sh
- sk
- sl
- sr
- sv
- uk
---

![](./images/salamandra_header.png)

# Salamandra Model Card

Salamandra comes in three different sizes — 2B, 7B and 40B parameters — with their respective base and instruction-tuned variants.
This model card corresponds to the 7B version.

To visit the model cards of other Salamandra versions, please refer to the [Model Index](#model-index).

The entire Salamandra family is released under a permissive [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0), allowing both research and commercial use.
Along with the open weights, all training scripts and configuration files are made publicly available in [this GitHub repository](https://github.com/projecte-aina/salamandra).

---

## Model Details

### Description

Salamandra is a transformer-based, decoder-only language model pre-trained on 7.5 trillion tokens of highly curated data.
The pre-training corpus contains text in 35 European languages and code.
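
The YAML header above declares `library_name: transformers` and `pipeline_tag: text-generation`, so the base checkpoint can be loaded with the standard causal-LM API of the Transformers library. The snippet below is only a minimal sketch: the repository id `BSC-LT/salamandra-7b` is an assumed placeholder, not confirmed by this card, and should be replaced with the actual Hub id of this checkpoint.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repository id -- replace with the actual Hub id of this checkpoint.
model_id = "BSC-LT/salamandra-7b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the bfloat16 precision listed in the architecture table
    device_map="auto",
)

# Base checkpoints are plain language models: they continue text rather than follow instructions.
prompt = "The three largest cities in Europe are"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```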

### Hyperparameters

The full list of hyperparameters for each model can be found [here](https://github.com/projecte-aina/salamandra/tree/main/configs).

### Architecture

|                         |               |
|-------------------------|:--------------|
| Total Parameters        | 7,768,117,248 |
| Embedding Parameters    | 1,048,576,000 |
| Layers                  | 32            |
| Hidden size             | 4,096         |
| Attention heads         | 32            |
| Context length          | 8,192         |
| Vocabulary size         | 256,000       |
| Precision               | bfloat16      |
| Embedding type          | RoPE          |
| Activation Function     | SwiGLU        |
| Layer normalization     | RMS Norm      |
| Flash attention         | ✅            |
| Grouped Query Attention | ✅            |
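
The figures in the table can be cross-checked programmatically by loading the model configuration, which does not require downloading the weights. This is a sketch under two assumptions: the illustrative repository id `BSC-LT/salamandra-7b` again stands in for the real one, and the checkpoint is assumed to expose a Llama-style configuration in Transformers, whose standard attribute names are used below.

```python
from transformers import AutoConfig

# Hypothetical repository id -- replace with the actual Hub id of this checkpoint.
config = AutoConfig.from_pretrained("BSC-LT/salamandra-7b")

print(config.num_hidden_layers)        # expected: 32 layers
print(config.hidden_size)              # expected: 4096
print(config.num_attention_heads)      # expected: 32
print(config.num_key_value_heads)      # smaller than 32 when Grouped Query Attention is enabled
print(config.vocab_size)               # expected: 256000
print(config.max_position_embeddings)  # expected: 8192 (context length)
```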

---

## Intended Use

### Direct Use

The models are intended for both research and commercial use in any of the languages included in the training data.
The base models are intended either for language generation or to be further fine-tuned for specific use cases.
The instruction-tuned variants can be used as general-purpose assistants, as long as the user is fully aware of the model’s limitations.
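
For the instruction-tuned variants mentioned above, interaction normally goes through the tokenizer's chat template rather than raw text continuation. The sketch below assumes an instructed sibling of this checkpoint with the illustrative id `BSC-LT/salamandra-7b-instruct` and assumes its tokenizer ships a chat template; both should be adjusted to the actual release.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repository id for the instruction-tuned variant -- adjust to the actual release.
model_id = "BSC-LT/salamandra-7b-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

messages = [
    {"role": "user", "content": "Explain in two sentences what MareNostrum 5 is."},
]

# Build the prompt from the chat template assumed to ship with the tokenizer.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=128)
# Strip the prompt tokens and print only the newly generated answer.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```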

### Out-of-scope Use

The model is not intended for malicious activities, such as harming others or violating human rights.
Any downstream application must comply with current laws and regulations.
Irresponsible usage in production environments without proper risk assessment and mitigation is also discouraged.

---

## Hardware and Software

### Training Framework

Pre-training was conducted using NVIDIA’s [NeMo Framework](https://docs.nvidia.com/nemo-framework/index.html),
which leverages PyTorch Lightning for efficient model training in highly distributed settings.

The instruction-tuned versions were produced with [FastChat](https://github.com/lm-sys/FastChat).

### Compute Infrastructure

All models were trained on [MareNostrum 5](https://www.bsc.es/ca/marenostrum/marenostrum-5), a pre-exascale EuroHPC supercomputer hosted and
operated by Barcelona Supercomputing Center.

The accelerated partition is composed of 1,120 nodes with the following specifications:
- 4x NVIDIA Hopper GPUs with 64 GB of HBM2 memory each
- 2x Intel Sapphire Rapids 8460Y+ CPUs at 2.3 GHz, 32 cores each (64 cores per node)
- 4x NDR200 links (800 Gb/s of bandwidth per node)
- 512 GB of main memory (DDR5)
- 460 GB of NVMe storage

| Model | Nodes     | GPUs          |
|:-----:|:---------:|:-------------:|
| 2B    | 64        | 256           |
| 7B    | 128       | 512           |
| 40B   | 256 / 512 | 1,024 / 2,048 |

---