pablo-rf committed
Commit 8413d77 · verified · Parent: a4393a0

Update README.md

Files changed (1): README.md +3 -3
README.md CHANGED
@@ -118,7 +118,7 @@ library_name: transformers
  ## Model description
 
  **Llama-3.1-Carballo** is an 8B-parameter transformer-based causal language model for Galician, Portuguese, Spanish, Catalan and English.
- It is the result of continual pretraining of [meta-llama/Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B) with a multilingual corpus of almost 20B tokens, with an emphasis of Galician texts.
+ It is the result of continual pretraining of [meta-llama/Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B) with a multilingual corpus of almost 20B tokens, with an emphasis on Galician texts.
 
  This model is part of the **Carballo family**, a family of LLMs specialized in Galician. Smaller models can be found [here](https://huggingface.co/collections/proxectonos/text-models-65d49fa54e358ce02a9699c8).
  ## Intended uses and limitations
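
Since the card describes a Hub-hosted causal LM, a minimal loading sketch may help readers. It assumes the hub id `proxectonos/Llama-3.1-Carballo` (inferred from the collection link above, not stated in this diff) and the standard `transformers` text-generation API.

```python
# Minimal sketch: load the model and generate a continuation.
# The repo id is an assumption inferred from the collection link above;
# check the model card for the exact identifier before running.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "proxectonos/Llama-3.1-Carballo"  # assumed hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision keeps the 8B model manageable
    device_map="auto",
)

# Base (non-instruct) LM: it continues text rather than following instructions.
prompt = "A lingua galega é"  # Galician prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```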
@@ -164,7 +164,7 @@ It was trained using HuggingFace Transformers and Pytorch, using the [Causal Mod
  ### Training data
 
 
- The training corpus consists of texts in 5 languages, with an emphasis on Galician. The main aim of this is to ensure that the model learns to handle the latter language perfectly, while maintaining knowledge of languages already known (Spanish, English), learning others (Catalan) or adapting to existing language varieties (Portuguese-PT instead of Portuguese-BR).
+ The training corpus consists of texts in 5 languages, with an emphasis on Galician. The main aim of this is to ensure that the model learns to work with this language perfectly, while maintaining knowledge of languages already known (Spanish, English), learning others (Catalan) or adapting to existing language varieties (Portuguese-PT instead of Portuguese-BR).
 
  The corpus is structured as follows:
@@ -196,7 +196,7 @@ The corpus is structured as follows:
  The training was conducted at the Galicia Supercomputing Center ([CESGA](https://www.cesga.es/en/home-2/)), using 5 nodes with 2 NVIDIA A100 GPUs each.
 
  ## Evaluation
- TO-DO
+ In progress...
 
  ## Additional information
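
The training-data and CESGA notes above describe continual pretraining with HuggingFace Transformers and PyTorch via the causal language modeling example script. A compact, simplified stand-in for that setup is sketched below; the corpus path, sequence length, and hyperparameters are placeholders rather than the project's actual configuration, and the real run would be launched with a distributed launcher across the 5 × 2-GPU nodes.

```python
# Simplified continual-pretraining sketch (a stand-in for the run_clm-style
# script the card references). Corpus path and hyperparameters are illustrative
# only; the actual values used for Llama-3.1-Carballo are not in this diff.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

base = "meta-llama/Llama-3.1-8B"  # starting checkpoint named in the card
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Placeholder corpus: plain-text files with a Galician-heavy language mix.
corpus = load_dataset("text", data_files={"train": "corpus/*.txt"})["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

tokenized = corpus.map(tokenize, batched=True, remove_columns=["text"])

args = TrainingArguments(
    output_dir="llama31-carballo-cpt",
    per_device_train_batch_size=1,   # illustrative; scaled out over 10 GPUs in practice
    gradient_accumulation_steps=16,
    learning_rate=1e-5,
    bf16=True,
    num_train_epochs=1,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    # mlm=False yields causal-LM labels (inputs shifted as targets)
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```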