---
library_name: transformers
license: llama3
datasets:
- VTSNLP/vietnamese_curated_dataset
language:
- vi
- en
base_model:
- meta-llama/Meta-Llama-3-8B
pipeline_tag: text-generation
---

# Model Information

<!-- Provide a quick summary of what the model is/does. -->

Llama3-ViettelSolutions-8B is a Vietnamese-English text-generation model from Viettel Solutions, built on Meta Llama-3-8B.

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

Llama3-ViettelSolutions-8B is a variant of the Meta Llama-3-8B model that was further pre-trained on the [Vietnamese curated dataset](https://huggingface.co/datasets/VTSNLP/vietnamese_curated_dataset) and then supervised fine-tuned on 5 million Vietnamese instruction samples.
- **Developed by:** Viettel Solutions
- **Funded by:** NVIDIA
- **Model type:** Autoregressive transformer model
- **Language(s) (NLP):** Vietnamese, English
- **License:** Llama 3 Community License
- **Finetuned from model:** meta-llama/Meta-Llama-3-8B

## Uses

An example snippet for text generation with the Transformers library:

```python
import transformers
import torch

model_id = "VTSNLP/Llama3-ViettelSolutions-8B"

# Load the model in bfloat16 and place it automatically across available devices.
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

outputs = pipeline("Xin chào!")  # "Hello!"
print(outputs[0]["generated_text"])
```
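
Since the model is instruction-tuned, it can also be prompted directly with Vietnamese instructions. A minimal sketch follows; the prompt and sampling parameters are illustrative assumptions, not values published for this model:

```python
# Hedged sketch: the sampling settings below are illustrative, not official defaults.
outputs = pipeline(
    "Hãy giới thiệu ngắn gọn về thành phố Hà Nội.",  # "Briefly introduce the city of Hanoi."
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
print(outputs[0]["generated_text"])
```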


## Training Details

### Training Data

- Dataset for continued pre-training: [Vietnamese curated dataset](https://huggingface.co/datasets/VTSNLP/vietnamese_curated_dataset)

- Dataset for supervised fine-tuning: [Instruct general dataset](https://huggingface.co/datasets/VTSNLP/instruct_general_dataset)
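
Both corpora are public on the Hugging Face Hub, so they can be inspected with the `datasets` library. A minimal sketch, assuming each repository exposes a default `train` split (column layouts are not documented here):

```python
from datasets import load_dataset

# Assumption: both repositories expose a default "train" split.
pretrain_ds = load_dataset("VTSNLP/vietnamese_curated_dataset", split="train")
sft_ds = load_dataset("VTSNLP/instruct_general_dataset", split="train")

print(pretrain_ds)  # row count and column names
print(sft_ds[0])    # a single instruction-tuning sample
```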


### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

#### Preprocessing

[More Information Needed]


#### Training Hyperparameters

- **Training regime:** bf16 mixed precision <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
- **Data sequence length:** 8192
- **Tensor model parallel size:** 4
- **Pipeline model parallel size:** 1
- **Context parallel size:** 1
- **Micro batch size:** 1
- **Global batch size:** 512
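
Together, these settings fix how each 512-sample global batch is assembled. A worked sketch of the standard Megatron-style batch arithmetic, assuming the 4-GPU setup listed under Technical Specifications:

```python
# Worked example of Megatron-style batch arithmetic; the GPU count comes from
# the Technical Specifications section, the rest from the list above.
num_gpus = 4
tensor_parallel = 4
pipeline_parallel = 1

# Data-parallel replicas = GPUs / (tensor parallel size * pipeline parallel size)
data_parallel = num_gpus // (tensor_parallel * pipeline_parallel)  # -> 1

micro_batch = 1
global_batch = 512

# Gradient-accumulation steps needed to reach the global batch per optimizer step
accumulation_steps = global_batch // (micro_batch * data_parallel)  # -> 512

print(f"data parallel = {data_parallel}, accumulation steps = {accumulation_steps}")
```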

## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

### Testing Data, Factors & Metrics

#### Testing Data

<!-- This should link to a Dataset Card if possible. -->

[More Information Needed]

#### Factors

<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->

[More Information Needed]

#### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->

[More Information Needed]

### Results

[More Information Needed]

#### Summary

[More Information Needed]

## Technical Specifications

- Compute infrastructure: NVIDIA DGX

- Hardware: 4 × NVIDIA A100 80 GB GPUs

- Software: [NeMo Framework](https://github.com/NVIDIA/NeMo)

## Citation 

<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]

## More Information

[More Information Needed]

## Model Card Authors

[More Information Needed]

## Model Card Contact

[More Information Needed]