---
library_name: transformers
license: llama3
datasets:
- VTSNLP/vietnamese_curated_dataset
language:
- vi
- en
base_model:
- meta-llama/Meta-Llama-3-8B
pipeline_tag: text-generation
---
# Model Information
<!-- Provide a quick summary of what the model is/does. -->
## Model Details
### Model Description
<!-- Provide a longer summary of what this model is. -->
Llama3-ViettelSolutions-8B is a variant of Meta's Llama-3-8B, created by continued pre-training on the [Vietnamese curated dataset](https://huggingface.co/datasets/VTSNLP/vietnamese_curated_dataset) followed by supervised fine-tuning on 5 million Vietnamese instruction samples.
- **Developed by:** Viettel Solutions
- **Funded by:** NVIDIA
- **Model type:** Autoregressive transformer model
- **Language(s) (NLP):** Vietnamese, English
- **License:** Llama 3 Community License
- **Finetuned from model:** meta-llama/Meta-Llama-3-8B
## Uses
Example usage with the Transformers `pipeline` API:
```python
import torch
import transformers

model_id = "VTSNLP/Llama3-ViettelSolutions-8B"

# Load the model in bfloat16 and shard it automatically across available devices.
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

outputs = pipeline("Xin chào!")
print(outputs[0]["generated_text"])
```
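If the fine-tuned checkpoint ships a chat template (an assumption; the card does not state this), recent versions of Transformers let the same pipeline consume chat messages directly. Continuing from the snippet above:

```python
# Hypothetical chat-style call; assumes the tokenizer carries a chat
# template from the supervised fine-tuning stage.
messages = [
    # "How many provinces does Vietnam have?"
    {"role": "user", "content": "Việt Nam có bao nhiêu tỉnh thành?"},
]
outputs = pipeline(messages, max_new_tokens=256)
print(outputs[0]["generated_text"])
```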
## Training Details
### Training Data
- Continued pre-training dataset: [Vietnamese curated dataset](https://huggingface.co/datasets/VTSNLP/vietnamese_curated_dataset)
- Supervised fine-tuning dataset: [Instruct general dataset](https://huggingface.co/datasets/VTSNLP/instruct_general_dataset)
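Both corpora are public on the Hub and can be streamed for inspection without a full download. A minimal sketch (the `train` split name is an assumption; check each dataset card for the actual splits):

```python
from datasets import load_dataset

# Stream a handful of records from each corpus.
# The "train" split is assumed; see the dataset cards for actual splits.
pretrain = load_dataset(
    "VTSNLP/vietnamese_curated_dataset", split="train", streaming=True
)
sft = load_dataset(
    "VTSNLP/instruct_general_dataset", split="train", streaming=True
)

for example in pretrain.take(3):
    print(example)
```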
### Training Procedure
<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
#### Preprocessing
[More Information Needed]
#### Training Hyperparameters
- **Training regime:** bf16 mixed precision
- **Data sequence length:** 8192
- **Tensor model parallel size:** 4
- **Pipeline model parallel size:** 1
- **Context parallel size:** 1
- **Micro batch size:** 1
- **Global batch size:** 512
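In Megatron/NeMo-style training, these settings fix the data-parallel and gradient-accumulation factors. A minimal sketch of that arithmetic, assuming the 4-GPU setup listed under Technical Specifications (the GPU count is taken from that section, not from any training logs):

```python
# Illustrative Megatron/NeMo batch arithmetic; assumes 4 GPUs total.
num_gpus = 4
tensor_parallel = 4     # tensor model parallel size
pipeline_parallel = 1   # pipeline model parallel size
micro_batch = 1
global_batch = 512

# Each model replica spans TP * PP GPUs, so DP = GPUs / (TP * PP) = 1.
data_parallel = num_gpus // (tensor_parallel * pipeline_parallel)

# Gradient accumulation = global / (micro * DP) = 512 / (1 * 1) = 512 steps.
grad_accum_steps = global_batch // (micro_batch * data_parallel)
print(f"DP={data_parallel}, grad_accum={grad_accum_steps}")  # DP=1, grad_accum=512
```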
## Evaluation
<!-- This section describes the evaluation protocols and provides the results. -->
### Testing Data, Factors & Metrics
#### Testing Data
<!-- This should link to a Dataset Card if possible. -->
[More Information Needed]
#### Factors
<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
[More Information Needed]
#### Metrics
<!-- These are the evaluation metrics being used, ideally with a description of why. -->
[More Information Needed]
### Results
[More Information Needed]
#### Summary
[More Information Needed]
## Technical Specifications
- **Compute infrastructure:** NVIDIA DGX
- **Hardware:** 4 x A100 80GB
- **Software:** [NeMo Framework](https://github.com/NVIDIA/NeMo)
## Citation
<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
**BibTeX:**
[More Information Needed]
**APA:**
[More Information Needed]
## More Information
[More Information Needed]
## Model Card Authors
[More Information Needed]
## Model Card Contact
[More Information Needed] |