---
|
license: llama2 |
|
datasets: |
|
- pkupie/mc2_corpus |
|
- togethercomputer/RedPajama-Data-1T |
|
language: |
|
- en |
|
- bo |
|
base_model: |
|
- meta-llama/Llama-2-7b-hf |
|
--- |
|
|
|
A continually pre-trained model based on Llama-2-7b-hf. |
|
|
|
We train on the **Tibetan texts** from MC^2 and the **English texts** from RedPajama, mixed at a **4:1** ratio.
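
For reference, a minimal loading and generation sketch with Hugging Face Transformers is given below. The repository id in the snippet is a placeholder for this model's actual id, and the generation settings are illustrative assumptions rather than recommended values.

```python
# Minimal usage sketch with standard Hugging Face Transformers APIs.
# NOTE: "your-org/your-model-id" is a placeholder; substitute this repository's id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/your-model-id"  # placeholder repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# The model is continually pre-trained on Tibetan and English, so prompts in
# either language are reasonable; this Tibetan prompt is just an example.
inputs = tokenizer("བོད་ཀྱི་", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```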
|
|
|
#### Hyper-parameters: |
|
* lr: 3e-5 |
|
* batch size: 1M tokens (2K × 512)
|
* lr scheduler: cosine (see the schedule sketch below)
|
* min lr: 1e-6 |
|
* lr decay iters: 10240 |
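
The learning-rate schedule implied by the values above can be sketched as follows. This is an illustrative reconstruction (assuming no warmup, which is not listed), not the exact training code.

```python
import math

PEAK_LR = 3e-5       # lr
MIN_LR = 1e-6        # min lr
DECAY_ITERS = 10240  # lr decay iters

def cosine_lr(step: int) -> float:
    """Cosine decay from PEAK_LR to MIN_LR over DECAY_ITERS, then constant at MIN_LR."""
    if step >= DECAY_ITERS:
        return MIN_LR
    progress = step / DECAY_ITERS
    return MIN_LR + 0.5 * (PEAK_LR - MIN_LR) * (1.0 + math.cos(math.pi * progress))

# Learning rate at the start, midpoint, and end of the decay window.
for s in (0, 5120, 10240):
    print(s, f"{cosine_lr(s):.2e}")
```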
|
|
|
## Citation |
|
If you find this model useful in your work, please cite:
|
```bibtex
@inproceedings{tao-etal-2024-unlocking,
    title = "Unlocking the Potential of Model Merging for Low-Resource Languages",
    author = "Tao, Mingxu and
      Zhang, Chen and
      Huang, Quzhe and
      Ma, Tianyao and
      Huang, Songfang and
      Zhao, Dongyan and
      Feng, Yansong",
    editor = "Al-Onaizan, Yaser and
      Bansal, Mohit and
      Chen, Yun-Nung",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2024",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.findings-emnlp.508",
    doi = "10.18653/v1/2024.findings-emnlp.508",
    pages = "8705--8720"
}
```