---
license: llama2
language:
- zh
metrics:
- bleu
- chrf
---

Base model: https://huggingface.co/indiejoseph/cantonese-llama-2-7b-oasst-v1

Finetuned following ALMA (https://github.com/fe1ixxu/ALMA) on the Cantonese-Mandarin translation task.

Finetuning dataset: sourced from the raw dataset released at https://github.com/meganndare/cantonese-nlp. Since the base model was already finetuned on Cantonese monolingual data, we finetuned only on parallel sentences.

Results:

| Direction | BLEU | ChrF++ |
|-----------|--------|--------|
| Man -> Can | 35.371 | 26.197 |
| Can -> Man | 36.553 | 27.471 |

GitHub repo: https://github.com/cmgao/nlp_project (the ALMA code is linked as a submodule).
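For inference, ALMA-finetuned models are typically prompted with a fixed zero-shot translation template. The sketch below builds such a prompt; the exact template string and the language names used here ("Mandarin", "Cantonese") are assumptions modeled on the format in the ALMA repository, so check that repo for the template this checkpoint was actually trained with.

```python
# Sketch of an ALMA-style zero-shot translation prompt.
# The template below is an assumption based on ALMA's published prompt
# format ("Translate this from X to Y: ..."); verify against the repo.

def build_prompt(src_lang: str, tgt_lang: str, text: str) -> str:
    """Build a translation prompt in the (assumed) ALMA template."""
    return (
        f"Translate this from {src_lang} to {tgt_lang}:\n"
        f"{src_lang}: {text}\n"
        f"{tgt_lang}:"
    )

prompt = build_prompt("Mandarin", "Cantonese", "你在哪里？")
print(prompt)
```

The resulting string would be fed to the model's tokenizer and `generate()` call as usual; the model's completion after the final `{tgt_lang}:` line is the translation.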