Base model: https://huggingface.co/indiejoseph/cantonese-llama-2-7b-oasst-v1
Finetuned on the Cantonese-Mandarin translation task following the ALMA recipe (https://github.com/fe1ixxu/ALMA).
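For reference, a minimal inference sketch under two assumptions: the prompt follows ALMA's translation template, and `model_id` would point at the finetuned checkpoint (only the base model is linked above, so the id here is a placeholder).

```python
# Minimal inference sketch. Assumptions: ALMA's translation prompt template,
# and the checkpoint id -- only the base model is public, so substitute the
# finetuned weights here.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "indiejoseph/cantonese-llama-2-7b-oasst-v1"  # swap in the finetuned checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# ALMA-style zero-shot translation prompt.
prompt = "Translate this from Mandarin to Cantonese:\nMandarin: 他们在哪里？\nCantonese:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```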
Finetuning dataset: sourced from the raw dataset released at https://github.com/meganndare/cantonese-nlp
Since the base model had already been finetuned on monolingual Cantonese data, we performed only the parallel-sentence finetuning stage.
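A minimal sketch of how parallel pairs can be packed into ALMA-style training examples: the prompt template follows the ALMA paper, while the file names and the `yue`/`cmn` keys are hypothetical placeholders for the cantonese-nlp data.

```python
# Sketch: turn parallel sentence pairs into ALMA-style prompt/completion
# records. File names and JSON keys below are hypothetical.
import json

PROMPT = "Translate this from {src} to {tgt}:\n{src}: {src_text}\n{tgt}:"

def build_example(src, tgt, src_text, tgt_text):
    """Pack one parallel pair into a prompt/completion record."""
    return {
        "prompt": PROMPT.format(src=src, tgt=tgt, src_text=src_text),
        "completion": " " + tgt_text,
    }

with open("parallel_pairs.jsonl") as fin, open("train.jsonl", "w") as fout:
    for line in fin:
        pair = json.loads(line)  # e.g. {"yue": "...", "cmn": "..."}
        # Train both directions on the same parallel corpus.
        for ex in (
            build_example("Cantonese", "Mandarin", pair["yue"], pair["cmn"]),
            build_example("Mandarin", "Cantonese", pair["cmn"], pair["yue"]),
        ):
            fout.write(json.dumps(ex, ensure_ascii=False) + "\n")
```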
Results:
Mandarin -> Cantonese: 35.371 BLEU, 26.197 ChrF++
Cantonese -> Mandarin: 36.553 BLEU, 27.471 ChrF++
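The scores above could be reproduced along these lines with sacrebleu; the file names are hypothetical, and `tokenize="zh"` is an assumption (both sides are Chinese-script text).

```python
# Sketch of the evaluation: corpus-level BLEU and ChrF++ via sacrebleu.
# "hypotheses.txt" / "references.txt" are hypothetical, one sentence per line.
import sacrebleu

with open("hypotheses.txt") as f:
    hyps = [line.rstrip("\n") for line in f]
with open("references.txt") as f:
    refs = [line.rstrip("\n") for line in f]

bleu = sacrebleu.corpus_bleu(hyps, [refs], tokenize="zh")
chrf = sacrebleu.corpus_chrf(hyps, [refs], word_order=2)  # word_order=2 -> ChrF++
print(f"BLEU: {bleu.score:.3f}  ChrF++: {chrf.score:.3f}")
```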
GitHub repo: https://github.com/cmgao/nlp_project
The ALMA code is included as a git submodule.