Full SFT training caused the model to lose its foundational capabilities

#71
by sinlew - opened

After SFT training this model with Transformers 4.43.3, the MMLU score dropped from the original model's 67 points to 22 points. Why did this happen? With the same data, Llama-3-8B-Instruct only dropped to 46 points.

Can you share a link to the code or the dataset?

Please provide more context.

https://github.com/hiyouga/LLaMA-Factory/issues/5047
This is the training code.
The dataset consists of chat logs from different individuals, totaling over 10,000 entries. Each chat log contains approximately 30-80 rounds formatted in the SUAUA pattern. The conversations are primarily casual and do not involve much specialized knowledge.
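For reference, a single entry looks roughly like this (shown as a Python dict in LLaMA-Factory's sharegpt-style format; the field names follow that convention as I understand it, and the content is purely illustrative):

```python
# Rough illustration of one multi-turn chat-log entry in a sharegpt-style
# format (system turn followed by alternating user/assistant turns).
# The conversation content here is made up.
example_entry = {
    "conversations": [
        {"from": "system", "value": "You are a friendly chat partner."},
        {"from": "human", "value": "Hey, how was your weekend?"},
        {"from": "gpt", "value": "Pretty relaxed, mostly reading. Yours?"},
        {"from": "human", "value": "Went hiking, the weather was great."},
        {"from": "gpt", "value": "Nice! Which trail did you take?"},
        # ... continues for roughly 30-80 rounds per chat log
    ]
}
```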

I noticed that you have fine-tuned models before. Does your model have this issue?

Yes, I'm also facing the same issue with Llama 3 and Llama 3.1. I used the Unsloth approach, but I didn't do full SFT; I trained with very minimal data.

Note: I also raised the same kind of question; please check here:
https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct/discussions/78
https://discuss.huggingface.co/t/my-adapter-model-dominating-the-entire-base-model/100577

Can we connect privately if possible? Let's discuss through Google Meet or Zoom.

https://discuss.huggingface.co/t/my-adapter-model-dominating-the-entire-base-model/100577

Have a look at the above URL. In my case, I used the standard format and it's working fine compared to the previous responses.

I replaced and optimized the training data to solve this problem. Simple dialogue fine-tuning on Llama 3.1 can significantly disrupt the foundational capabilities of the model. After increasing dialogue diversity, the model at least maintained some of its foundational abilities. However, the fine-tuned model still does not meet the expected capabilities.
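In case it helps others, here is a rough sketch of the kind of data mixing I mean, using the `datasets` library. The file names and the 70/30 ratio are placeholders, not the exact values I used:

```python
from datasets import load_dataset, interleave_datasets

# Placeholder file names; point these at your own chat logs and at some
# general-purpose instruction data used to preserve foundational abilities.
chat = load_dataset("json", data_files="chat_logs.json", split="train")
general = load_dataset("json", data_files="general_instructions.json", split="train")

# Interleave so SFT sees roughly 70% chat data and 30% general data,
# which increases dialogue diversity and reduces catastrophic forgetting.
mixed = interleave_datasets([chat, general], probabilities=[0.7, 0.3], seed=42)
mixed.to_json("mixed_sft_data.json")
```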

@sinlew In my case I used Llama 3.1 and 3.2, and both are working fine; I used a 10k-example dataset for fine-tuning and everything works well. I want to know your LoRA settings and the number of epochs you are using.

@antony-pk I use 6 epochs. Have you run any evaluation benchmarks, such as MMLU or MMLU-Pro? Has there been a significant drop in scores?

For your information, I'm not using MMLU or MMLU-Pro; I used my own custom evaluation script.
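For anyone who wants to check for the same regression, here is a minimal sketch of an MMLU run with EleutherAI's lm-evaluation-harness. The model path, dtype, and batch size are placeholders, and the exact result keys can differ between lm-eval versions:

```python
# Minimal MMLU check with lm-evaluation-harness (pip install lm-eval).
# Model path, dtype, and batch size below are placeholders.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=/path/to/finetuned-model,dtype=bfloat16",
    tasks=["mmlu"],
    num_fewshot=5,
    batch_size=8,
)

# Aggregated MMLU score; key names may vary slightly across versions.
print(results["results"]["mmlu"])
```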