Romain-Cosentino committed
Commit 3e0f2fa · 1 Parent(s): 2b705ee
Update README.md
README.md
CHANGED
@@ -10,7 +10,7 @@ tags:
---

# TenyxChat: Language Model Alignment using Tenyx Fine-tuning

-Introducing TenyxChat-8x7B-v1, part of our TenyxChat
+Introducing TenyxChat-8x7B-v1, part of our TenyxChat series trained to function as useful assistants through preference tuning, using Tenyx's recently released advanced fine-tuning technology ([VentureBeat article](https://venturebeat.com/ai/tenyx-aims-to-fix-llms-catastrophic-forgetting-problem/)). Our model is trained using the [Direct Preference Optimization (DPO)](https://arxiv.org/abs/2305.18290) framework on the open-source AI feedback dataset [UltraFeedback](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized).
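The card describes the recipe only at a high level (DPO on UltraFeedback). As a non-authoritative illustration of what such a run can look like in code, here is a minimal sketch using TRL's `DPOTrainer`; the hyperparameters, the column preprocessing, and the use of TRL itself are assumptions (Tenyx's actual fine-tuning method is proprietary), and the exact `DPOTrainer` signature varies across TRL versions.

```python
# Hypothetical sketch: DPO preference tuning on UltraFeedback with TRL.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

BASE = "mistralai/Mixtral-8x7B-Instruct-v0.1"

tokenizer = AutoTokenizer.from_pretrained(BASE)
tokenizer.pad_token = tokenizer.eos_token

# Loading the full MoE model needs multiple GPUs (the card reports 8x A100 80GB).
model = AutoModelForCausalLM.from_pretrained(
    BASE, torch_dtype=torch.bfloat16, device_map="auto"
)

# chosen/rejected in ultrafeedback_binarized are chat message lists;
# DPOTrainer expects plain strings for prompt/chosen/rejected.
def to_dpo_columns(ex):
    return {
        "prompt": ex["prompt"],
        "chosen": ex["chosen"][-1]["content"],
        "rejected": ex["rejected"][-1]["content"],
    }

train = load_dataset(
    "HuggingFaceH4/ultrafeedback_binarized", split="train_prefs"
).map(to_dpo_columns)

args = TrainingArguments(
    output_dir="tenyxchat-8x7b-dpo",  # illustrative path
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    learning_rate=5e-7,               # typical DPO-scale LR, an assumption
    num_train_epochs=1,
    bf16=True,
    logging_steps=10,
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,    # TRL clones a frozen reference model when None
    args=args,
    beta=0.1,          # KL-penalty strength from the DPO paper
    train_dataset=train,
    tokenizer=tokenizer,
    max_length=2048,
    max_prompt_length=1024,
)
trainer.train()
```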

We fine-tune [Mixtral-8x7B-Instruct-v0.1](https://arxiv.org/pdf/2401.04088.pdf) with our proprietary approach ([blog](https://www.tenyx.com/post/forgetting-and-toxicity-in-llms-a-deep-dive-on-fine-tuning-methods), [fine-tuning service](https://www.tenyx.com/fine-tuning)), previously applied to obtain [TenyxChat-7B-v1](https://huggingface.co/tenyx/TenyxChat-7B-v1), which yields an increase in [MT-Bench](https://arxiv.org/abs/2306.05685) score. Our approach aims to mitigate forgetting in LLMs in a computationally efficient manner, thereby enabling continual fine-tuning capabilities without altering the pre-trained output distribution. TenyxChat-8x7B-v1 was trained on eight A100s (80 GB) for about eight hours, with a training setup adapted from HuggingFaceH4's [alignment-handbook](https://github.com/huggingface/alignment-handbook).
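For completeness, a hedged example of querying the resulting model with `transformers`: the repo id `tenyx/TenyxChat-8x7B-v1` is inferred from the TenyxChat-7B-v1 naming convention, and the prompt format is simply whatever chat template ships with the tokenizer.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id inferred from the TenyxChat-7B-v1 naming (assumption).
model_id = "tenyx/TenyxChat-8x7B-v1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the MoE model is large; bf16 + multi-GPU advised
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Explain mixture-of-experts routing in two sentences."},
]
# Uses the chat template bundled with the tokenizer.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```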