Update README.md
README.md
CHANGED
@@ -40,7 +40,7 @@ essential.
 
 The bloomz-3b-dpo-chat model was trained using the [Anthropic/hh-rlhf](https://huggingface.co/datasets/Anthropic/hh-rlhf) dataset, which includes:
 
-
+Human Preference Data:
 - **Description:** Annotations of helpfulness and harmlessness, with each entry containing "chosen" and "rejected" text pairs.
 - **Purpose:** To train preference models for Reinforcement Learning from Human Feedback (RLHF), not for supervised training of dialogue agents.
 - **Source:** Data from context-distilled language models, rejection sampling, and an iterated online process.
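
For reference, the "chosen"/"rejected" preference pairs described in this change can be inspected directly with the Hugging Face `datasets` library. This is a minimal sketch, not part of the README diff itself; it only assumes the dataset fields named above.

```python
# Minimal sketch: inspect the hh-rlhf preference pairs referenced in the README.
from datasets import load_dataset

# Each record holds a "chosen" and a "rejected" conversation transcript --
# the preference-pair format consumed by RLHF/DPO preference training.
ds = load_dataset("Anthropic/hh-rlhf", split="train")

example = ds[0]
print(example["chosen"][:200])    # preferred (more helpful/harmless) transcript
print(example["rejected"][:200])  # dispreferred transcript
```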