---
license: llama2
library_name: nemo
language:
- en
pipeline_tag: text-generation
inference: false
fine-tuning: true
tags:
- nvidia
- steerlm
- llama2
- reward model
datasets:
- nvidia/HelpSteer
- OpenAssistant/oasst1
---
# Llama2-13B-SteerLM-RM
## License
The use of this model is governed by the [Llama 2 Community License Agreement](https://ai.meta.com/llama/license/).
## Description:
Llama2-13B-SteerLM-RM is a 13 billion parameter language model (with a context of up to 4,096 tokens) used as the Attribute Prediction Model in training [Llama2-70B-SteerLM-Chat](https://huggingface.co/nvidia/Llama2-70B-SteerLM-Chat).
An Attribute Prediction Model is a multi-aspect Reward Model that rates model responses on several aspects that make a response desirable, instead of producing the single score of a conventional Reward Model.
Given a conversation with multiple turns between user and assistant, it rates every assistant turn on the following attributes, each scored between 0 and 4:
1. **Quality**: Perceived goodness of the response.
2. **Toxicity**: Undesirable elements such as vulgar, harmful or potentially biased content.
3. **Humor**: Sense of humor within the response.
4. **Creativity**: Willingness to generate a non-conventional response.
5. **Helpfulness**: Overall helpfulness of the response to the prompt.
6. **Correctness**: Inclusion of all pertinent facts without errors.
7. **Coherence**: Consistency and clarity of expression.
8. **Complexity**: Intellectual depth required to write the response (i.e. whether the response can be written by anyone with basic language competency or requires deep domain expertise).
9. **Verbosity**: Amount of detail included in the response, relative to what is asked for in the prompt.
The first four attributes are taken from the [Open Assistant](https://huggingface.co/datasets/OpenAssistant/oasst1) dataset, while the others are taken from the [HelpSteer](https://huggingface.co/datasets/nvidia/HelpSteer) dataset.
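
The attribute set and score range described above can be sketched in Python as follows. This is a minimal illustration only: the constant names and the `validate_scores` helper are hypothetical and are not part of NeMo-Aligner, and the lowercase attribute names are an assumption based on the label string shown in the Usage section.

```python
# Attribute names grouped by the dataset they are taken from.
# Lowercase spellings are assumed from the label string in the Usage section.
OASST_ATTRIBUTES = ["quality", "toxicity", "humor", "creativity"]
HELPSTEER_ATTRIBUTES = [
    "helpfulness", "correctness", "coherence", "complexity", "verbosity",
]
ALL_ATTRIBUTES = OASST_ATTRIBUTES + HELPSTEER_ATTRIBUTES

# Every attribute is rated between 0 and 4 (inclusive).
MIN_SCORE, MAX_SCORE = 0, 4


def validate_scores(scores: dict) -> dict:
    """Hypothetical helper: check that all nine attributes are present and in range.

    Returns a new dict with the attributes in the canonical order above.
    """
    missing = [name for name in ALL_ATTRIBUTES if name not in scores]
    if missing:
        raise ValueError(f"missing attributes: {missing}")
    for name in ALL_ATTRIBUTES:
        if not MIN_SCORE <= scores[name] <= MAX_SCORE:
            raise ValueError(f"{name}={scores[name]} is outside [{MIN_SCORE}, {MAX_SCORE}]")
    return {name: scores[name] for name in ALL_ATTRIBUTES}
```

A check like this can be handy when ratings are produced or edited by hand, since it catches a missing attribute or an out-of-range value before the data is used for training.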
HelpSteer Paper: [HelpSteer: Multi-attribute Helpfulness Dataset for SteerLM](http://arxiv.org/abs/2311.09528)
SteerLM Paper: [SteerLM: Attribute Conditioned SFT as an (User-Steerable) Alternative to RLHF](https://arxiv.org/abs/2310.05344)
Llama2-13B-SteerLM-RM is trained with NVIDIA [NeMo-Aligner](https://github.com/NVIDIA/NeMo-Aligner), a scalable toolkit for performant and efficient model alignment. NeMo-Aligner is built using the [NeMo Framework](https://github.com/NVIDIA/NeMo) which allows for scaling training up to 1000s of GPUs using tensor, data and pipeline parallelism for all components of alignment. All of our checkpoints are cross compatible with the NeMo ecosystem, allowing for inference deployment and further customization.
## Usage:
You can use the model with [NeMo-Aligner](https://github.com/NVIDIA/NeMo-Aligner) by following the [SteerLM training user guide](https://docs.nvidia.com/nemo-framework/user-guide/latest/modelalignment/steerlm.html).
This model can be used to train a model like [Llama2-70B-SteerLM-Chat](https://huggingface.co/nvidia/Llama2-70B-SteerLM-Chat) or to annotate the attributes of any conversation.
1. Spin up an inference server within the [NeMo Aligner container](https://github.com/NVIDIA/NeMo-Aligner/blob/main/Dockerfile)
```bash
python /opt/NeMo-Aligner/examples/nlp/gpt/serve_reward_model.py \
    rm_model_file=Llama2-13B-SteerLM-RM.nemo \
    trainer.num_nodes=1 \
    trainer.devices=8 \
    ++model.tensor_model_parallel_size=4 \
    ++model.pipeline_model_parallel_size=1 \
    inference.micro_batch_size=2 \
    inference.port=1424
```
2. Annotate data files using the served reward model. If you are seeking to reproduce the training of [Llama2-70B-SteerLM-Chat](https://huggingface.co/nvidia/Llama2-70B-SteerLM-Chat), these will be the Open Assistant train/val files. Then follow the next step of the [SteerLM training user guide](https://docs.nvidia.com/nemo-framework/user-guide/latest/modelalignment/steerlm.html#step-5-train-the-attribute-conditioned-sft-model) to train a SteerLM model.
```bash
python /opt/NeMo-Aligner/examples/nlp/data/steerlm/preprocess_openassistant_data.py --output_directory=data/oasst
python /opt/NeMo-Aligner/examples/nlp/data/steerlm/attribute_annotate.py \
    --input-file=data/oasst/train.jsonl \
    --output-file=data/oasst/train_labeled.jsonl \
    --port=1424
```
3. Alternatively, this can be any conversational data file (in .jsonl) in the following format, where each line looks like the following (shown pretty-printed here; in the file, each record occupies a single line):
```json
{
  "conversations": [
    {"value": "<user_turn_1>", "from": "User", "label": null},
    {"value": "<assistant_turn_1>", "from": "Assistant", "label": "<formatted_label_1>"},
    {"value": "<user_turn_2>", "from": "User", "label": null},
    {"value": "<assistant_turn_2>", "from": "Assistant", "label": "<formatted_label_2>"}
  ],
  "mask": "User"
}
```
Ideally, each ```<formatted_label_n>``` is the ground-truth label for the corresponding assistant turn. If ground-truth labels are not available, you can use ```quality:4,toxicity:0,humor:0,creativity:0,helpfulness:4,correctness:4,coherence:4,complexity:4,verbosity:4``` instead.
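
As a rough sketch of how such a file could be produced, the Python snippet below builds one record in the format above and serializes it as a single JSONL line. The helper names (`parse_label`, `build_record`) and the sample conversation are hypothetical and purely illustrative; only the record layout and the fallback label string come from this section.

```python
import json

# Fallback label from this section, used when no ground-truth label is available.
DEFAULT_LABEL = (
    "quality:4,toxicity:0,humor:0,creativity:0,helpfulness:4,"
    "correctness:4,coherence:4,complexity:4,verbosity:4"
)


def parse_label(label: str) -> dict:
    """Hypothetical helper: turn 'name:score,name:score,...' into a dict of ints."""
    return {name: int(score) for name, score in (item.split(":") for item in label.split(","))}


def build_record(turns, labels=None) -> dict:
    """Hypothetical helper: build one record from (user_text, assistant_text) pairs.

    `labels` is an optional list with one formatted label per assistant turn;
    when it is omitted, DEFAULT_LABEL is used for every assistant turn.
    """
    conversations = []
    for i, (user_text, assistant_text) in enumerate(turns):
        label = labels[i] if labels else DEFAULT_LABEL
        # User turns carry no label (serialized as JSON null).
        conversations.append({"value": user_text, "from": "User", "label": None})
        conversations.append({"value": assistant_text, "from": "Assistant", "label": label})
    return {"conversations": conversations, "mask": "User"}


# Illustrative data only.
record = build_record([("What is 2 + 2?", "2 + 2 equals 4.")])
line = json.dumps(record)  # one record per line in the .jsonl file
print(line)
```

Writing one `json.dumps` result per line keeps the file valid JSONL, and `parse_label` can be used to read a formatted label back into per-attribute scores.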
## Contact
E-Mail: [Zhilin Wang](mailto:zhilinw@nvidia.com)
## Citation
If you find this model useful, please cite the following works:
```bibtex
@misc{wang2023helpsteer,
title={HelpSteer: Multi-attribute Helpfulness Dataset for SteerLM},
author={Zhilin Wang and Yi Dong and Jiaqi Zeng and Virginia Adams and Makesh Narsimhan Sreedhar and Daniel Egert and Olivier Delalleau and Jane Polak Scowcroft and Neel Kant and Aidan Swope and Oleksii Kuchaiev},
year={2023},
eprint={2311.09528},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```
```bibtex
@misc{dong2023steerlm,
title={SteerLM: Attribute Conditioned SFT as an (User-Steerable) Alternative to RLHF},
author={Yi Dong and Zhilin Wang and Makesh Narsimhan Sreedhar and Xianchao Wu and Oleksii Kuchaiev},
year={2023},
eprint={2310.05344},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```