metadata

license: llama2
library_name: nemo
language:
  - en
pipeline_tag: text-generation
inference: false
fine-tuning: true
tags:
  - nvidia
  - steerlm
  - llama2
datasets:
  - nvidia/HelpSteer
  - OpenAssistant/oasst1

Llama2-13B-SteerLM-RM

License

The use of this model is governed by the Llama 2 Community License Agreement.

Description:

Llama2-13B-SteerLM-RM is a 13-billion-parameter language model (with a context of up to 4,096 tokens) used as the Reward Model/Attribute Prediction Model in training Llama2-70B-SteerLM-Chat.

Given a multi-turn conversation between a user and an assistant, it rates every assistant turn on the following attributes (each between 0 and 4):

Quality: Perceived goodness of the response.
Toxicity: Undesirable elements such as vulgar, harmful, or potentially biased content in the response.
Humor: Sense of humor within the response.
Creativity: Willingness to generate a non-conventional response.
Helpfulness: Overall helpfulness of the response to the prompt.
Correctness: Inclusion of all pertinent facts without errors.
Coherence: Consistency and clarity of expression.
Complexity: Intellectual depth required to write response (i.e. whether the response can be written by anyone with basic language competency or requires deep domain expertise).
Verbosity: Amount of detail included in the response, relative to what is asked for in the prompt.

The first four attributes are taken from the Open Assistant dataset, while the others are taken from the HelpSteer dataset.
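The attribute scheme above can be written down as a small Python sketch. The lowercase attribute names follow the label string shown later in this card; the helper `validate_scores` and the constant names are hypothetical and are not part of NeMo or NeMo Aligner.

```python
# Attribute names grouped by the dataset they come from, as described above.
OPEN_ASSISTANT_ATTRIBUTES = ("quality", "toxicity", "humor", "creativity")
HELPSTEER_ATTRIBUTES = ("helpfulness", "correctness", "coherence", "complexity", "verbosity")
ALL_ATTRIBUTES = OPEN_ASSISTANT_ATTRIBUTES + HELPSTEER_ATTRIBUTES

# Every attribute is rated between 0 and 4.
MIN_SCORE, MAX_SCORE = 0, 4


def validate_scores(scores):
    """Hypothetical helper: return a list of problems found in a score dict."""
    problems = []
    for name in ALL_ATTRIBUTES:
        if name not in scores:
            problems.append(f"missing attribute: {name}")
        elif not MIN_SCORE <= scores[name] <= MAX_SCORE:
            problems.append(f"{name}={scores[name]} is outside [{MIN_SCORE}, {MAX_SCORE}]")
    for name in scores:
        if name not in ALL_ATTRIBUTES:
            problems.append(f"unknown attribute: {name}")
    return problems


if __name__ == "__main__":
    example = {name: 2 for name in ALL_ATTRIBUTES}
    example["toxicity"] = 5  # deliberately out of range
    print(validate_scores(example))
```

A check like this may be useful before writing labels into a training file, since an out-of-range or misspelled attribute would otherwise pass silently.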

HelpSteer Paper: HelpSteer: Multi-attribute Helpfulness Dataset for SteerLM

SteerLM Paper: SteerLM: Attribute Conditioned SFT as an (User-Steerable) Alternative to RLHF

Llama2-13B-SteerLM-RM is trained with NVIDIA NeMo, an end-to-end, cloud-native framework to build, customize, and deploy generative AI models anywhere. It includes training and inferencing frameworks, guardrailing toolkits, data curation tools, and pretrained models, offering enterprises an easy, cost-effective, and fast way to adopt generative AI.

Usage:

You can use the model with NeMo Aligner by following the SteerLM training user guide.

This model can be used to train a model like Llama2-70B-SteerLM-Chat or to annotate the attributes of any conversation.

Spin up an inference server within the NeMo Aligner container:

python /opt/NeMo-Aligner/examples/nlp/gpt/serve_reward_model.py \
      rm_model_file=Llama2-13B-SteerLM-RM.nemo \
      trainer.num_nodes=1 \
      trainer.devices=8 \
      ++model.tensor_model_parallel_size=4 \
      ++model.pipeline_model_parallel_size=1 \
      inference.micro_batch_size=2 \
      inference.port=1424

Annotate data files using the served reward model. If you are seeking to reproduce the training of Llama2-70B-SteerLM-Chat, these will be the Open Assistant train/val files. Then follow the next step to train a SteerLM model based on the SteerLM training user guide.

python /opt/NeMo-Aligner/examples/nlp/data/steerlm/preprocess_openassistant_data.py --output_directory=data/oasst

python /opt/NeMo-Aligner/examples/nlp/data/steerlm/attribute_annotate.py \
      --input-file=data/oasst/train.jsonl \
      --output-file=data/oasst/train_labeled.jsonl \
      --port=1424

Alternatively, the input can be any conversational data file (in .jsonl format) in which each line looks like:

{
    "conversations": [
              {"value": <user_turn_1>, "from": "User", "label": None},
              {"value": <assistant_turn_1>, "from": "Assistant", "label": <formatted_label_1>},
              {"value": <user_turn_2>, "from": "User", "label": None},
              {"value": <assistant_turn_2>, "from": "Assistant", "label": <formatted_label_2>},
          ],
    "mask": "User"
}

Ideally, each <formatted_label_n> is the ground-truth label for the corresponding assistant turn. If ground-truth labels are not available, we can also use quality:4,toxicity:0,humor:0,creativity:0,helpfulness:4,correctness:4,coherence:4,complexity:4,verbosity:4 as the label.
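As an illustration of the format above, the following Python sketch builds one such line with the fallback label. The helpers `format_label`, `parse_label`, and `build_record` are hypothetical and are not part of NeMo Aligner. The sketch also assumes that the `None` shown in the template corresponds to JSON `null`, which is what `json.dumps` emits for Python's `None`.

```python
import json

# Attribute order taken from the fallback label string above.
ATTRIBUTE_ORDER = (
    "quality", "toxicity", "humor", "creativity",
    "helpfulness", "correctness", "coherence", "complexity", "verbosity",
)

# Fallback scores used when no ground-truth label is available.
DEFAULT_SCORES = {
    "quality": 4, "toxicity": 0, "humor": 0, "creativity": 0,
    "helpfulness": 4, "correctness": 4, "coherence": 4,
    "complexity": 4, "verbosity": 4,
}


def format_label(scores):
    """Render a score dict as 'quality:4,toxicity:0,...'."""
    return ",".join(f"{name}:{scores[name]}" for name in ATTRIBUTE_ORDER)


def parse_label(label):
    """Inverse of format_label; assumes integer scores."""
    pairs = (item.split(":") for item in label.split(","))
    return {name: int(value) for name, value in pairs}


def build_record(turns, labels=None):
    """Build one record from (user_text, assistant_text) pairs.

    labels, if given, is a list of score dicts, one per pair.
    """
    conversations = []
    for i, (user_text, assistant_text) in enumerate(turns):
        scores = labels[i] if labels is not None else DEFAULT_SCORES
        conversations.append({"value": user_text, "from": "User", "label": None})
        conversations.append(
            {"value": assistant_text, "from": "Assistant", "label": format_label(scores)}
        )
    return {"conversations": conversations, "mask": "User"}


if __name__ == "__main__":
    record = build_record([("What is 2 + 2?", "2 + 2 equals 4.")])
    # One JSON object per line, as expected for a .jsonl file.
    print(json.dumps(record))
```

Writing each record with `json.dumps` on its own line keeps the file valid .jsonl even when a turn contains newlines, because they are escaped inside the JSON string.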

Contact

E-Mail: Zhilin Wang

Citation

If you find this model useful, please cite the following works:

@misc{wang2023helpsteer,
      title={HelpSteer: Multi-attribute Helpfulness Dataset for SteerLM}, 
      author={Zhilin Wang and Yi Dong and Jiaqi Zeng and Virginia Adams and Makesh Narsimhan Sreedhar and Daniel Egert and Olivier Delalleau and Jane Polak Scowcroft and Neel Kant and Aidan Swope and Oleksii Kuchaiev},
      year={2023},
      eprint={2311.09528},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

@misc{dong2023steerlm,
      title={SteerLM: Attribute Conditioned SFT as an (User-Steerable) Alternative to RLHF}, 
      author={Yi Dong and Zhilin Wang and Makesh Narsimhan Sreedhar and Xianchao Wu and Oleksii Kuchaiev},
      year={2023},
      eprint={2310.05344},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}