	---
	license: other
	license_name: nv-ai-foundation-models-license
	license_link: https://developer.nvidia.com/downloads/nv-ai-foundation-models-license
	library_name: nemo

	extra_gated_heading: Access Nemotron 3 8B on Hugging Face
	extra_gated_description: >-
	  To download this model, you must agree to the terms of the [NVIDIA AI Foundation Models Community License Agreement](https://developer.nvidia.com/downloads/nv-ai-foundation-models-license).
	extra_gated_fields:
	  I agree to share my name, email address and username with NVIDIA: checkbox
	  geo: ip_location
	language:
	- "en"
	- "ar"
	- "az"
	- "bg"
	- "bn"
	- "ca"
	- "cs"
	- "da"
	- "de"
	- "el"
	- "es"
	- "et"
	- "fa"
	- "fi"
	- "fr"
	- "gl"
	- "he"
	- "hi"
	- "hr"
	- "hu"
	- "hy"
	- "id"
	- "is"
	- "it"
	- "ka"
	- "kk"
	- "kn"
	- "ko"
	- "lt"
	- "lv"
	- "mk"
	- "ml"
	- "mr"
	- "ne"
	- "nl"
	- "no"
	- "pl"
	- "pt"
	- "ro"
	- "ru"
	- "sk"
	- "sl"
	- "sq"
	- "sr"
	- "sv"
	- "ta"
	- "te"
	- "tr"
	- "uk"
	- "ur"
	- "vi"
	- "ja"
	- "zh"
	pipeline_tag: text-generation
	inference: false
	fine-tuning: true
	tags:
	- nvidia
	- nemotron-3
	- 8B
	---
	# Nemotron-3-8B-Chat-4k-SteerLM

	## Model Overview

	### License

	The use of this model is governed by the [NVIDIA AI Foundation Models Community License Agreement](https://developer.nvidia.com/downloads/nv-ai-foundation-models-license).


	### Description

	Nemotron-3-8B-SteerLM is an 8-billion-parameter generative language model, instruction-tuned from an 8B base model. It accepts inputs with a context length of up to 4,096 tokens. The model has been customized using the [SteerLM method](https://arxiv.org/abs/2310.05344) developed by NVIDIA to allow for user control of model outputs during inference.

	Key capabilities enabled by SteerLM:

	- Dynamic steering of responses by specifying desired attributes like quality, helpfulness, and toxicity at inference time.
	- Simpler training than RLHF, relying on supervised techniques such as conditioned fine-tuning and bootstrapping.

	Nemotron-3-8B-SteerLM is part of Nemotron-3, a family of enterprise-ready generative text models compatible with [NVIDIA NeMo Framework](https://www.nvidia.com/en-us/ai-data-science/generative-ai/nemo-framework/). For other models in this collection, see the [collections page](https://huggingface.co/collections/nvidia/nemotron-3-8b-6553adeb226f6ab4ffc356f9).

	NVIDIA NeMo is an end-to-end, cloud-native platform to build, customize, and deploy generative AI models anywhere. It includes training and inferencing frameworks, guardrailing toolkits, data curation tools, and pretrained models, offering enterprises an easy, cost-effective, and fast way to adopt generative AI. To get access to NeMo Framework, please sign up at [this link](https://developer.nvidia.com/nemo-framework/join).

	### References

	[Announcement Blog](https://developer.nvidia.com/blog/nvidia-ai-foundation-models-build-custom-enterprise-chatbots-and-co-pilots-with-production-ready-llms/)

	### Model Architecture

	Architecture Type: Transformer

	Network Architecture: Generative Pre-Trained Transformer (GPT-3)

	The SteerLM method involves the following key steps:

	1. Train an attribute prediction model on human annotated data to evaluate response quality.
	2. Use this model to annotate diverse datasets and enrich training data.
	3. Perform conditioned fine-tuning to align responses with specified combinations of attributes.
	4. (Optionally) Bootstrap training through model sampling and further fine-tuning.
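
Steps 2 and 3 above can be illustrated with a toy data-preparation sketch. Everything here is an assumption made for illustration: `toy_attribute_predictor` is a hypothetical heuristic standing in for the trained attribute prediction model from step 1, and the training-sample layout simply reuses the inference prompt markers from this card rather than reproducing NVIDIA's actual training pipeline.

```python
def toy_attribute_predictor(prompt, response):
    """Hypothetical stand-in for the attribute prediction model of step 1."""
    words = len(response.split())
    # Crude heuristic only: longer answers score higher on both attributes.
    return {"quality": 4 if words >= 5 else 2, "verbosity": min(4, words // 5)}


def annotate(dataset, predictor):
    """Step 2: attach predicted attribute scores to every example."""
    return [
        dict(example, attributes=predictor(example["prompt"], example["response"]))
        for example in dataset
    ]


def to_conditioned_example(example):
    """Step 3: build an (input, target) pair conditioned on the attributes."""
    attrs = ",".join(f"{name}:{value}" for name, value in example["attributes"].items())
    # Assumed layout: the attribute line precedes the response the model must learn.
    model_input = (
        f"<extra_id_1>User\n{example['prompt']}\n"
        f"<extra_id_1>Assistant\n<extra_id_2>{attrs}\n"
    )
    return model_input, example["response"]


dataset = [
    {"prompt": "What is the capital of France?", "response": "Paris is the capital of France."},
    {"prompt": "What is the capital of France?", "response": "Paris."},
]
annotated = annotate(dataset, toy_attribute_predictor)
training_pairs = [to_conditioned_example(example) for example in annotated]
print(training_pairs[0][0])
```

Because each response is paired with its own attribute scores, fine-tuning on such pairs teaches the model to associate attribute values with response styles, which is what makes steering at inference time possible.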

	Nemotron-3-8B-SteerLM applies this technique on top of the open-source NVIDIA GPT model architecture. It was pretrained on internet-scale data and then customized using [OASST](https://huggingface.co/datasets/OpenAssistant/oasst1), [HH-RLHF](https://huggingface.co/datasets/Anthropic/hh-rlhf), [Light](https://github.com/facebookresearch/ParlAI/blob/9974b947fb2e801dc5608f495828532c2a714742/parlai/tasks/light_dialog/build.py#L14), a permissively licensed subset of [OpenPlatypus](https://huggingface.co/datasets/garage-bAInd/Open-Platypus), and some internally collected SFT data.

	### Prompt Format

	#### Single Turn

	```text
	<extra_id_0>System
	A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.

	<extra_id_1>User
	{prompt 1}
	<extra_id_1>Assistant
	<extra_id_2>quality:4,understanding:4,correctness:4,coherence:4,complexity:4,verbosity:4,toxicity:0,humor:0,creativity:0,violence:0,helpfulness:4,not_appropriate:0,hate_speech:0,sexual_content:0,fails_task:0,political_content:0,moral_judgement:0,lang:en
	```

	#### Multi-Turn or Few-shot

	```text
	<extra_id_0>System
	A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.

	<extra_id_1>User
	{prompt 1}
	<extra_id_1>Assistant
	<extra_id_2>quality:4,understanding:4,correctness:4,coherence:4,complexity:4,verbosity:4,toxicity:0,humor:0,creativity:0,violence:0,helpfulness:4,not_appropriate:0,hate_speech:0,sexual_content:0,fails_task:0,political_content:0,moral_judgement:0,lang:en
	{response 1}
	<extra_id_1>User
	{prompt 2}
	<extra_id_1>Assistant
	<extra_id_2>quality:4,understanding:4,correctness:4,coherence:4,complexity:4,verbosity:4,toxicity:0,humor:0,creativity:0,violence:0,helpfulness:4,not_appropriate:0,hate_speech:0,sexual_content:0,fails_task:0,political_content:0,moral_judgement:0,lang:en
	```
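
The multi-turn layout above can be assembled programmatically. The following is a minimal sketch, assuming the conversation is supplied as a list of user prompts plus the earlier assistant responses; `build_chat_prompt` and the constant names are illustrative and not part of NeMo.

```python
SYSTEM_MESSAGE = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)
ATTRIBUTES = (
    "<extra_id_2>quality:4,understanding:4,correctness:4,coherence:4,complexity:4,"
    "verbosity:4,toxicity:0,humor:0,creativity:0,violence:0,helpfulness:4,"
    "not_appropriate:0,hate_speech:0,sexual_content:0,fails_task:0,"
    "political_content:0,moral_judgement:0,lang:en"
)


def build_chat_prompt(user_prompts, responses=()):
    """Format a conversation; expects exactly one more user prompt than responses."""
    if len(user_prompts) != len(responses) + 1:
        raise ValueError("expected exactly one more user prompt than responses")
    lines = ["<extra_id_0>System", SYSTEM_MESSAGE, ""]
    for i, user_prompt in enumerate(user_prompts):
        lines += ["<extra_id_1>User", user_prompt, "<extra_id_1>Assistant", ATTRIBUTES]
        if i < len(responses):
            # Earlier turns carry the assistant's response after the attribute line.
            lines.append(responses[i])
    return "\n".join(lines)


print(build_chat_prompt(["Hi, who are you?", "What can you do?"], ["I am an AI assistant."]))
```

With a single user prompt and no responses, the function produces the single-turn format; the prompt always ends on the attribute line so that the model generates the next assistant response.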

	#### Example prompt formation code

	```python
	PROMPT_TEMPLATE = """<extra_id_0>System
	A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.

	<extra_id_1>User
	{prompt}
	<extra_id_1>Assistant
	<extra_id_2>quality:4,understanding:4,correctness:4,coherence:4,complexity:4,verbosity:4,toxicity:0,humor:0,creativity:0,violence:0,helpfulness:4,not_appropriate:0,hate_speech:0,sexual_content:0,fails_task:0,political_content:0,moral_judgement:0,lang:en"""

	question = "Write a poem on NVIDIA in the style of Shakespeare"
	prompt = PROMPT_TEMPLATE.format(prompt=question)
	print(prompt)
	```

	Each attribute (e.g. humor, toxicity) takes an integer value in the range `[0,4]`; `lang` takes a language code such as `en`.
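
To steer a response, individual attribute values can be changed before the prompt is built. The sketch below is one way to do that with range checking; `DEFAULT_ATTRIBUTES` and `build_attribute_string` are illustrative names and not part of NeMo, and the defaults are copied from the prompt format in this card.

```python
# Attribute names, order, and default values copied from the prompt format above.
DEFAULT_ATTRIBUTES = {
    "quality": 4, "understanding": 4, "correctness": 4, "coherence": 4,
    "complexity": 4, "verbosity": 4, "toxicity": 0, "humor": 0,
    "creativity": 0, "violence": 0, "helpfulness": 4, "not_appropriate": 0,
    "hate_speech": 0, "sexual_content": 0, "fails_task": 0,
    "political_content": 0, "moral_judgement": 0,
}


def build_attribute_string(lang="en", **overrides):
    """Return the `<extra_id_2>` steering line with selected attributes overridden."""
    attributes = dict(DEFAULT_ATTRIBUTES)
    for name, value in overrides.items():
        if name not in attributes:
            raise KeyError(f"unknown attribute: {name}")
        # Reject bools explicitly, since bool is a subclass of int in Python.
        if isinstance(value, bool) or not isinstance(value, int) or not 0 <= value <= 4:
            raise ValueError(f"{name} must be an integer in [0, 4], got {value!r}")
        attributes[name] = value
    pairs = [f"{name}:{value}" for name, value in attributes.items()]
    pairs.append(f"lang:{lang}")
    return "<extra_id_2>" + ",".join(pairs)


# Example: ask for a more humorous, less verbose answer.
print(build_attribute_string(humor=4, verbosity=1))
```

Called without arguments, the function reproduces the default attribute line used in the prompt templates above.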

	### Software Integration

	Runtime Engine(s):
	NVIDIA AI Enterprise

	Toolkit:
	NeMo Framework

	To get access to NeMo Framework, please sign up at [this link](https://developer.nvidia.com/nemo-framework/join). See the [NeMo inference container](https://registry.ngc.nvidia.com/orgs/ea-bignlp/teams/ga-participants/containers/nemofw-inference) documentation for details on how to set up and deploy an inference server with NeMo.

	Sample Inference Code:

	```python
	from nemo.deploy import NemoQuery

	# In this case, we run inference on the same machine.
	# `model_name` must match the name the model was deployed under on the server.
	nq = NemoQuery(url="localhost:8000", model_name="Nemotron-3-8B-Chat-4K-SteerLM")

	# See above for prompt format
	output = nq.query_llm(prompts=[prompt], max_output_token=200, top_k=1, top_p=0.0, temperature=0.1)

	# NOTE: Chat models require post-processing the output since the `NemoQuery` API
	# does not support stopping generation on the special <extra_id_1> token.
	output = [[s.split("<extra_id_1>", 1)[0].strip() for s in out] for out in output]

	print(output)
	```

	Supported Hardware:

	- H100
	- A100 80GB, A100 40GB

	### Model Version(s)

	`Nemotron-3-8B-chat-4k-steerlm-BF16-1`

	## Dataset

	NVIDIA models are trained on a diverse set of public and proprietary datasets. NVIDIA is committed to the responsible development of large language models and conducts reviews of all datasets included in training.

	## Evaluation

	MT Bench Score

	| Category | Score |
	|------------|-------|
	| Total | 5.6 |
	| Writing | 6.35 |
	| Roleplay | 6.9 |
	| Extraction | 5.25 |
	| STEM | 7.5 |
	| Humanities | 9.02 |
	| Reasoning | 4.9 |
	| Math | 2.0 |
	| Coding | 2.9 |
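
The Total appears consistent with the unweighted mean of the eight category scores. That is an observation from the numbers, not something the card states, and the short check below only illustrates the arithmetic.

```python
from statistics import mean

# Category scores copied from the MT Bench table above.
category_scores = {
    "Writing": 6.35, "Roleplay": 6.9, "Extraction": 5.25, "STEM": 7.5,
    "Humanities": 9.02, "Reasoning": 4.9, "Math": 2.0, "Coding": 2.9,
}

average = mean(category_scores.values())
print(f"{average:.4f}")   # 5.6025
print(round(average, 1))  # 5.6, matching the reported Total
```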

	## Intended use

	The 8B-Chat-SteerLM model is for users who want to customize a model’s response during inference.

	### Ethical use

	Technology can have a profound impact on people and the world, and NVIDIA is committed to enabling trust and transparency in AI development. NVIDIA encourages users to adopt principles of AI ethics and trustworthiness to guide their business decisions by following the guidelines in the [NVIDIA AI Foundation Models Community License Agreement](https://developer.nvidia.com/downloads/nv-ai-foundation-models-license).

	## Limitations

	- The model was trained on data originally crawled from the internet that contains toxic language and societal biases. The model may therefore amplify those biases and return toxic responses, especially when given toxic prompts.
	- The model may generate answers that are inaccurate, omit key information, or include irrelevant or redundant text. It may also produce socially unacceptable or undesirable text, even if the prompt itself does not include anything explicitly offensive.