PyTorch
English
llama
instruct
values
ethics
Edit model card

WiseLLama-8B

WiseLLama-8B is a LLaMa-3.1-8B-Instruct derived model, fine-tuned on an explicit representation of values. This model aims to provide more nuanced and helpful responses to harmful, heavy, or exploratory questions.

A live demo is available here.

Model Details

  • Base Model: LLaMa-3.1-8B-Instruct
  • Training Technique: Fine-tuned using SFT (Supervised Fine-Tuning) and DPO (Direct Preference Optimization)
  • Training Data: Synthetically created dataset of values-laden conversations
  • Model Type: Causal language model
  • Language(s): English
  • Developer: Meaning Alignment Institute

Intended Use

WiseLLama-8B is designed to provide thoughtful responses to a wide range of user queries, including those that might be considered harmful, heavy, or exploratory. The model aims to meet users where they're at and provide meaningful guidance based on an explicit representation of values.

Training Procedure

WiseLLama-8B was trained on a synthetically created dataset of values-laden conversations. The training process involved:

  1. Sourcing and generating user questions of the following types:

    • Harmful questions
    • Heavy questions
    • Exploratory questions
  2. Using a prompt chain to reason about the user's situation and identify relevant "attention policies" (constitutive considerations important to attend to in that situation).

  3. Generating responses that take this moral reasoning into account.

  4. Training the model to intersperse the values used in its responses using special <value> tags.

Data and Code

The datasets used to train this model are available on Hugging Face:

The code used to generate the training data is available on GitHub:

Wise Dataset Generation Code

Value Tags

WiseLLama-8B uses special <value> tags to indicate parts of its response that are inspired by specific values. These tags are made up of special tokens in the model's vocabulary. They are formatted as follows:

<value choice-type="[situation]" consideration="[attention policy]">[inspired text]</value>

For example:

<value choice-type="forbidden thrills" consideration="**FEELINGS** of being fully alive and present in the moment">Engaging in extreme sports can provide an intense rush of adrenaline and excitement</value>

These tags provide transparency into the model's decision-making process and the values it considers when generating responses.

Limitations

  • The model's understanding of values is based on synthetic data and may not perfectly align with real-world ethical considerations.
  • As with all language models, WiseLLama-8B may produce biased or inconsistent outputs.
  • The model's knowledge is limited to its training data and cutoff date.

How to Use

WiseLLama-8B can be used just like any other LLaMa-like transformer model on Hugging Face with their libraries. Here's a basic example of how to use the model with the Transformers library:

from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the model and tokenizer
model_name = "meaningalignment/wisellama-8b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Prepare your input
input_text = "What are some healthy ways to deal with anger?"

# Tokenize the input
inputs = tokenizer(input_text, return_tensors="pt")

# Generate a response
outputs = model.generate(inputs.input_ids, max_length=200)

# Decode and print the response
response = tokenizer.decode(outputs[0], skip_special_tokens=False)
print(response)

Note that the response may contain <value> tags. You can choose to display these tags to show the model's reasoning process, or you can parse and remove them for a cleaner output.

To use the model with specific configurations or for more advanced use cases, refer to the Hugging Face Transformers documentation.

Citation

If you use this model in your research or application, please cite it as follows:

@software{wisellama_8b,
  author = {Edelman, Joe and Klingefjord, Oliver},
  title = {WiseLLama-8B},
  year = {2024},
  publisher = {Meaning Alignment Institute},
  url = {https://huggingface.co/meaningalignment/wisellama-8b}
}

Note: While there is no accompanying paper for this model, we encourage users to acknowledge the authors and the Meaning Alignment Institute in their work.

Contact

For questions and comments about WiseLLama-8B, please contact:

Email: hello@meaningalignment.org

Downloads last month
10
Inference API
Unable to determine this model's library. Check the docs .

Model tree for meaningalignment/wise-llama

Finetuned
(420)
this model

Datasets used to train meaningalignment/wise-llama