---
license: llama2
language:
- en
datasets:
- McGill-NLP/stereoset
- wino_bias
- wikitext
- allenai/ai2_arc
- allenai/openbookqa
- cais/mmlu
metrics:
- perplexity
- accuracy
---
# DAMA 

## Model 

LLaMA model adapted to mitigate gender bias in text generation.
For adaptation, we used the **D**ebiasing **A**lgorithm through **M**odel **A**daptation (DAMA) method described in [Limisiewicz et al., 2024](https://openreview.net/pdf?id=XIZEFyVGC9).

### Model Description

- **Developed by:** Tomasz Limisiewicz, David Mareček, Tomáš Musil
- **Funded by:** Grant Agency of the Czech Republic
- **Language(s) (NLP):** English
- **Adapted from model:** LLaMA

### Model Sizes

- **[7B](https://huggingface.co/ufal/DAMA-7B)**
- **[13B](https://huggingface.co/ufal/DAMA-13B)**
- **[33B](https://huggingface.co/ufal/DAMA-33B)**
- **[65B](https://huggingface.co/ufal/DAMA-65B)**

### Model Sources

- **[Repository](https://github.com/tomlimi/DAMA)**
- **[Paper](https://openreview.net/pdf?id=XIZEFyVGC9)**



## Bias, Risks, and Limitations

DAMA reduces the gender bias of the original model.
It is therefore better suited than the original LLaMA for generating and processing text in sensitive domains, such as hiring, social services, or professional counseling.
Still, we recommend caution in such use cases because the bias is not entirely removed (as is the case for any other currently available method).



## Adaptation

![Dama Schema](DamaSchema.png)

Schema (b) shows the DAMA intervention in a LLaMA layer.
Although `I - P_c` is depicted as a separate module, in practice it is multiplied with the output matrix of the feed-forward layer (`W_FF`), and the product replaces the original matrix.
Therefore, DAMA does not change the model's parameter count or architecture.
Schema (a) shows the behavior of the model when presented with a stereotypical prompt, and (c) shows the projections of the feed-forward latent vector (`u`) onto the output space.
With DAMA (lower arrow), the gender component of the representation is nullified.
This results in balanced probabilities of gendered tokens in the model's output, as shown in (d).
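The statement that multiplying `I - P_c` into `W_FF` leaves the parameter count unchanged can be checked on toy data. The NumPy sketch below uses random matrices, toy sizes, and a hypothetical rank-1 projection onto a unit direction `g` (not LLaMA's real dimensions and not the actual `P_c`). It assumes `W_FF` maps the latent vector `u` to the output space as `W_FF @ u`, as with a PyTorch `nn.Linear` weight of shape `(d_model, d_ff)`.

```python
import numpy as np

def fold_projection(w_ff: np.ndarray, p_c: np.ndarray) -> np.ndarray:
    """Return (I - P_c) @ W_FF: a new output matrix with the same shape as W_FF."""
    identity = np.eye(w_ff.shape[0])
    return (identity - p_c) @ w_ff

rng = np.random.default_rng(0)
d_model, d_ff = 8, 32  # toy sizes, not LLaMA's real dimensions
w_ff = rng.normal(size=(d_model, d_ff))

# Hypothetical rank-1 projection onto a unit direction g in the output space
g = rng.normal(size=d_model)
g /= np.linalg.norm(g)
p_c = np.outer(g, g)

w_dama = fold_projection(w_ff, p_c)
u = rng.normal(size=d_ff)  # a feed-forward latent vector

# Folding is equivalent to applying (I - P_c) after W_FF ...
assert np.allclose(w_dama @ u, (np.eye(d_model) - p_c) @ (w_ff @ u))
# ... removes the component along g from the output ...
assert abs(g @ (w_dama @ u)) < 1e-9
# ... and keeps the parameter count unchanged.
assert w_dama.shape == w_ff.shape
```

Because the edited matrix simply replaces `W_FF`, no extra module or parameters are needed at inference time.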

The method for obtaining `P_c` is based on the Partial Least Squares (PLS) algorithm.
For more details, please refer to the [paper](https://openreview.net/pdf?id=XIZEFyVGC9).
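As a rough illustration only, the NumPy sketch below builds a projection in the spirit of PLS: it takes the leading left singular vectors of the cross-covariance between centered representations and centered gender targets and returns the orthogonal projection onto their span. The inputs (`reps`, one row per example; `targets`, one column per gender signal), the rank `r`, and the toy data are hypothetical placeholders. This is a simplified PLS-SVD variant, not the exact procedure or data used in the paper.

```python
import numpy as np

def pls_projection(reps: np.ndarray, targets: np.ndarray, r: int) -> np.ndarray:
    """Projection onto the r directions of `reps` most covariant with `targets`.

    reps:    (n, d) representations, one row per example (hypothetical input)
    targets: (n, k) gender signals, one column per signal (hypothetical input)
    """
    x = reps - reps.mean(axis=0)        # center both blocks
    y = targets - targets.mean(axis=0)
    cross_cov = x.T @ y                 # (d, k) cross-covariance, up to scale
    left, _, _ = np.linalg.svd(cross_cov, full_matrices=False)
    basis = left[:, :r]                 # orthonormal columns, shape (d, r)
    return basis @ basis.T              # (d, d) orthogonal projection

rng = np.random.default_rng(0)
reps = rng.normal(size=(200, 16))
# Toy targets that depend linearly on the first two representation dimensions
targets = reps[:, :2] @ np.array([[1.0], [-0.5]])
p_c = pls_projection(reps, targets, r=1)

# After applying I - P_c, the representations no longer covary with the targets
cleaned = reps @ (np.eye(16) - p_c)
x_c = cleaned - cleaned.mean(axis=0)
y_c = targets - targets.mean(axis=0)
print(np.abs(x_c.T @ y_c).max())  # numerically close to zero
```

Since `P_c` is symmetric, `(I - P_c)` applied to the representations also removes their cross-covariance with the targets along the selected directions.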

## Use

The following snippet shows the basic usage of DAMA for text generation.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

DAMA_SIZE = '7B'       # one of '7B', '13B', '33B', '65B'
OUTPUT_DIR = 'output'  # folder used if weights have to be offloaded to disk
model = AutoModelForCausalLM.from_pretrained(f"ufal/DAMA-{DAMA_SIZE}", offload_folder=OUTPUT_DIR,
                                             torch_dtype=torch.float16, low_cpu_mem_usage=True,
                                             device_map='auto')

tokenizer = AutoTokenizer.from_pretrained(f"ufal/DAMA-{DAMA_SIZE}", use_fast=True, return_token_type_ids=False)

prompt = "The lifeguard laughed because"
# Move the inputs to the device that holds the model's first layers
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

generate_ids = model.generate(inputs.input_ids, attention_mask=inputs.attention_mask, max_length=30)
print(tokenizer.batch_decode(generate_ids, skip_special_tokens=True)[0])
```
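To look at the effect illustrated in panel (d) of the schema, one can compare the model's next-token probabilities for gendered pronouns after a prompt. The helper below is a hypothetical sketch written in plain Python so that it runs without a GPU (with PyTorch, `torch.softmax` on the logits gives the same probabilities). It applies a numerically stable softmax to next-token logits and returns the probabilities at two token ids. The commented lines show one possible way to obtain the logits with the `model`, `tokenizer`, and `inputs` from the snippet above; the pronoun token strings there are assumptions, not verified vocabulary entries.

```python
import math
from typing import Sequence, Tuple

def pair_probabilities(logits: Sequence[float], id_a: int, id_b: int) -> Tuple[float, float]:
    """Softmax probabilities of two token ids given next-token logits."""
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return exps[id_a] / total, exps[id_b] / total

# Possible use with the model above (not executed here; token strings are assumptions):
#   with torch.no_grad():
#       logits = model(inputs.input_ids).logits[0, -1].float().tolist()
#   he_id = tokenizer.convert_tokens_to_ids("▁he")    # assumed SentencePiece token
#   she_id = tokenizer.convert_tokens_to_ids("▁she")  # assumed SentencePiece token
#   p_he, p_she = pair_probabilities(logits, he_id, she_id)

p_a, p_b = pair_probabilities([2.0, 2.0, 0.0], 0, 1)
print(p_a == p_b)  # equal logits give equal probabilities: True
```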

## Evaluation

We evaluate the models on multiple benchmarks to assess gender bias and language understanding capabilities.
DAMA models are compared with the original LLaMA models.


### Bias Evaluation

We introduced a metric for evaluating gender bias in text generation.
It measures the extent to which the model's output is affected by stereotypical (`a_s`) and factual (`a_f`) gender signals.

Moreover, we provide the scores for two established bias benchmarks: **WinoBias** and **StereoSet**.
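The exact definition of the bias metric is given in the paper. As a hedged sketch, assume it is a linear regression of the difference between the probabilities of a male and a female continuation (e.g., "he" vs. "she") on per-prompt stereotypical (`x_s`) and factual (`x_f`) gender scores, with `a_s` and `a_f` as the fitted slopes and `b` as the intercept. The function name, the variable names, and the noise-free synthetic data below are illustrative assumptions, not the paper's data or code.

```python
import numpy as np

def fit_bias_coefficients(prob_gap, x_s, x_f):
    """Assumed form of the metric: least-squares fit of
    prob_gap ~ a_s * x_s + a_f * x_f + b. Returns (a_s, a_f, b)."""
    design = np.column_stack([x_s, x_f, np.ones(len(prob_gap))])
    coef, *_ = np.linalg.lstsq(design, prob_gap, rcond=None)
    a_s, a_f, b = coef
    return float(a_s), float(a_f), float(b)

rng = np.random.default_rng(0)
x_s = rng.uniform(-1, 1, size=50)  # hypothetical stereotypical scores per prompt
x_f = rng.uniform(-1, 1, size=50)  # hypothetical factual scores per prompt
prob_gap = 0.2 * x_s + 0.3 * x_f + 0.05  # synthetic p(he) - p(she), noise-free

print(fit_bias_coefficients(prob_gap, x_s, x_f))  # approximately (0.2, 0.3, 0.05)
```

Under this reading, a smaller `a_s` means the model's pronoun choice depends less on stereotypes, which is the direction in which DAMA moves the coefficient for every model size in the table below.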

### Results


|           | Bias in LM |       |        | WinoBias |           |           | StereoSet |      |      |
|-----------|--------|-------|--------|--------|-----------|-----------|------|------|------|
|           | `a_s`  | `a_f` | `b`    | Acc    | `Delta S` | `Delta G` | lms  | ss   | ICAT |
| LLaMA 7B  | 0.235  | 0.320 | 0.072  | 59.1%  | 40.3%     | 3.0%      | 95.5 | 71.9 | 53.7 |
| DAMA 7B   | -0.005 | 0.038 | -0.006 | 57.3%  | 31.5%     | 2.3%      | 95.5 | 69.3 | 58.5 |
| LLaMA 13B | 0.270  | 0.351 | 0.070  | 70.5%  | 35.7%     | -1.5%     | 95.2 | 71.4 | 54.4 |
| DAMA 13B  | 0.148  | 0.222 | 0.059  | 66.4%  | 31.1%     | -1.1%     | 94.4 | 68.6 | 59.4 |
| LLaMA 33B | 0.265  | 0.343 | 0.092  | 71.0%  | 36.0%     | -4.0%     | 94.7 | 68.4 | 59.9 |
| DAMA 33B  | 0.105  | 0.172 | 0.059  | 63.7%  | 26.7%     | -3.7%     | 94.8 | 65.7 | 65.0 |
| LLaMA 65B | 0.249  | 0.316 | 0.095  | 73.3%  | 35.7%     | 1.4%      | 94.9 | 69.5 | 57.9 |
| DAMA 65B  | 0.185  | 0.251 | 0.100  | 71.1%  | 27.2%     | 0.8%      | 92.8 | 67.1 | 61.1 |

Bias evaluation for the LLaMA models and their debiased instances.
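The StereoSet ICAT score combines the language modeling score (lms) and the stereotype score (ss) as ICAT = lms · min(ss, 100 − ss) / 50, following the StereoSet benchmark's definition. The small function below computes it. Applied to the rounded LLaMA 7B values from the table, it gives about 53.7, matching the reported score; small differences for other rows can come from the rounding of lms and ss.

```python
def icat(lms: float, ss: float) -> float:
    """StereoSet idealized CAT score: 100 for lms = 100 and ss = 50; 0 if ss is 0 or 100."""
    return lms * min(ss, 100.0 - ss) / 50.0

print(round(icat(95.5, 71.9), 1))  # LLaMA 7B row: 53.7
```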


### Performance Evaluation

To check the effect of debiasing on LM capabilities, we compute perplexity on the **Wikipedia corpus**.
We also test performance on four language understanding end-tasks: **OpenBookQA**, **AI2 Reasoning Challenge** (Easy and Challenge Sets), and **Massive Multitask Language Understanding**.
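Perplexity is the exponential of the average negative log-likelihood per token. The sketch below computes it from a list of per-token natural-log probabilities. With Hugging Face causal language models, passing `labels=input_ids` to the forward call returns the mean token cross-entropy as `loss`, and its exponential is the perplexity of that text window. The context length and striding behind the reported numbers are not specified here, so this sketch is not a reproduction of the evaluation setup.

```python
import math
from typing import Sequence

def perplexity(token_logprobs: Sequence[float]) -> float:
    """exp of the mean negative log-likelihood over tokens (natural log)."""
    if not token_logprobs:
        raise ValueError("need at least one token")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)

# A model that assigns probability 1/4 to every observed token has perplexity of about 4
print(perplexity([math.log(0.25)] * 10))
```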


### Results

|           | Perplexity | ARC-C | ARC-E | OBQA | MMLU  |
|-----------|------------|-------|-------|------|-------|
| LLaMA 7B  | 26.1       | 42.2  | 69.1  | 57.2 | 30.3  |
| DAMA 7B   | 28.9       | 41.8  | 68.3  | 56.2 | 30.8  |
| LLaMA 13B | 19.8       | 44.9  | 70.6  | 55.4 | 43.3  |
| DAMA 13B  | 21.0       | 44.7  | 70.3  | 56.2 | 43.5  |
| LLaMA 33B | 20.5       | 47.4  | 72.9  | 59.2 | 55.7* |
| DAMA 33B  | 19.6       | 45.2  | 71.6  | 58.2 | 56.1* |
| LLaMA 65B | 19.5       | 44.5  | 73.9  | 59.6 | ---*  |
| DAMA 65B  | 20.1       | 40.5  | 67.7  | 57.2 | ---*  |

Performance evaluation for the LLaMA models and their debiased instances.
\*Due to hardware limitations, we could not run MMLU inference for the 65B models.
In the MMLU evaluation of the 33B models, we excluded the 4% longest prompts.
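The exclusion of the 4% longest prompts for the 33B models can be sketched with a small hypothetical helper; this is not the authors' evaluation code. It measures length with a caller-supplied function (character length by default here, although token counts would be closer to what drives memory use), drops the longest 4% of prompts, and keeps the rest in their original order.

```python
from typing import Callable, List, Sequence

def drop_longest(prompts: Sequence[str], fraction: float = 0.04,
                 length: Callable[[str], int] = len) -> List[str]:
    """Remove the `fraction` longest prompts, preserving the order of the rest."""
    n_drop = int(len(prompts) * fraction)
    if n_drop == 0:
        return list(prompts)
    # Indices sorted from shortest to longest; the last n_drop are removed
    by_length = sorted(range(len(prompts)), key=lambda i: length(prompts[i]))
    dropped = set(by_length[-n_drop:])
    return [p for i, p in enumerate(prompts) if i not in dropped]

prompts = ["x" * k for k in range(1, 101)]  # 100 toy prompts of lengths 1..100
kept = drop_longest(prompts)
print(len(kept), max(map(len, kept)))  # 96 96
```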

## Citation

**BibTeX:**

```bibtex
@inproceedings{
limisiewicz2024debiasing,
title={Debiasing Algorithm through Model Adaptation},
author={Tomasz Limisiewicz and David Mare{\v{c}}ek and Tom{\'a}{\v{s}} Musil},
booktitle={The Twelfth International Conference on Learning Representations},
year={2024},
url={https://openreview.net/forum?id=XIZEFyVGC9}
}
```

**APA:**

Limisiewicz, T., Mareček, D., & Musil, T. (2024). Debiasing Algorithm through Model Adaptation. The Twelfth International Conference on Learning Representations.


## Model Card Author

[Tomasz Limisiewicz](mailto:limisewicz@ufal.mff.cuni.cz)