---
license: apache-2.0
library_name: transformers
---

i wanted to learn more about exposure bias mitigation in language models and came across [ReMask](https://huggingface.co/euclaise/ReMask-3B).
it's a neat idea, and i wanted to give it a go.

- during training, the model processes each input sequence twice: once as the full sequence and once as a masked sequence.
- model outputs are computed for both passes.
- the divergence loss is the average of the forward and backward KL divergences between the two outputs.
- the final loss is a weighted sum of the cross-entropy losses and the divergence loss.
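
the steps above can be sketched in plain python. this is a minimal illustration under assumptions, not the training code: the function names, the `ce_weight` / `div_weight` parameters and their defaults are hypothetical, the masking step itself is left out, and treating KL(full || masked) as the "forward" direction is a guess. a real implementation would do the same thing with tensor ops on batched logits.

```python
import math

def log_softmax(logits):
    """Numerically stable log-probabilities for one position."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def kl(log_p, log_q):
    """KL(p || q) computed from log-probabilities."""
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))

def remask_style_loss(full_logits, masked_logits, targets,
                      ce_weight=1.0, div_weight=1.0):
    """Weighted sum of both cross-entropy terms and the symmetric KL term.

    full_logits / masked_logits: one list of vocabulary logits per position,
    from the pass over the full and the masked sequence respectively.
    targets: the correct token id at each position.
    ce_weight / div_weight: hypothetical weights (the source gives no values).
    """
    ce_full = ce_masked = div = 0.0
    for fl, ml, t in zip(full_logits, masked_logits, targets):
        log_p, log_q = log_softmax(fl), log_softmax(ml)
        ce_full += -log_p[t]      # cross entropy, full-sequence pass
        ce_masked += -log_q[t]    # cross entropy, masked-sequence pass
        # average of forward and backward KL between the two passes
        div += 0.5 * (kl(log_p, log_q) + kl(log_q, log_p))
    n = len(targets)
    return (ce_weight * (ce_full + ce_masked) + div_weight * div) / n
```

when both passes give identical outputs the divergence term is zero and only the two cross-entropy terms remain, so the divergence weight controls how strongly the two passes are pushed to agree.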

impl on github

```
<|user|>
Could Moulin Rouge have been hypothetically used as Spain's Spanish American War triage center?
<|logic|>
The Moulin Rouge cabaret in France had a capacity of 850 people. Spain had 700-800 injured during Spanish American War.
<|answer|>
```
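
a prompt in this layout can be put together with a small helper. this is only a sketch based on the example above: `build_prompt` is a hypothetical name, and the exact whitespace (one newline after each tag and a trailing newline after `<|answer|>`) is an assumption.

```python
def build_prompt(question: str, logic: str) -> str:
    """Assemble a prompt using the three tags from the example above.

    The newline placement is an assumption read off the example,
    not a documented format.
    """
    return (
        "<|user|>\n"
        f"{question}\n"
        "<|logic|>\n"
        f"{logic}\n"
        "<|answer|>\n"
    )

prompt = build_prompt(
    "Could Moulin Rouge have been hypothetically used as Spain's "
    "Spanish American War triage center?",
    "The Moulin Rouge cabaret in France had a capacity of 850 people. "
    "Spain had 700-800 injured during Spanish American War.",
)
```

the resulting string would then be tokenized and passed to the model, which continues after the `<|answer|>` tag.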