aloobun
/

ReMask-135m

Text Generation

text-generation-inference

Inference Endpoints

Model card Files Files and versions Community

ReMask-135m / README.md

aloobun's picture

Update README.md

ce01c83 verified 2 months ago

|

history blame contribute delete

842 Bytes

	---
	license: apache-2.0
	library_name: transformers
	---

	i wanted to learn more about exposure bias mitigation in language models and came across [ReMask](https://huggingface.co/euclaise/ReMask-3B).
	it's a neat idea, and i wanted to give it a go.

	- during training, the model processes input sequences twice - once with the full sequence & once with masked sequence.
	- computes model outputs for both.
	- divergence loss is computed as the average of forward and backward KL divergences.
	- final loss is a weighted sum of the cross entropy losses and the divergence loss.

	impl on github

	```
	<\|user\|>
	Could Moulin Rouge have been hypothetically used as Spain's Spanish American War triage center?
	<\|logic\|>
	The Moulin Rouge cabaret in France had a capacity of 850 people. Spain had 700-800 injured during Spanish American War.
	<\|answer\|>
	```