Use this model to generate paraphrases of an input query, which can then be used to augment the training data of NLU systems.

import torch
# AutoModelWithLMHead is deprecated; AutoModelForCausalLM is the equivalent for GPT-2.
from transformers import AutoTokenizer, AutoModelForCausalLM

# Run on the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = AutoTokenizer.from_pretrained("Ashishkr/Gpt2-paraphrase_generation")
model = AutoModelForCausalLM.from_pretrained("Ashishkr/Gpt2-paraphrase_generation").to(device)

input_query = "every moment is a fresh beginning"
# The prompt is the lowercased query followed by the " ~~ " separator.
query = input_query + " ~~ "

input_ids = tokenizer.encode(query.lower(), return_tensors="pt").to(device)
sample_outputs = model.generate(
    input_ids,
    do_sample=True,
    num_beams=1,
    max_length=128,
    temperature=0.9,
    top_p=0.99,
    top_k=30,
    num_return_sequences=40,
)
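paraphrases = []
for output in sample_outputs:
    # Keep the text before the first "||", then drop the prompt before " ~~ ".
    text = tokenizer.decode(output, skip_special_tokens=True).split("||")[0]
    parts = text.split(" ~~ ")
    if len(parts) < 2:
        continue  # separator missing from the decoded text
    r = parts[1]
    if r not in paraphrases:
        paraphrases.append(r)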

print(paraphrases)
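
The post-processing loop above can also be factored into a small helper that works on already-decoded strings, which makes it testable without loading the model. This is a minimal sketch, assuming the output format implied by the snippet (`<query> ~~ <paraphrase> || ...`); the function name `extract_paraphrases` and the sample strings are hypothetical and are not real model output.

```python
def extract_paraphrases(decoded_outputs, separator=" ~~ ", terminator="||"):
    """Return unique, non-empty paraphrases in first-seen order."""
    paraphrases = []
    for text in decoded_outputs:
        # Keep only the text before the first terminator.
        head = text.split(terminator)[0]
        # Split off the prompt; skip outputs that lack the separator.
        parts = head.split(separator)
        if len(parts) < 2:
            continue
        candidate = parts[1].strip()
        if candidate and candidate not in paraphrases:
            paraphrases.append(candidate)
    return paraphrases


# Hypothetical decoded strings, for illustration only.
decoded = [
    "every moment is a fresh beginning ~~ each moment is a new start || extra",
    "every moment is a fresh beginning ~~ each moment is a new start",
    "every moment is a fresh beginning ~~ every second is a new beginning ||",
    "malformed output without separator",
]
result = extract_paraphrases(decoded)
print(result)
```

With the real model, `decoded` would be the list of `tokenizer.decode(...)` results for each generated sequence.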

To check whether a paraphrase is a genuine semantic variation of the input query or only a surface-level variation, and to rank the generated paraphrases, use the following model:

https://huggingface.co/salesken/paraphrase_diversity_ranker
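
Before running a learned ranker, a cheap lexical check can flag candidates that barely differ from the input. The sketch below is a rough stand-in based on token-level Jaccard overlap. It is not the linked ranker model and it does not measure semantic similarity. The names `jaccard_overlap` and `rank_by_surface_diversity`, and the candidate strings, are hypothetical.

```python
def jaccard_overlap(a, b):
    """Token-level Jaccard similarity between two strings (0.0 to 1.0)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def rank_by_surface_diversity(query, candidates):
    """Sort candidates from least to most lexical overlap with the query."""
    return sorted(candidates, key=lambda c: jaccard_overlap(query, c))


query = "every moment is a fresh beginning"
# Hypothetical candidates, for illustration only.
candidates = [
    "every moment is a new beginning",
    "each second offers a clean start",
    "every moment is a fresh beginning",
]
ranked = rank_by_surface_diversity(query, candidates)
print(ranked)
```

Candidates at the end of `ranked` share most of their tokens with the query and are the likeliest to be surface-level variations; low overlap alone does not guarantee that the meaning is preserved.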
