---
library_name: transformers
license: gpl-3.0
base_model: philippelaban/keep_it_simple
datasets:
- Yelp/yelp_review_full
language:
- en
tags:
- ppo
---
# TAROT-PPO
Task-Oriented Authorship Obfuscation Using Policy Optimization Methods

TAROT-PPO is a text rewriting model fine-tuned with **proximal policy optimization (PPO)** for authorship obfuscation.

arXiv paper: https://arxiv.org/abs/2407.21630v1
## Model description
- **Model type:** Authorship obfuscation model using GPT-2-based text rewriting
- **Reward models:** [rrivera1849/LUAR-MUD](https://huggingface.co/rrivera1849/LUAR-MUD) & [Alibaba-NLP/gte-large-en-v1.5](https://huggingface.co/Alibaba-NLP/gte-large-en-v1.5) (see the illustrative reward sketch below)
- **Finetuned from model:** [philippelaban/keep_it_simple](https://huggingface.co/philippelaban/keep_it_simple)
- **Dataset:** [Yelp/yelp_review_full](https://huggingface.co/datasets/Yelp/yelp_review_full)
- **Repository:** https://github.com/hornetsecurity/tarot
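
During PPO training, the two reward models listed above pull the policy in complementary directions: the rewrite should stay semantically close to the source (measured with gte-large-en-v1.5) while moving away from the source author's stylistic signature (measured with LUAR-MUD). The snippet below is only a minimal sketch of that idea; it is not the training code from the repository, and the way the two scores are combined here is a simplified, hypothetical reward rather than the one defined in the paper.

```python
# Illustrative sketch only (not the TAROT training code): combine an authorship
# similarity score (LUAR-MUD) and a semantic similarity score (gte-large-en-v1.5)
# into a single scalar reward for a (original, rewrite) pair.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

luar_tok = AutoTokenizer.from_pretrained("rrivera1849/LUAR-MUD")
luar = AutoModel.from_pretrained("rrivera1849/LUAR-MUD", trust_remote_code=True)
gte_tok = AutoTokenizer.from_pretrained("Alibaba-NLP/gte-large-en-v1.5")
gte = AutoModel.from_pretrained("Alibaba-NLP/gte-large-en-v1.5", trust_remote_code=True)

def style_embedding(text: str) -> torch.Tensor:
    # LUAR expects inputs shaped (batch, n_documents, seq_len); we embed one document.
    enc = luar_tok([text], truncation=True, max_length=512, return_tensors="pt")
    episode = {k: enc[k].unsqueeze(1) for k in ("input_ids", "attention_mask")}
    return luar(**episode)  # authorship embedding

def meaning_embedding(text: str) -> torch.Tensor:
    enc = gte_tok([text], truncation=True, max_length=512, return_tensors="pt")
    return gte(**enc).last_hidden_state[:, 0]  # CLS pooling, as in the GTE model card

def reward(original: str, rewrite: str) -> float:
    # Encourage rewrites that preserve meaning while drifting away from the
    # original author's stylistic signature.
    with torch.no_grad():
        style_sim = F.cosine_similarity(style_embedding(original), style_embedding(rewrite)).item()
        meaning_sim = F.cosine_similarity(meaning_embedding(original), meaning_embedding(rewrite)).item()
    return meaning_sim - style_sim  # hypothetical combination of the two signals
```
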
## Example use
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gabrielloiseau/TAROT-PPO")
model = AutoModelForCausalLM.from_pretrained("gabrielloiseau/TAROT-PPO")

paragraph = """I had dinner at Bella's Bistro last night, and it was a delightful experience.
As soon as I walked in, I was greeted warmly by the hostess, and the cozy, rustic decor made me feel right at home.
I started with the bruschetta, which was so fresh and flavorful—I could have eaten a whole meal of just that!"""

# The model rewrites the paragraph that precedes the <|endoftext|> separator.
inputs = tokenizer([paragraph + "<|endoftext|>"], return_tensors="pt", padding=True)
outputs = model.generate(**inputs, do_sample=True, max_new_tokens=128)

# Keep only the newly generated tokens, i.e. the rewritten paragraph.
outputs = outputs[:, inputs["input_ids"].shape[1]:]
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
```
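
Because `do_sample=True`, each call produces a different rewrite of the input paragraph; call `transformers.set_seed(...)` first if you need reproducible outputs.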