---
library_name: transformers
license: gpl-3.0
base_model: philippelaban/keep_it_simple
datasets:
- Yelp/yelp_review_full
language:
- en
tags:
- ppo
---

# TAROT-PPO

Task-Oriented Authorship Obfuscation Using Policy Optimization Methods

A text rewriting model fine-tuned with **proximal policy optimization (PPO)** for authorship obfuscation.

ArXiv paper: https://arxiv.org/abs/2407.21630v1

## Model description
- **Model type:** Authorship obfuscation model using GPT2-based text rewriting
- **Reward models:** [rrivera1849/LUAR-MUD](https://huggingface.co/rrivera1849/LUAR-MUD) & [Alibaba-NLP/gte-large-en-v1.5](https://huggingface.co/Alibaba-NLP/gte-large-en-v1.5)
- **Finetuned from model:** [philippelaban/keep_it_simple](https://huggingface.co/philippelaban/keep_it_simple)
- **Dataset:** [Yelp/yelp_review_full](https://huggingface.co/datasets/Yelp/yelp_review_full)
- **Repository:** https://github.com/hornetsecurity/tarot
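
The two reward models pull the PPO fine-tuning in complementary directions: LUAR-MUD scores authorship similarity (to be reduced), while gte-large-en-v1.5 scores semantic similarity (to be preserved). The snippet below is a minimal illustrative sketch of such a combined reward, not the authors' exact training code; `embed_author` and `embed_meaning` are hypothetical callables standing in for the two embedding models.

```python
import torch.nn.functional as F

def obfuscation_reward(original: str, rewrite: str, embed_author, embed_meaning) -> float:
    """Illustrative PPO-style reward: lower style similarity, higher meaning similarity."""
    # Authorship term: push the rewrite away from the original author's style (LUAR-like embedding).
    author_sim = F.cosine_similarity(embed_author(original), embed_author(rewrite), dim=-1)
    # Semantic term: keep the rewrite close in meaning to the original (GTE-like embedding).
    meaning_sim = F.cosine_similarity(embed_meaning(original), embed_meaning(rewrite), dim=-1)
    # Higher reward = less stylistic similarity, more semantic similarity.
    return ((1 - author_sim) + meaning_sim).item() / 2
```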

## Example use
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gabrielloiseau/TAROT-PPO")
model = AutoModelForCausalLM.from_pretrained("gabrielloiseau/TAROT-PPO")

paragraph = """I had dinner at Bella's Bistro last night, and it was a delightful experience. 
As soon as I walked in, I was greeted warmly by the hostess, and the cozy, rustic decor made me feel right at home. 
I started with the bruschetta, which was so fresh and flavorful—I could have eaten a whole meal of just that!"""

# Append <|endoftext|> to mark the end of the source paragraph
inputs = tokenizer([paragraph + "<|endoftext|>"], return_tensors="pt", padding=True)
outputs = model.generate(**inputs, do_sample=True, max_new_tokens=128)

# Strip the prompt tokens so only the rewritten text remains
outputs = outputs[:, inputs["input_ids"].shape[1]:]
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
```
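
Generation is sampled, so repeated calls yield different rewrites. One optional pattern (not part of the original example) is to draw several candidates in a single call with `num_return_sequences`, reusing `model`, `tokenizer`, and `inputs` from the snippet above, and then pick whichever rewrite best balances style change against meaning preservation:

```python
# Sample several obfuscated candidates for the same paragraph
outputs = model.generate(
    **inputs,
    do_sample=True,
    max_new_tokens=128,
    num_return_sequences=4,
)
outputs = outputs[:, inputs["input_ids"].shape[1]:]  # drop the prompt tokens
candidates = tokenizer.batch_decode(outputs, skip_special_tokens=True)
```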