---
license: mit
datasets: pt-sk/imdb
tags:
- PPO
- RLHF
---
GPT2-IMDB is a GPT-2 model pretrained on the IMDB dataset. This model aligns it with Proximal Policy Optimization (PPO) so that it generates positive-sentiment reviews. Training uses the `trl` library for reinforcement learning, the `transformers` library for model handling, and the `datasets` library for dataset management.
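PPO-based RLHF revolves around two quantities. The first is a per-token reward: a scalar sentiment score, added at the final token, minus a KL penalty that keeps the policy close to the frozen reference model. The second is PPO's clipped surrogate objective. The pure-Python sketch below illustrates both under the standard formulation. The function names and the `beta` and `clip_eps` defaults are hypothetical. They are not the `trl` API or the settings used to train this model. The card also does not say which reward model produced the sentiment score. Advantages would normally come from GAE, computed over the shaped rewards and a value head; that step is omitted here.

```python
import math

def kl_shaped_rewards(score, logprobs_policy, logprobs_ref, beta=0.1):
    """Hypothetical helper: per-token rewards for one generated response.

    Each token gets -beta * (log pi(token) - log pi_ref(token)), a KL-style
    penalty, and the scalar sentiment score is added at the last token.
    Assumes a non-empty response.
    """
    rewards = [-beta * (lp - lr) for lp, lr in zip(logprobs_policy, logprobs_ref)]
    rewards[-1] += score
    return rewards

def ppo_clipped_loss(logprobs_new, logprobs_old, advantages, clip_eps=0.2):
    """Hypothetical helper: PPO clipped surrogate loss (to be minimized).

    ratio = pi_new / pi_old per token; the objective is
    mean(min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A)), negated as a loss.
    """
    total = 0.0
    for new, old, adv in zip(logprobs_new, logprobs_old, advantages):
        ratio = math.exp(new - old)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)

# Toy usage with made-up log-probabilities, for illustration only
rewards = kl_shaped_rewards(1.0, [-1.0, -2.0], [-1.0, -2.5], beta=0.1)
loss = ppo_clipped_loss([-1.0, -2.0], [-1.0, -2.0], [1.0, -0.5])
print(rewards, loss)
```

When the policy has not moved (new and old log-probabilities are equal), every ratio is 1 and the loss reduces to the negative mean advantage. Clipping only takes effect once the ratio leaves the interval [1 - eps, 1 + eps]. It then caps how much a single update can exploit a positive advantage.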
The implementation code is available on GitHub.
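```python
# Load model and tokenizer directly
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("pt-sk/GPT2-IMDB-Sentiment-FineTuning-with-PPO")
model = AutoModelForCausalLM.from_pretrained("pt-sk/GPT2-IMDB-Sentiment-FineTuning-with-PPO")
# Example usage: continue a review prompt
input_text = "The movie was fantastic"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=40,                    # otherwise a short default max_length may cut the review off
    do_sample=True,                       # sample instead of greedy decoding for more varied text
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token; reuse EOS to silence the warning
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```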