We train a collection of models with RLHF on the above datasets. We use DPO for hh-rlhf and unalignment, and use PPO to train a model that completes IMDB prefixes with positive sentiment.
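For readers unfamiliar with DPO, the per-example objective can be sketched as below. This is a minimal illustration of the standard DPO loss, not the training code used for these models. The function name `dpo_loss`, its argument names, and the default `beta=0.1` are assumptions made for the example. Each log-probability is the summed token log-likelihood of a full response under either the policy being trained or the frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log(sigmoid(beta * (chosen_logratio - rejected_logratio))).

    beta=0.1 is an illustrative default, not a value taken from this model card.
    """
    # How much more (or less) likely each response is under the policy than the reference.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) == log(1 + exp(-margin)), computed in a numerically stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
```

When the policy matches the reference the loss equals log 2, and it falls as the policy assigns relatively more probability to the preferred response than the reference does.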