{ "cells": [ { "cell_type": "markdown", "source": [ "# Fine-Tuning GPT-2 with RLHF on Drugs.com Reviews for High-Quality Drug Reviews on Depression\n", "\n", "\n", "**Author**: Zakia Salod\n", "\n", "**Affiliation**: University of KwaZulu-Natal (UKZN), Durban, South Africa\n", "\n", "**Contact**: zakia.salod@gmail.com\n", "\n", "**Machine Used**: Google Colab T4 GPU\n", "\n", "**Last Updated**: 10 December 2023\n", "\n", "**Description**:\n", "This notebook demonstrates fine-tuning a GPT-2 model (specifically, Zakia/gpt2-drugscom_depression_reviews) using Reinforcement Learning from Human Feedback (RLHF) with the TRL (Transformer Reinforcement Learning) library. The base model (GPT-2) and the reward model (DistilBERT, specifically, Zakia/distilbert-drugscom_depression_reviews) are both fine-tuned on the same Drugs.com reviews dataset, filtered for depression. The goal is to further refine the GPT-2 model's ability to generate high-quality patient reviews of depression drugs: the DistilBERT reward model scores each generated review, and RLHF steers GPT-2 toward reviews that score higher.\n", "\n", "\n", "**License**:\n", "This work is licensed under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0). Free for educational and research use.\n", "\n" ], "metadata": { "id": "McyCLRHCRADe" } }, { "cell_type": "markdown", "source": [ "
\n", " Figure 1: The RLHF process applied to the GPT-2 model using the DrugsCom DepressionReviews dataset. The fine-tuned GPT-2 model (purple), the DistilBERT reward model (orange), and the dataset (turquoise box; the 'train' set filtered for the 'Depression' condition) are highlighted to show how they fit into the fine-tuning process.
"
]
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"Figure 2: env/reward_mean plot showing the average reward per training step.
\n", "Figure 3: env/reward_dist heatmap plot showing the distribution of rewards over training steps.
\n", "\n", "| | query | response (before) | response (after) | rewards (before) | rewards (after) |\n", "
|---|---|---|---|---|---|\n", "
| 0 | Very Very good. Helps | Very Very good. Helps to deal with some of the... | me with extreme depression and anxiety. Can h... | -1.911440 | 2.428029 |\n", "
| 1 | It worked for about | a month. The nausea is gone and I no longer f... | 6 months and I feel so much better that I've ... | 1.584692 | 2.306569 |\n", "
| 2 | Started on 20 | mg. It seems to help somewhat with some of t... | mg for 4 days now...I feel great. I've been si... | -3.230841 | 2.124902 |\n", "
| 3 | I am a 43 | year old woman with severe anxiety and depres... | year old mother of two and a mother of two da... | 0.613143 | 2.237940 |\n", "
| 4 | I got on Pro | zac. I would sleep all night, feel sick one d... | I got on Prozac for about two years. Prozac re... | -3.422938 | 2.157211 |\n", "
| 5 | This drug has changed me | drastically. A year after taking an XL I bec... | from an excited procrastinator! I feel like I... | -3.109789 | 0.991710 |\n", "
| 6 | Good Med! Little skeptical | at first, because some things that seem so go... | Good Med! Little skeptical & SEXy!<\\|endoftext\\|> | -3.691891 | -1.087502 |\n", "
| 7 | I used to take cl | onazepam for 4 years and that caused much worse | onazepam and 5mg prozac at the same | -1.095920 | 1.416539 |\n", "
| 8 | Been on Prozac | for 11 years now and still getting worse as t... | for 13 years. I have never felt this good in ... | -1.745274 | 2.083820 |\n", "
| 9 | I've been on | Pristiq for 1 week now. The first 1 day was h... | I've been on this for over three months, I fee... | -0.377317 | 2.041740 |\n", "
| 10 | I've been taking this pill | for 6 months and I feel 100% better after just | for almost 14 years and I took I F 3 times | 1.914182 | 2.104877 |\n", "
| 11 | Was put on 30mg. | Still depressed the first 3 months. Now I'm o... | It made me feel much better! I could finally ... | -3.148672 | 2.363128 |\n", "
| 12 | I was addicted to | opiates for years when I was younger and I co... | SSRI's, hallucinogens, and drug- just therapy... | -3.175338 | 0.608868 |\n", "
| 13 | I've been on approximately | 10 mg for six months and it's taking away som... | two dozen different SSRI's over the past thir... | -3.352825 | 2.426083 |\n", "
| 14 | Super good for depression. | Super good for depression. Diarrhea is better... | Super good for depression. I am 5'1 and I have... | 1.200642 | 1.900504 |\n", "
| 15 | Hyped me up like crazy | , as did my MGs who said they thought this was... | before and I'm a great mom now! I work as an ... | -4.075667 | 1.796120 |\n", "
| 16 | I am a 32 | Y male and have been on up to 300 mg of zoloft... | yr old male and live in a very nice suburb (wi... | 0.437383 | 2.238822 |\n", "
| 17 | My only complaint is that I | complaint is that I experience extreme foggy ... | feel fatigued during the day. And also depres... | -4.083458 | -3.843148 |\n", "
| 18 | I've been on | this 300mg for just a little over a year and ... | I've been on this anti-depressant for over 18 ... | -0.072659 | 0.724089 |\n", "
| 19 | Lexapro has been | better than anything I've tried to battle my ... | Lexapro has been AMAZING, so far. I've been on... | -1.699961 | 1.292072 |\n", "
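
The before/after rewards in the table above are easier to compare as aggregate numbers. The following is a minimal sketch, not part of the original notebook: it copies the 20 reward pairs from the table by hand and summarizes them with the standard library. The helper name `summarize_rewards` is hypothetical, and the sketch assumes that "before" and "after" refer to the model before and after RLHF fine-tuning.

```python
from statistics import mean, median

# Reward pairs copied by hand from rows 0-19 of the comparison table.
rewards_before = [
    -1.911440, 1.584692, -3.230841, 0.613143, -3.422938,
    -3.109789, -3.691891, -1.095920, -1.745274, -0.377317,
    1.914182, -3.148672, -3.175338, -3.352825, 1.200642,
    -4.075667, 0.437383, -4.083458, -0.072659, -1.699961,
]
rewards_after = [
    2.428029, 2.306569, 2.124902, 2.237940, 2.157211,
    0.991710, -1.087502, 1.416539, 2.083820, 2.041740,
    2.104877, 2.363128, 0.608868, 2.426083, 1.900504,
    1.796120, 2.238822, -3.843148, 0.724089, 1.292072,
]


def summarize_rewards(before, after):
    """Summarize paired rewards (hypothetical helper, for illustration only)."""
    if len(before) != len(after):
        raise ValueError("before and after must have the same length")
    # Per-row change in reward: positive means the 'after' response scored higher.
    deltas = [a - b for b, a in zip(before, after)]
    return {
        "n_rows": len(deltas),
        "n_improved": sum(d > 0 for d in deltas),
        "mean_before": mean(before),
        "mean_after": mean(after),
        "median_before": median(before),
        "median_after": median(after),
        "mean_delta": mean(deltas),
    }


summary = summarize_rewards(rewards_before, rewards_after)
for name, value in summary.items():
    # Counts are ints; everything else is a float and gets three decimals.
    text = f"{value:.3f}" if isinstance(value, float) else str(value)
    print(f"{name}: {text}")
```

For these 20 rows, `n_improved` equals `n_rows`: every "after" response scores higher than its "before" counterpart, although rows 6 and 17 still receive negative rewards. A sample this small only illustrates the trend; it is not an evaluation of the model.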