# Fine-Tuning GPT-2 with RLHF on Drugs.com Reviews for High-Quality Drug Reviews on Depression


**Author**: Zakia Salod

**Affiliation**: University of KwaZulu-Natal (UKZN), Durban, South Africa

**Contact**: zakia.salod@gmail.com

**Machine Used**: Google Colab T4 GPU

**Last Updated**: 10 December 2023

**Description**:
This notebook demonstrates fine-tuning a GPT-2 model (specifically, Zakia/gpt2-drugscom_depression_reviews) with Reinforcement Learning from Human Feedback (RLHF), using the TRL (Transformer Reinforcement Learning) library. The base model (GPT-2) and the reward model (DistilBERT, specifically Zakia/distilbert-drugscom_depression_reviews) were both fine-tuned on the same Drugs.com reviews dataset, focusing on depression. The goal is to further refine the GPT-2 model's ability to generate high-quality patient reviews of depression drugs, with the DistilBERT model scoring each generated review to provide the reward signal for PPO. The approach combines GPT-2's text generation with DistilBERT's classification of review quality.


**License**:
This work is licensed under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0). Free for educational and research use.



<div style="text-align: center">
  <img src='https://huggingface.co/Zakia/gpt2-drugscom_depression_reviews-hq-v1/resolve/main/images/RLHF_DepressionReviews_Flow.png' width='800'>
  <p style="text-align: center;">
    <b>Figure 1:</b> The RLHF process used to produce the GPT-2 model (<a href="https://huggingface.co/Zakia/gpt2-drugscom_depression_reviews-hq-v1">link</a>) from the DrugsCom DepressionReviews dataset. The fine-tuned GPT-2 model (<a href="https://huggingface.co/Zakia/gpt2-drugscom_depression_reviews">link</a>) is shown in purple, the DistilBERT reward model (<a href="https://huggingface.co/Zakia/distilbert-drugscom_depression_reviews">link</a>) in orange, and the dataset (<a href="https://huggingface.co/datasets/Zakia/drugscom_reviews">link</a>, filtered for the 'Depression' condition in the 'train' split) in the turquoise box.</p></div>

## STEP 1: SETTING UP THE ENVIRONMENT

### Enable Automatic Module Reloading

In [None]:
# Enable automatic module reloading to reflect changes in external .py files
%load_ext autoreload
# Reload all modules before executing code, keeping modules up-to-date
%autoreload 2

### Install Required Packages

In [None]:
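# Hugging Face Accelerate is published as `accelerate` ("Accelerator" installs an unrelated package).
# TRL is pinned to the 0.7.x PPOConfig/PPOTrainer API used in this notebook.
!pip install accelerate peft trl==0.7.4 wandb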


### Import Necessary Libraries

In [None]:
from dataclasses import dataclass, field
from typing import Optional
import html
import random
import re

import numpy as np
import pandas as pd
import torch
from accelerate import Accelerator
from datasets import load_dataset
from peft import LoraConfig
from tqdm import tqdm
from transformers import AutoTokenizer, pipeline

from trl import AutoModelForCausalLMWithValueHead, AutoModelForSeq2SeqLMWithValueHead, PPOConfig, PPOTrainer, set_seed
from trl.core import LengthSampler
from trl.import_utils import is_xpu_available



In [None]:
tqdm.pandas()

### Set Random Seeds for Reproducibility

In [None]:
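seed_value = 42

# Seed the Python, NumPy, and PyTorch (CPU and all GPUs) random number generators
random.seed(seed_value)
np.random.seed(seed_value)
torch.manual_seed(seed_value)
torch.cuda.manual_seed_all(seed_value)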


### Initialize Weights & Biases for Tracking

In [None]:
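import wandb

# Start a W&B run for this notebook. Note that with log_with="wandb", PPOTrainer also
# initializes a tracker through Accelerate using PPOConfig.tracker_project_name
# ("trl" by default), so the training metrics may be logged under that project.
wandb.init(project="gpt2-drugscom_depression_reviews-hq-v1")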



## STEP 2: CONFIGURATION

### Define Script Arguments for Training Configuration

In [None]:
@dataclass
class ScriptArguments:
    ppo_config: PPOConfig = field(
        default_factory=lambda: PPOConfig(
            model_name="Zakia/gpt2-drugscom_depression_reviews",
            query_dataset="Zakia/drugscom_reviews",
            reward_model="sentiment-analysis:Zakia/distilbert-drugscom_depression_reviews",
            learning_rate=1.41e-5,
            log_with="wandb",
            mini_batch_size=128,
            batch_size=128,
            gradient_accumulation_steps=1,
            early_stopping=False,
            target_kl=6.0,
            kl_penalty="kl",
            seed=0,
            use_score_scaling=False,
            use_score_norm=False,
            score_clip=None,
        )
    )
    use_seq2seq: bool = False
    """whether to use seq2seq models"""
    use_peft: bool = False
    """whether to use peft"""
    peft_config: Optional[LoraConfig] = field(
        default_factory=lambda: LoraConfig(
            r=16,
            lora_alpha=16,
            bias="none",
            task_type="CAUSAL_LM",
        ),
    )
    trust_remote_code: bool = field(default=False, metadata={"help": "Enable `trust_remote_code`"})

### Initialize Script Arguments

In [None]:
# The defaults in ScriptArguments already hold this run's settings (the PPOConfig
# and LoraConfig values above), so no overrides are needed.
args = ScriptArguments()
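The `kl_penalty="kl"` setting controls how PPO keeps the fine-tuned policy close to the reference model. As we understand TRL 0.7.x's `PPOTrainer.compute_rewards`, every response token receives a penalty of `-beta * (logp - ref_logp)`, and the scalar reward-model score is added to the last response token only. The sketch below restates that shaping in plain Python; `shaped_rewards`, `beta`, and the log-probabilities are illustrative assumptions. Also to our understanding of TRL 0.7.x, `beta` starts at `init_kl_coef` (0.2 by default) and is adjusted by an adaptive controller targeting `PPOConfig.target`, while `target_kl` is only consulted for early stopping, which is disabled here (`early_stopping=False`).

```python
from typing import List


def shaped_rewards(score: float, logprobs: List[float], ref_logprobs: List[float], beta: float) -> List[float]:
    """Sketch of per-token PPO rewards: a KL penalty on every token plus the score on the last one.

    Uses the kl_penalty="kl" estimate (logp - ref_logp); `beta` plays the role of the KL coefficient.
    """
    if not logprobs or len(logprobs) != len(ref_logprobs):
        raise ValueError("logprobs and ref_logprobs must be non-empty and of equal length")
    # Penalize each token by how much more likely it is under the policy than under the reference
    rewards = [-beta * (lp - ref_lp) for lp, ref_lp in zip(logprobs, ref_logprobs)]
    # The reward-model score is credited to the final response token only
    rewards[-1] += score
    return rewards


# Hypothetical values: a 3-token response that the reward model scored 2.0
print(shaped_rewards(2.0, [-1.0, -0.5, -2.0], [-1.5, -0.7, -1.0], beta=0.2))
```

Tokens the policy favors more than the reference model (here the first two) get a small negative reward, which discourages drifting far from the original GPT-2 just to raise the classifier score.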

In [None]:
# Keyword arguments for the reward-model pipeline. `return_all_scores=True` returns a
# score for every label (LOW_QUALITY_REVIEW and HIGH_QUALITY_REVIEW), and
# `function_to_apply="none"` returns raw logits instead of softmax probabilities.
sent_kwargs = {"return_all_scores": True, "function_to_apply": "none", "batch_size": 16}

# Select appropriate model class based on arguments
trl_model_class = AutoModelForCausalLMWithValueHead if not args.use_seq2seq else AutoModelForSeq2SeqLMWithValueHead

## STEP 3: DATASET PREPARATION

In [None]:
# Function to clean review text
def clean_review(text):
    # Return an empty string if the input is not a string (e.g. None)
    if not isinstance(text, str):
        return ""
    text = html.unescape(text)  # Decode HTML entities such as &#039; and &quot;
    text = re.sub(r'"', '', text)  # Remove double quotes
    text = re.sub(r'<.*?>', '', text)  # Remove HTML tags
    return text
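A quick check of `clean_review` on a made-up review string in the style of the Drugs.com data (HTML entities, surrounding quotes, inline tags). The function is restated so the snippet runs on its own; the sample strings are hypothetical.

```python
import html
import re


def clean_review(text):
    # Same steps as the notebook's clean_review
    if not isinstance(text, str):
        return ""
    text = html.unescape(text)         # &quot; -> ", &#039; -> '
    text = re.sub(r'"', '', text)      # drop double quotes
    text = re.sub(r'<.*?>', '', text)  # drop HTML tags
    return text


raw = '&quot;I&#039;ve <b>finally</b> felt better.&quot;'
print(clean_review(raw))          # I've finally felt better.
print(repr(clean_review(None)))   # ''
```

Because entities are decoded before tags are stripped, encoded markup such as `&lt;br&gt;` is first turned into a real `<br>` tag and then removed.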

In [None]:
# Clean the reviews of the dataset
# Apply the clean_review function in a batched manner
def clean_reviews(batch):
    # Apply clean_review to each review in the batch and return the modified batch
    return {"review": [clean_review(review) for review in batch["review"]]}

In [None]:
# Function to build and preprocess the dataset
def build_dataset(config, query_dataset, input_min_text_length=2, input_max_text_length=8):
    """
    Build dataset for training. This builds the dataset from `load_dataset`

    Args:
        query_dataset (`str`):
            The name of the dataset to be loaded.

    Returns:
        dataloader (`torch.utils.data.DataLoader`):
            The dataloader for the dataset.
    """
    tokenizer = AutoTokenizer.from_pretrained(config.model_name)
    tokenizer.pad_token = tokenizer.eos_token

    # Load the dataset
    ds = load_dataset(query_dataset, split="train")

    # Filter the dataset for the condition 'Depression'
    ds = ds.filter(lambda x: x["condition"] == "Depression")

    # Filter out (remove) rows with a missing drugName or review
    ds = ds.filter(lambda x: all([x.get("drugName"), x.get("review")]))

    # Clean the reviews
    ds = ds.map(clean_reviews, batched=True)

    # Get the number of records
    num_records = ds.num_rows
    print(f"Number of records with Depression condition: {num_records}")

    input_size = LengthSampler(input_min_text_length, input_max_text_length)

    # Tokenization
    def tokenize(sample):
        sample["input_ids"] = tokenizer.encode(sample["review"])[: input_size()]
        sample["query"] = tokenizer.decode(sample["input_ids"])
        return sample

    ds = ds.map(tokenize, batched=False)

    ds.set_format(type="torch")
    return ds
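`build_dataset` turns each review into a short prompt by keeping only its first few tokens, with the length drawn by `LengthSampler(2, 8)`. To our understanding, TRL 0.7.x's `LengthSampler` draws uniformly from `range(min_value, max_value)`, so the upper bound is exclusive and prompts are 2 to 7 tokens long. The sketch below mimics that with hypothetical `sample_length` and `make_query` helpers and made-up token ids in place of the GPT-2 tokenizer.

```python
import random


def sample_length(min_value: int, max_value: int, rng: random.Random) -> int:
    # Uniform draw from range(min_value, max_value): max_value itself is never returned
    return rng.choice(range(min_value, max_value))


def make_query(token_ids, min_len=2, max_len=8, rng=None):
    # Keep only the first `sampled length` tokens of a review as the PPO query
    rng = rng or random.Random()
    return token_ids[: sample_length(min_len, max_len, rng)]


rng = random.Random(0)
review_ids = list(range(100, 130))  # stand-in for a tokenized 30-token review
lengths = [len(make_query(review_ids, rng=rng)) for _ in range(1000)]
print(min(lengths), max(lengths))
```

Very short prompts leave GPT-2 most of the review to write, so the reward reflects the generated text rather than the original reviewer's words.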

In [None]:
# Load and preprocess the dataset.
# `build_dataset` returns a `datasets.Dataset`; PPOTrainer builds the dataloader from it.
dataset = build_dataset(args.ppo_config, args.ppo_config.query_dataset)

Number of records with Depression condition: 9069

## STEP 4: MODEL INITIALIZATION

In [None]:
# Collate a list of examples (dicts) into a dict of lists, leaving variable-length tensors unpadded
def collator(data):
    return {key: [d[key] for d in data] for key in data[0]}
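The collator returns each field as a plain Python list instead of stacking examples into padded tensors; `PPOTrainer.generate` and `PPOTrainer.step` take lists of variable-length query tensors, whereas PyTorch's default collate function would try to stack tensors of different lengths and fail. The snippet below restates `collator` and applies it to two hypothetical examples (the token ids are made up, not real GPT-2 ids).

```python
def collator(data):
    # Turn a list of per-example dicts into a dict of per-key lists
    return {key: [d[key] for d in data] for key in data[0]}


examples = [
    {"query": "I got on Pro", "input_ids": [40, 1392, 319, 1041]},
    {"query": "Been on Prozac", "input_ids": [3856, 319]},
]
batch = collator(examples)
print(batch["query"])      # ['I got on Pro', 'Been on Prozac']
print(batch["input_ids"])  # [[40, 1392, 319, 1041], [3856, 319]]
```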

In [None]:
# Set seed before initializing value head for deterministic eval
set_seed(args.ppo_config.seed)

In [None]:
# Now let's build the model, the reference model, and the tokenizer.
if not args.use_peft:
    ref_model = trl_model_class.from_pretrained(args.ppo_config.model_name, trust_remote_code=args.trust_remote_code)
    device_map = None
    peft_config = None
else:
    peft_config = args.peft_config
    ref_model = None
    # Copy the model to each device
    device_map = {"": Accelerator().local_process_index}




In [None]:
model = trl_model_class.from_pretrained(
    args.ppo_config.model_name,
    trust_remote_code=args.trust_remote_code,
    device_map=device_map,
    peft_config=peft_config,
)



In [None]:
tokenizer = AutoTokenizer.from_pretrained(args.ppo_config.model_name)

In [None]:
# Some tokenizers like GPT-2's don't have a padding token by default, so we set one here.
tokenizer.pad_token_id = tokenizer.eos_token_id

## STEP 5: INITIALIZE PPO TRAINER

In [None]:
# Build the PPOTrainer from the config, the policy model, the reference model, the tokenizer, the dataset, and the collator
ppo_trainer = PPOTrainer(args.ppo_config, model, ref_model, tokenizer, dataset=dataset, data_collator=collator)


In [None]:
# We then build the sentiment analysis pipeline, passing the model name and the
# sentiment analysis pipeline arguments. Let's also make sure to set the device
# to the same device as the PPOTrainer.
device = ppo_trainer.accelerator.device
if ppo_trainer.accelerator.num_processes == 1:
    if is_xpu_available():
        device = "xpu:0"
    else:
        device = 0 if torch.cuda.is_available() else "cpu"  # to avoid a `pipeline` bug
ds_plugin = ppo_trainer.accelerator.state.deepspeed_plugin
task, model_name = args.ppo_config.reward_model.split(":")
if ds_plugin is not None and ds_plugin.is_zero3_init_enabled():
    with ds_plugin.zero3_init_context_manager(enable=False):
        sentiment_pipe = pipeline(task, model=model_name, device=device)
else:
    sentiment_pipe = pipeline(task, model=model_name, device=device)


In [None]:
# Ensure the reward-model pipeline has a padding token for batched inference.
# DistilBERT's tokenizer normally defines one, so this fallback is usually a no-op.
if sentiment_pipe.tokenizer.pad_token_id is None:
    sentiment_pipe.tokenizer.pad_token_id = tokenizer.pad_token_id

if sentiment_pipe.model.config.pad_token_id is None:
    sentiment_pipe.model.config.pad_token_id = tokenizer.pad_token_id

In [None]:
text = "This medication has changed my life for the better. I've experienced no side effects and my symptoms of depression have significantly decreased."
sentiment_pipe(text, **sent_kwargs)



[[{'label': 'LOW_QUALITY_REVIEW', 'score': -2.689751625061035},
  {'label': 'HIGH_QUALITY_REVIEW', 'score': 2.4064526557922363}]]

In [None]:
text = "I've had a terrible experience with this medication. It made me feel nauseous and I didn't notice any improvement in my condition."
sentiment_pipe(text, **sent_kwargs)

[[{'label': 'LOW_QUALITY_REVIEW', 'score': 3.4417612552642822},
  {'label': 'HIGH_QUALITY_REVIEW', 'score': -3.960636615753174}]]
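With `function_to_apply="none"` the pipeline returns raw logits, and the training loop uses the HIGH_QUALITY_REVIEW logit (position 1 in each output) as the reward. The sketch below uses the two outputs printed above as data, picks the score by label name rather than by position, and converts the pair of logits to a probability with a softmax for easier reading. The helper names are hypothetical, and the label lookup is a defensive alternative to `output[1]`, not what the notebook does.

```python
import math


def score_for(output, label="HIGH_QUALITY_REVIEW"):
    # Look up the logit for a given label instead of relying on list order
    return next(item["score"] for item in output if item["label"] == label)


def prob_for(output, label="HIGH_QUALITY_REVIEW"):
    # Softmax over the logits -> probability of `label` (max subtracted for stability)
    logits = {item["label"]: item["score"] for item in output}
    m = max(logits.values())
    exps = {k: math.exp(v - m) for k, v in logits.items()}
    return exps[label] / sum(exps.values())


positive = [{"label": "LOW_QUALITY_REVIEW", "score": -2.689751625061035},
            {"label": "HIGH_QUALITY_REVIEW", "score": 2.4064526557922363}]
negative = [{"label": "LOW_QUALITY_REVIEW", "score": 3.4417612552642822},
            {"label": "HIGH_QUALITY_REVIEW", "score": -3.960636615753174}]

for name, out in [("positive", positive), ("negative", negative)]:
    print(name, score_for(out), round(prob_for(out), 4))
```

Raw logits are unbounded, which gives PPO a wider reward range than probabilities squashed into [0, 1].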

### Generation Settings
For response generation we use pure sampling: top-k and nucleus (top-p) sampling are turned off, and no minimum length is enforced.

In [None]:
# We then define the arguments to pass to the `generate` function. These arguments
# are passed to the `generate` function of the PPOTrainer, which is a wrapper around
# the `generate` function of the trained model.
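generation_kwargs = {
    "min_length": -1,  # no minimum length
    "top_k": 0,  # disable top-k filtering
    "top_p": 1.0,  # disable nucleus (top-p) filtering
    "do_sample": True,  # sample from the model's full distribution
    "pad_token_id": tokenizer.eos_token_id,
    "max_new_tokens": 32,  # generate at most 32 new tokens per query
}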
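To illustrate what these settings mean, the sketch below applies simplified top-k and top-p filters to a toy next-token distribution. It is our own re-implementation of the idea, not Hugging Face's logits warpers; with `top_k=0` and `top_p=1.0`, as in `generation_kwargs`, both filters are no-ops and the model samples from its full distribution.

```python
import random
from typing import List


def filter_probs(probs: List[float], top_k: int = 0, top_p: float = 1.0) -> List[float]:
    """Zero out tokens outside the top-k / nucleus set and renormalize (simplified)."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(order)
    if top_k > 0:  # top_k=0 disables top-k filtering
        keep &= set(order[:top_k])
    if top_p < 1.0:  # top_p=1.0 disables nucleus filtering
        nucleus, cumulative = set(), 0.0
        for i in order:  # smallest set of most likely tokens reaching top_p mass
            nucleus.add(i)
            cumulative += probs[i]
            if cumulative >= top_p:
                break
        keep &= nucleus
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]


probs = [0.5, 0.3, 0.15, 0.05]
print(filter_probs(probs))              # same distribution: pure sampling
print(filter_probs(probs, top_k=2))     # only the 2 most likely tokens remain
print(filter_probs(probs, top_p=0.75))  # smallest set reaching 75% of the mass
token = random.choices(range(len(probs)), weights=filter_probs(probs))[0]
```

Keeping the full distribution matters for PPO: the log-probabilities used in the update come from the unfiltered model, so sampling from a truncated distribution would make the collected responses inconsistent with those probabilities.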

## STEP 6: TRAINING LOOP FOR FINE-TUNING WITH REINFORCEMENT LEARNING FROM HUMAN FEEDBACK (RLHF)

In [None]:
# Training loop for RLHF: one PPO step per batch, over one pass of the dataloader
for step, batch in tqdm(enumerate(ppo_trainer.dataloader)):
    query_tensors = batch["input_ids"]

    # Generate responses from the policy (fine-tuned GPT-2) and from the reference model
    response_tensors, ref_response_tensors = ppo_trainer.generate(
        query_tensors, return_prompt=False, generate_ref_response=True, **generation_kwargs
    )
    batch["response"] = tokenizer.batch_decode(response_tensors)
    batch["ref_response"] = tokenizer.batch_decode(ref_response_tensors)

    # Compute rewards: the reward model's HIGH_QUALITY_REVIEW logit (index 1) for query + response
    texts = [q + r for q, r in zip(batch["query"], batch["response"])]
    pipe_outputs = sentiment_pipe(texts, **sent_kwargs)
    rewards = [torch.tensor(output[1]["score"]) for output in pipe_outputs]
    # Reference-model rewards are computed for logging and comparison only
    ref_texts = [q + r for q, r in zip(batch["query"], batch["ref_response"])]
    ref_pipe_outputs = sentiment_pipe(ref_texts, **sent_kwargs)
    ref_rewards = [torch.tensor(output[1]["score"]) for output in ref_pipe_outputs]
    batch["ref_rewards"] = ref_rewards

    # Run a PPO optimization step and log the batch statistics
    stats = ppo_trainer.step(query_tensors, response_tensors, rewards)
    ppo_trainer.log_stats(stats, batch, rewards, columns_to_log=["query", "response", "ref_response", "ref_rewards"])

70it [46:05, 39.51s/it]
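The 70 iterations in the progress bar follow from the dataset size and batch size. Assuming the PPOTrainer's dataloader drops the final incomplete batch (as we understand TRL 0.7.x's `prepare_dataloader` does, with `drop_last=True`), one pass over the 9,069 depression reviews with `batch_size=128` gives 70 PPO steps, and the reported 46:05 of wall-clock time works out to about 39.5 s per step.

```python
num_records = 9069  # "Number of records with Depression condition" printed in STEP 3
batch_size = 128    # PPOConfig.batch_size

steps = num_records // batch_size           # full batches only (drop_last=True)
dropped = num_records - steps * batch_size  # reviews not used as queries in this pass
seconds = 46 * 60 + 5                       # 46:05 from the tqdm output

print(steps, dropped, round(seconds / steps, 1))
```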


## TRAINING PROGRESS WITH WEIGHTS & BIASES

This section visualizes the training progress logged to Weights & Biases, which shows how the model's reward evolves over the course of training. The interactive report is available on wandb.ai: [GPT-2 RLHF DepressionReviews Report](https://wandb.ai/team-zakia/trl/reports/GPT-2-RLHF-DepressionReviews-Report--Vmlldzo2MjMxNjk5?accessToken=2bjrzj5pa1rspudwgw38hgv0rqamtpfgd30feha7vkzht01hkrytet9vy82x2rvk). Two plots from this report are reproduced below: the reward mean plot (Figure 2) and the reward distribution heatmap (Figure 3).

### Reward Mean Plot

<div style="text-align: center">
<img src='https://huggingface.co/Zakia/gpt2-drugscom_depression_reviews-hq-v1/resolve/main/images/reward_mean_plot.png' width='800'>
<p style="text-align: center;"> <b>Figure 2:</b> env/reward_mean plot showing the average reward per training step. </p>
</div>

This plot shows the average reward the model received at each training step. Here, a reward is the score the DistilBERT reward model assigns to a generated review (its HIGH_QUALITY_REVIEW logit), so higher values mean the reward model rates the text as higher quality. The upward trend indicates that, as training progresses, the model increasingly generates text that the reward model scores as high quality, i.e. RLHF is improving the model with respect to that reward.
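To our understanding, TRL 0.7.x computes `env/reward_mean` in `ppo_trainer.log_stats` as the plain mean of the `rewards` list passed to it (here, the reward-model scores, before any KL penalty). A noisy per-step curve is easier to read with a trailing moving average; the sketch below shows both on hypothetical per-batch rewards. The smoothing is illustrative and not necessarily what W&B's plot smoothing does.

```python
from statistics import mean
from typing import List


def reward_means(batches: List[List[float]]) -> List[float]:
    # One mean per PPO step, like an env/reward_mean curve
    return [mean(b) for b in batches]


def moving_average(values: List[float], window: int = 3) -> List[float]:
    # Trailing average over up to `window` most recent steps
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


# Hypothetical rewards for 4 steps of 3 responses each
batches = [[-2.0, -1.0, 0.0], [-1.0, 0.0, 1.0], [0.0, 1.0, 2.0], [1.0, 2.0, 3.0]]
means = reward_means(batches)
print(means)                  # [-1.0, 0.0, 1.0, 2.0]
print(moving_average(means))  # [-1.0, -0.5, 0.0, 1.0]
```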

### Reward Distribution Heatmap Plot

<div style="text-align: center">
<img src='https://huggingface.co/Zakia/gpt2-drugscom_depression_reviews-hq-v1/resolve/main/images/reward_distribution_heatmap.png' width='800'>
<p style="text-align: center;"> <b>Figure 3:</b> env/reward_dist heatmap plot showing the distribution of rewards over training steps. </p>
</div>

This heatmap illustrates the distribution of rewards over training steps. Each vertical slice of the plot can be thought of as a snapshot of the reward landscape at a given step, with the color intensity representing the frequency of rewards at different levels. As the training continues, we expect to see the color bands shift upwards, which would indicate that the model is more frequently generating higher-quality responses.
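A heatmap of this kind can be built by binning each step's rewards into fixed reward ranges and counting them. The sketch below does that for hypothetical per-step rewards; the bin edges and the `reward_histograms` helper are illustrative assumptions, not how W&B renders `env/reward_dist`.

```python
from typing import List


def reward_histograms(batches: List[List[float]], edges: List[float]) -> List[List[int]]:
    """Count rewards per bin for each step: rows are steps, columns are bins.

    Bin j covers [edges[j], edges[j + 1]); values outside the edges are clipped
    into the first or last bin.
    """
    n_bins = len(edges) - 1
    grid = []
    for rewards in batches:
        counts = [0] * n_bins
        for r in rewards:
            j = 0
            while j < n_bins - 1 and r >= edges[j + 1]:
                j += 1
            counts[j] += 1
        grid.append(counts)
    return grid


edges = [-4.0, -2.0, 0.0, 2.0, 4.0]
batches = [[-3.5, -2.5, -1.0, 0.5], [-1.5, 0.5, 1.0, 2.5], [0.5, 2.1, 3.0, 3.9]]
for step, counts in enumerate(reward_histograms(batches, edges)):
    print(step, counts)  # counts move toward the higher-reward bins step by step
```

Plotting this grid with steps on the x-axis and bins on the y-axis gives the "color bands shifting upwards" pattern described above.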

## STEP 7: EVALUATE THE MODEL

In [None]:
output_min_length = 10
output_max_length = 50
output_length_sampler = LengthSampler(output_min_length, output_max_length)

# Get a random batch of queries from the dataset
bs = 20
game_data = dict()
dataset.set_format("pandas")
df_batch = dataset[:].sample(bs)
game_data["query"] = df_batch["query"].tolist()
query_tensors = df_batch["input_ids"].tolist()

response_tensors_ref, response_tensors = [], []

# Get responses from the reference model (before RLHF) and the fine-tuned model (after RLHF)
for i in range(bs):
    gen_len = output_length_sampler()
    query = torch.tensor(query_tensors[i]).unsqueeze(dim=0).to(device)
    query_len = query.shape[-1]

    # Copy generation_kwargs and set the sampled max_new_tokens for this query
    dynamic_generation_kwargs = generation_kwargs.copy()
    dynamic_generation_kwargs["max_new_tokens"] = gen_len

    # Keep only the newly generated tokens: slicing off the prompt by its length avoids
    # re-including prompt tokens when generation stops early at the EOS token
    output = ref_model.generate(query, **dynamic_generation_kwargs).squeeze()[query_len:]
    response_tensors_ref.append(output)

    output = model.generate(query, **dynamic_generation_kwargs).squeeze()[query_len:]
    response_tensors.append(output)

# Decode responses
game_data["response (before)"] = [tokenizer.decode(response_tensors_ref[i]) for i in range(bs)]
game_data["response (after)"] = [tokenizer.decode(response_tensors[i]) for i in range(bs)]

# Score query/response pairs with the reward model (HIGH_QUALITY_REVIEW logit) before/after
texts = [q + r for q, r in zip(game_data["query"], game_data["response (before)"])]
game_data["rewards (before)"] = [output[1]["score"] for output in sentiment_pipe(texts, **sent_kwargs)]

texts = [q + r for q, r in zip(game_data["query"], game_data["response (after)"])]
game_data["rewards (after)"] = [output[1]["score"] for output in sentiment_pipe(texts, **sent_kwargs)]

# Store results in a dataframe
df_results = pd.DataFrame(game_data)
df_results



| | query | response (before) | response (after) | rewards (before) | rewards (after) |
|---|---|---|---|---|---|
| 0 | Very Very good. Helps | Very Very good. Helps to deal with some of the... | me with extreme depression and anxiety. Can h... | -1.91144 | 2.428029 |
| 1 | It worked for about | a month. The nausea is gone and I no longer f... | 6 months and I feel so much better that I've ... | 1.584692 | 2.306569 |
| 2 | Started on 20 | mg. It seems to help somewhat with some of t... | mg for 4 days now...I feel great. I've been si... | -3.230841 | 2.124902 |
| 3 | I am a 43 | year old woman with severe anxiety and depres... | year old mother of two and a mother of two da... | 0.613143 | 2.23794 |
| 4 | I got on Pro | zac. I would sleep all night, feel sick one d... | I got on Prozac for about two years. Prozac re... | -3.422938 | 2.157211 |
| 5 | This drug has changed me | drastically. A year after taking an XL I bec... | from an excited procrastinator! I feel like I... | -3.109789 | 0.99171 |
| 6 | Good Med! Little skeptical | at first, because some things that seem so go... | Good Med! Little skeptical & SEXy!<\|endoftext\|> | -3.691891 | -1.087502 |
| 7 | I used to take cl | onazepam for 4 years and that caused much worse | onazepam and 5mg prozac at the same | -1.09592 | 1.416539 |
| 8 | Been on Prozac | for 11 years now and still getting worse as t... | for 13 years. I have never felt this good in ... | -1.745274 | 2.08382 |
| 9 | I've been on | Pristiq for 1 week now. The first 1 day was h... | I've been on this for over three months, I fee... | -0.377317 | 2.04174 |
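Some responses in this table repeat the query: row 0 in the "before" column, and rows 4, 6, and 9 in the "after" column (row 6 also ends with the `<|endoftext|>` token). These rows are consistent with taking the last `gen_len` tokens of the generated sequence (`[-gen_len:]`): when generation stops early at the EOS token, the output is shorter than the query plus `gen_len` tokens, so the slice reaches back into the prompt, and the reward model then scores the query twice. The sketch below shows this with made-up token ids and hypothetical helpers; slicing off the prompt by its length returns only the new tokens.

```python
def last_n(sequence, n):
    # Keep the last n tokens of prompt + generation
    return sequence[-n:]


def new_tokens(sequence, prompt_len):
    # Keep only what was generated after the prompt
    return sequence[prompt_len:]


EOS = 50256                     # GPT-2's <|endoftext|> token id
prompt = [11, 22, 33, 44]       # hypothetical query token ids
generated = [55, 66, EOS]       # generation stopped early at EOS
gen_len = 10                    # max_new_tokens that was requested
sequence = prompt + generated   # generate() returns the prompt followed by new tokens

print(last_n(sequence, gen_len))          # [11, 22, 33, 44, 55, 66, 50256] (prompt leaks in)
print(new_tokens(sequence, len(prompt)))  # [55, 66, 50256]
```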


Comparing the mean and median rewards of the generated sequences before and after RLHF shows a clear increase.

In [None]:
print("mean:")
display(df_results[["rewards (before)", "rewards (after)"]].mean())
print()
print("median:")
display(df_results[["rewards (before)", "rewards (after)"]].median())

mean:


rewards (before)   -1.622197
rewards (after)     1.415619
dtype: float64


median:


rewards (before)   -1.828357
rewards (after)     2.062780
dtype: float64
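With only 20 sampled queries, a paired comparison says more than the two summary statistics alone. One simple option, not part of the original notebook, is an exact two-sided sign test on the per-query reward differences. The sketch below applies it to the 10 rows shown in the table above (not all 20), using a `sign_test` helper of our own.

```python
from math import comb
from statistics import mean, median

# rewards (before) / rewards (after) for rows 0-9 of the table above
before = [-1.91144, 1.584692, -3.230841, 0.613143, -3.422938,
          -3.109789, -3.691891, -1.09592, -1.745274, -0.377317]
after = [2.428029, 2.306569, 2.124902, 2.23794, 2.157211,
         0.99171, -1.087502, 1.416539, 2.08382, 2.04174]


def sign_test(xs, ys):
    """Exact two-sided sign test on paired samples (ties are dropped)."""
    diffs = [y - x for x, y in zip(xs, ys) if y != x]
    n = len(diffs)
    wins = sum(d > 0 for d in diffs)
    k = max(wins, n - wins)
    # P(at least k successes out of n under p = 0.5), doubled for two sides
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return wins, n, min(1.0, 2 * tail)


diffs = [y - x for x, y in zip(before, after)]
print("mean diff:", round(mean(diffs), 3), "median diff:", round(median(diffs), 3))
print("wins, n, p:", sign_test(before, after))
```

The sign test only uses the direction of each difference, so it makes no assumption about how the reward-model logits are distributed.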

## STEP 8: SAVE THE FINE-TUNED GPT-2 MODEL: gpt2-drugscom_depression_reviews-hq-v1

In [None]:
from huggingface_hub import notebook_login  # To log in to our Hugging Face account and upload models to the Hub.

In [None]:
# Login to Hugging Face within the notebook
notebook_login()
!git config --global credential.helper store


In [None]:
# Save and push your model to the hub
model.save_pretrained("gpt2-drugscom_depression_reviews-hq-v1", push_to_hub=True)
tokenizer.save_pretrained("gpt2-drugscom_depression_reviews-hq-v1", push_to_hub=True)


('gpt2-drugscom_depression_reviews-hq-v1/tokenizer_config.json',
 'gpt2-drugscom_depression_reviews-hq-v1/special_tokens_map.json',
 'gpt2-drugscom_depression_reviews-hq-v1/vocab.json',
 'gpt2-drugscom_depression_reviews-hq-v1/merges.txt',
 'gpt2-drugscom_depression_reviews-hq-v1/added_tokens.json',
 'gpt2-drugscom_depression_reviews-hq-v1/tokenizer.json')