---
library_name: transformers
tags: []
---

# SUMMARY

Just a model I am using to learn fine-tuning of `gpt2-medium`:

- on self-made datasets
- with self-made special tokens
- fine-tuned in multiple rounds on a ~15K-row dataset (work in progress)

If you are interested in how I got to this point and how I created the datasets, you can visit: [Crafting GPT2 for Personalized AI-Preparing Data the Long Way](https://medium.com/@deeokay/the-soul-in-the-machine-crafting-gpt2-for-personalized-ai-9d38be3f635f)

# FINE TUNED - BASE MODEL

I consider [GPT2-medium-custom-v1.0](https://huggingface.co/Deeokay/GPT2-medium-custom-v1.0) the base model from which to start my Fine Tuning 2.0 on specific datasets.

- Previous models in this line, gpt-special-tokens-medium (1~4), are considered beta checkpoints leading up to this one.

This model is available to test on Ollama: [Deeokay/mediumgpt2](https://ollama.com/deeokay/mediumgpt2). It is not perfect and I am still working out some issues, but I am quite proud that I was able to make it this far.

Please note, the actual GGUF file is also included in this repository if you would like to create your own versions (templates, etc.).

## DECLARING NEW SPECIAL TOKENS

```python
special_tokens_dict = {
    'eos_token': '<|STOP|>',
    'bos_token': '<|STOP|>',
    'pad_token': '<|PAD|>',
    'additional_special_tokens': ['<|BEGIN_QUERY|>', '<|END_QUERY|>',
                                  '<|BEGIN_ANALYSIS|>', '<|END_ANALYSIS|>',
                                  '<|BEGIN_RESPONSE|>', '<|END_RESPONSE|>',
                                  '<|BEGIN_SENTIMENT|>', '<|END_SENTIMENT|>',
                                  '<|BEGIN_CLASSIFICATION|>', '<|END_CLASSIFICATION|>']
}

tokenizer.add_special_tokens(special_tokens_dict)
model.resize_token_embeddings(len(tokenizer))

tokenizer.eos_token_id = tokenizer.convert_tokens_to_ids('<|STOP|>')
tokenizer.bos_token_id = tokenizer.convert_tokens_to_ids('<|STOP|>')
tokenizer.pad_token_id = tokenizer.convert_tokens_to_ids('<|PAD|>')
```

The order of tokens is as follows:

```python
def combine_text(user_prompt, analysis, sentiment, new_response, classification):
    user_q = f"<|STOP|><|BEGIN_QUERY|>{user_prompt}<|END_QUERY|>"
    analysis = f"<|BEGIN_ANALYSIS|>{analysis}<|END_ANALYSIS|>"
    new_response = f"<|BEGIN_RESPONSE|>{new_response}<|END_RESPONSE|>"
    classification = f"<|BEGIN_CLASSIFICATION|>{classification}<|END_CLASSIFICATION|>"
    sentiment = f"<|BEGIN_SENTIMENT|>Sentiment: {sentiment}<|END_SENTIMENT|><|STOP|>"
    return user_q + analysis + new_response + classification + sentiment
```
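For reference, here is a rough sketch of what a single training example looks like once it has gone through `combine_text`. All of the field values below are made up purely for illustration, and the output is shown wrapped across comment lines for readability (the real string is one line):

```python
# Hypothetical field values, only to illustrate the token layout
example = combine_text(
    user_prompt="What is the capital of France?",
    analysis="The user is asking a simple geography question.",
    sentiment="Neutral",
    new_response="The capital of France is Paris.",
    classification="question",
)
print(example)
# <|STOP|><|BEGIN_QUERY|>What is the capital of France?<|END_QUERY|>
# <|BEGIN_ANALYSIS|>The user is asking a simple geography question.<|END_ANALYSIS|>
# <|BEGIN_RESPONSE|>The capital of France is Paris.<|END_RESPONSE|>
# <|BEGIN_CLASSIFICATION|>question<|END_CLASSIFICATION|>
# <|BEGIN_SENTIMENT|>Sentiment: Neutral<|END_SENTIMENT|><|STOP|>

# Sanity check: after add_special_tokens, the string should start and end
# with the <|STOP|> id (used here as both bos and eos)
ids = tokenizer.encode(example)
print(ids[0] == tokenizer.bos_token_id, ids[-1] == tokenizer.eos_token_id)  # True True
```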
## INFERENCING

I am currently testing two ways; if anyone knows a better one, please let me know!

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

models_folder = "Deeokay/GPT2-medium-custom-v1.0"

model = GPT2LMHeadModel.from_pretrained(models_folder)
tokenizer = GPT2Tokenizer.from_pretrained(models_folder)

# Device configuration
device = torch.device("cpu")
model.to(device)
```

### OPTION 1 INFERENCE

```python
import time

class Stopwatch:
    def __init__(self):
        self.start_time = None
        self.end_time = None

    def start(self):
        self.start_time = time.time()

    def stop(self):
        self.end_time = time.time()

    def elapsed_time(self):
        if self.start_time is None:
            return "Stopwatch hasn't been started"
        if self.end_time is None:
            return "Stopwatch hasn't been stopped"
        return self.end_time - self.start_time

stopwatch1 = Stopwatch()

def generate_response(input_text, max_length=250):
    stopwatch1.start()

    # Prepare the input
    # input_text = f"<|BEGIN_QUERY|>{input_text}<|END_QUERY|><|BEGIN_ANALYSIS|>{input_text}<|END_ANALYSIS|><|BEGIN_RESPONSE|>"
    input_text = f"<|BEGIN_QUERY|>{input_text}<|END_QUERY|><|BEGIN_ANALYSIS|>"

    input_ids = tokenizer.encode(input_text, return_tensors="pt").to(device)

    # Create attention mask
    attention_mask = torch.ones_like(input_ids).to(device)

    # Generate (greedy decoding, stops at <|STOP|>)
    output = model.generate(
        input_ids,
        max_new_tokens=max_length,
        num_return_sequences=1,
        no_repeat_ngram_size=2,
        attention_mask=attention_mask,
        pad_token_id=tokenizer.eos_token_id,
        eos_token_id=tokenizer.convert_tokens_to_ids('<|STOP|>'),
    )

    stopwatch1.stop()
    return tokenizer.decode(output[0], skip_special_tokens=False)
```

### OPTION 2 INFERENCE

```python
# Reuses the Stopwatch class defined in Option 1
stopwatch2 = Stopwatch()

def generate_response2(input_text, max_length=250):
    stopwatch2.start()

    # Prepare the input
    # input_text = f"<|BEGIN_QUERY|>{input_text}<|END_QUERY|><|BEGIN_ANALYSIS|>{input_text}<|END_ANALYSIS|><|BEGIN_RESPONSE|>"
    input_text = f"<|BEGIN_QUERY|>{input_text}<|END_QUERY|><|BEGIN_ANALYSIS|>"

    input_ids = tokenizer.encode(input_text, return_tensors="pt").to(device)

    # Create attention mask
    attention_mask = torch.ones_like(input_ids).to(device)

    # 2ND OPTION: generate with sampling
    output = model.generate(
        input_ids,
        max_new_tokens=max_length,
        attention_mask=attention_mask,
        do_sample=True,
        temperature=0.4,   # this can be played around with
        top_k=60,          # this can be played around with
        no_repeat_ngram_size=2,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )

    stopwatch2.stop()
    return tokenizer.decode(output[0], skip_special_tokens=False)
```

### DECODING ANSWER

When I need just the response:

```python
def decode(text):
    full_text = text

    # Extract the response part
    start_token = "<|BEGIN_RESPONSE|>"
    end_token = "<|END_RESPONSE|>"
    start_idx = full_text.find(start_token)
    end_idx = full_text.find(end_token)

    if start_idx != -1 and end_idx != -1:
        response = full_text[start_idx + len(start_token):end_idx].strip()
    else:
        response = full_text.strip()

    return response
```
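If you only need the response text, a possible shortcut (just an untested sketch, and it assumes the model reliably emits `<|END_RESPONSE|>`) is to stop generation at `<|END_RESPONSE|>` instead of `<|STOP|>`, so the classification and sentiment blocks are never generated, and then run the output through `decode()`. The helper name `generate_response_only` is made up for this example:

```python
# Untested sketch: stop as soon as the response block closes
end_response_id = tokenizer.convert_tokens_to_ids('<|END_RESPONSE|>')

def generate_response_only(input_text, max_length=250):
    prompt = f"<|BEGIN_QUERY|>{input_text}<|END_QUERY|><|BEGIN_ANALYSIS|>"
    input_ids = tokenizer.encode(prompt, return_tensors="pt").to(device)
    attention_mask = torch.ones_like(input_ids).to(device)

    output = model.generate(
        input_ids,
        max_new_tokens=max_length,
        no_repeat_ngram_size=2,
        attention_mask=attention_mask,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=end_response_id,  # stop at <|END_RESPONSE|> instead of <|STOP|>
    )
    # decode() keeps only the text between <|BEGIN_RESPONSE|> and <|END_RESPONSE|>
    return decode(tokenizer.decode(output[0], skip_special_tokens=False))
```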
### MY SETUP

I use the stopwatch to time the responses, and I run both inference options to compare the difference.

```python
input_text = "Who is Steve Jobs and what was his contribution?"

response1_full = generate_response(input_text)
#response1 = decode(response1_full)
print(f"Input: {input_text}")
print("=======================================")
print(f"Response1: {response1_full}")
elapsed1 = stopwatch1.elapsed_time()
print(f"Process took {elapsed1:.4f} seconds")
print("=======================================")

response2_full = generate_response2(input_text)
#response2 = decode(response2_full)
print(f"Response2: {response2_full}")
elapsed2 = stopwatch2.elapsed_time()
print(f"Process took {elapsed2:.4f} seconds")
print("=======================================")
```

### Out-of-Scope Use

Anything that requires factual accuracy: trust it at your own risk! It has never been tested on mathematical knowledge.

I quite enjoy how the responses feel closer to what I had in mind.