---
library_name: transformers
tags: []
---
# SUMMARY
Just a model I am using to learn fine-tuning of 'gpt2-medium'
- on a self-made dataset
- with self-made special tokens
- fine-tuned multiple times on a ~15K-row dataset (work in progress)

If you are interested in how I got to this point and how I created the datasets, you can visit:
[Crafting GPT2 for Personalized AI-Preparing Data the Long Way](https://medium.com/@deeokay/the-soul-in-the-machine-crafting-gpt2-for-personalized-ai-9d38be3f635f)
# FINE TUNED - BASE MODEL
I would consider this [GPT2-medium-custom-v1.0](https://huggingface.co/Deeokay/GPT2-medium-custom-v1.0) the base model for starting my Fine Tuning 2.0 on specific datasets.
- Previous models of this (gpt-special-tokens-medium 1~4) are considered beta checkpoints leading up to this one

This model is available to test on Ollama at [Deeokay/mediumgpt2](https://ollama.com/deeokay/mediumgpt2). It is not perfect and I am still working out some issues, but I am quite proud that I was able to make it this far.
Please note, the actual GGUF file is also included in this repository if you would like to create your own versions (templates etc.)
## DECLARING NEW SPECIAL TOKENS
```python
special_tokens_dict = {
    'eos_token': '<|STOP|>',
    'bos_token': '<|STOP|>',
    'pad_token': '<|PAD|>',
    'additional_special_tokens': ['<|BEGIN_QUERY|>', '<|END_QUERY|>',
                                  '<|BEGIN_ANALYSIS|>', '<|END_ANALYSIS|>',
                                  '<|BEGIN_RESPONSE|>', '<|END_RESPONSE|>',
                                  '<|BEGIN_SENTIMENT|>', '<|END_SENTIMENT|>',
                                  '<|BEGIN_CLASSIFICATION|>', '<|END_CLASSIFICATION|>']
}

tokenizer.add_special_tokens(special_tokens_dict)
# Resize the embedding matrix so the new tokens get their own embeddings
model.resize_token_embeddings(len(tokenizer))

tokenizer.eos_token_id = tokenizer.convert_tokens_to_ids('<|STOP|>')
tokenizer.bos_token_id = tokenizer.convert_tokens_to_ids('<|STOP|>')
tokenizer.pad_token_id = tokenizer.convert_tokens_to_ids('<|PAD|>')
```
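As a quick sanity check (my own addition, not part of the original script), you can confirm each new token was registered and maps to a single ID beyond the original GPT-2 vocabulary:
```python
# Each special token should resolve to one ID >= 50257 (the original GPT-2 vocab size)
for tok in ['<|STOP|>', '<|PAD|>', '<|BEGIN_QUERY|>', '<|END_QUERY|>',
            '<|BEGIN_RESPONSE|>', '<|END_RESPONSE|>']:
    print(tok, tokenizer.convert_tokens_to_ids(tok))
```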
The order of tokens is as follows:
```python
def combine_text(user_prompt, analysis, sentiment, new_response, classification):
    user_q = f"<|STOP|><|BEGIN_QUERY|>{user_prompt}<|END_QUERY|>"
    analysis = f"<|BEGIN_ANALYSIS|>{analysis}<|END_ANALYSIS|>"
    new_response = f"<|BEGIN_RESPONSE|>{new_response}<|END_RESPONSE|>"
    classification = f"<|BEGIN_CLASSIFICATION|>{classification}<|END_CLASSIFICATION|>"
    sentiment = f"<|BEGIN_SENTIMENT|>Sentiment: {sentiment}<|END_SENTIMENT|><|STOP|>"
    return user_q + analysis + new_response + classification + sentiment
```
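For illustration, here is what a combined training record looks like (the field values below are made up, not taken from my dataset):
```python
example = combine_text(
    user_prompt="What is the capital of France?",
    analysis="The user is asking a simple geography question.",
    sentiment="Neutral",
    new_response="The capital of France is Paris.",
    classification="question",
)
print(example)
# <|STOP|><|BEGIN_QUERY|>What is the capital of France?<|END_QUERY|><|BEGIN_ANALYSIS|>...<|END_SENTIMENT|><|STOP|>
```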
## INFERENCING
I am currently testing two ways; if anyone knows a better one, please let me know!
```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer
models_folder = "Deeokay/gpt2-medium-custom-v1.0"
model = GPT2LMHeadModel.from_pretrained(models_folder)
tokenizer = GPT2Tokenizer.from_pretrained(models_folder)
# Device configuration <<change as needed>>
device = torch.device("cpu")
model.to(device)
```
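If you have a GPU, a minimal variant of the device setup (a sketch assuming CUDA is available; not required):
```python
# Pick a GPU when one is available, otherwise stay on CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
```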
### OPTION 1 INFERENCE
```python
import time

class Stopwatch:
    def __init__(self):
        self.start_time = None
        self.end_time = None

    def start(self):
        self.start_time = time.time()

    def stop(self):
        self.end_time = time.time()

    def elapsed_time(self):
        if self.start_time is None:
            return "Stopwatch hasn't been started"
        if self.end_time is None:
            return "Stopwatch hasn't been stopped"
        return self.end_time - self.start_time

stopwatch1 = Stopwatch()

def generate_response(input_text, max_length=250):
    stopwatch1.start()

    # Prepare the input
    # input_text = f"<|BEGIN_QUERY|>{input_text}<|END_QUERY|><|BEGIN_ANALYSIS|>{input_text}<|END_ANALYSIS|><|BEGIN_RESPONSE|>"
    input_text = f"<|BEGIN_QUERY|>{input_text}<|END_QUERY|><|BEGIN_ANALYSIS|>"
    input_ids = tokenizer.encode(input_text, return_tensors="pt").to(device)

    # Create attention mask
    attention_mask = torch.ones_like(input_ids).to(device)

    # Generate (greedy decoding; stops at <|STOP|>)
    output = model.generate(
        input_ids,
        max_new_tokens=max_length,
        num_return_sequences=1,
        no_repeat_ngram_size=2,
        attention_mask=attention_mask,
        pad_token_id=tokenizer.eos_token_id,
        eos_token_id=tokenizer.convert_tokens_to_ids('<|STOP|>'),
    )

    stopwatch1.stop()
    return tokenizer.decode(output[0], skip_special_tokens=False)
```
### OPTION 2 INFERENCE
```python
# Reuses the Stopwatch class defined in Option 1
stopwatch2 = Stopwatch()

def generate_response2(input_text, max_length=250):
    stopwatch2.start()

    # Prepare the input
    # input_text = f"<|BEGIN_QUERY|>{input_text}<|END_QUERY|><|BEGIN_ANALYSIS|>{input_text}<|END_ANALYSIS|><|BEGIN_RESPONSE|>"
    input_text = f"<|BEGIN_QUERY|>{input_text}<|END_QUERY|><|BEGIN_ANALYSIS|>"
    input_ids = tokenizer.encode(input_text, return_tensors="pt").to(device)

    # Create attention mask
    attention_mask = torch.ones_like(input_ids).to(device)

    # Generate (sampling-based decoding)
    output = model.generate(
        input_ids,
        max_new_tokens=max_length,
        attention_mask=attention_mask,
        do_sample=True,
        temperature=0.4,  # this can be played around with
        top_k=60,         # this can be played around with
        no_repeat_ngram_size=2,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )

    stopwatch2.stop()
    return tokenizer.decode(output[0], skip_special_tokens=False)
```
### DECODING ANSWER
When I need just the response:
```python
def decode(text):
    full_text = text

    # Extract the response part
    start_token = "<|BEGIN_RESPONSE|>"
    end_token = "<|END_RESPONSE|>"
    start_idx = full_text.find(start_token)
    end_idx = full_text.find(end_token)

    if start_idx != -1 and end_idx != -1:
        response = full_text[start_idx + len(start_token):end_idx].strip()
    else:
        response = full_text.strip()

    return response
```
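A minimal usage sketch of `decode` (the sample text below is made up for illustration):
```python
sample = ("<|BEGIN_QUERY|>Hi<|END_QUERY|><|BEGIN_ANALYSIS|>Greeting.<|END_ANALYSIS|>"
          "<|BEGIN_RESPONSE|>Hello! How can I help?<|END_RESPONSE|>")
print(decode(sample))  # -> Hello! How can I help?
```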
### MY SETUP
I use the stopwatch to time the responses and run both inference options to compare them.
```python
input_text = "Who is Steve Jobs and what was contribution?"
response1_full = generate_response(input_text)
#response1 = decode(response1_full)
print(f"Input: {input_text}")
print("=======================================")
print(f"Response1: {response1_full}")
elapsed1 = stopwatch1.elapsed_time()
print(f"Process took {elapsed1:.4f} seconds")
print("=======================================")
response2_full = generate_response2(input_text)
#response2 = decode(response2_full)
print(f"Response2: {response2_full}")
elapsed2 = stopwatch2.elapsed_time()
print(f"Process took {elapsed2:.4f} seconds")
print("=======================================")
```
### Out-of-Scope Use
Anything that requires factual accuracy: trust at your own risk!
It was never tested on mathematical knowledge.
I quite enjoy how the responses feel closer to what I had in mind.