---
base_model: unsloth/mistral-7b-v0.3-bnb-4bit
language:
- en
license: apache-2.0
tags:
- text-generation-inference
- transformers
- unsloth
- mistral
- gguf
---

# Uploaded model

- **Developed by:** Deeokay
- **License:** apache-2.0
- **Finetuned from model :** unsloth/mistral-7b-v0.3-bnb-4bit

This Mistral model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)


# README

This is a test model, based on the following:
- a private dataset
- a slight customization of the Alpaca chat template
- Works with `ollama create`, but requires a customized Modelfile
- One reason for this was that I wanted to try a Q2_K quantization and see whether it was actually good(?) -> Exceeds expectations!!
- My examples are based on the unsloth.Q2_K.gguf file; however, the other quantizations should work as well

# HOW TO USE

The whole point of the conversion, for me, was being able to use the model through Ollama (or other local options).
For Ollama, the model needs to be a GGUF file. Once you have that, it is pretty straightforward.

If you want to try it first, the Q2_K version is available on Ollama as `deeokay/minimistral`:

```bash
ollama pull deeokay/minimistral
```
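
If you would rather call the pulled model from Python instead of the terminal, a minimal sketch using the official `ollama` Python package (`pip install ollama`) could look like the following; the prompt is just an example, and this snippet is mine rather than part of the original card:

```python
# Minimal sketch: chat with the pulled model via the `ollama` Python package.
# Assumes the Ollama server is already running locally.
import ollama

reply = ollama.chat(
    model="deeokay/minimistral",
    messages=[{"role": "user", "content": "What is the capital of Canada?"}],
)
print(reply["message"]["content"])
```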


# Quick Start: 
- You must already have Ollama installed and running on your machine
- Download the unsloth.Q2_K.gguf model from the Files tab
- In the same directory, create a file called "Modelfile"
- Inside the "Modelfile", add the following:

```
FROM ./unsloth.Q2_K.gguf

PARAMETER stop <|STOP|>
PARAMETER stop "<|STOP|>"
PARAMETER stop <|END_RESPONSE|>
PARAMETER stop "<|END_RESPONSE|>"
PARAMETER temperature 0.4

TEMPLATE """<|BEGIN_QUERY|>
{{.Prompt}}
<|END_QUERY|>
<|BEGIN_RESPONSE|>
"""

SYSTEM """You are an AI assistant. Respond to the user's query between the BEGIN_QUERY and END_QUERY tokens. Use the appropriate BEGIN_ and END_ tokens for different types of content in your response."""
```
- Save it and go back to the folder (the folder where the model + Modelfile exist)
- Now, in a terminal, make sure you are in that same folder and type in the following command (a scripted version of these steps is sketched right after it)

```bash
ollama create mycustomai -f ./Modelfile  # "mycustomai" <- you can name it anything you want
```
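
If you prefer to script the steps above rather than doing them by hand, a rough Python equivalent might look like this; the file names and the model name `mycustomai` are placeholders:

```python
# Hedged sketch: write the Modelfile and build the Ollama model in one go.
# Adjust the GGUF path and model name to your setup.
import subprocess
from pathlib import Path

MODELFILE = '''FROM ./unsloth.Q2_K.gguf

PARAMETER stop <|STOP|>
PARAMETER stop "<|STOP|>"
PARAMETER stop <|END_RESPONSE|>
PARAMETER stop "<|END_RESPONSE|>"
PARAMETER temperature 0.4

TEMPLATE """<|BEGIN_QUERY|>
{{.Prompt}}
<|END_QUERY|>
<|BEGIN_RESPONSE|>
"""

SYSTEM """You are an AI assistant. Respond to the user's query between the BEGIN_QUERY and END_QUERY tokens. Use the appropriate BEGIN_ and END_ tokens for different types of content in your response."""
'''

Path("Modelfile").write_text(MODELFILE)
subprocess.run(["ollama", "create", "mycustomai", "-f", "Modelfile"], check=True)
```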

Once the model has been created, you should be able to use it to chat!
This GGUF is based on unsloth/mistral-7b-v0.3-bnb-4bit by Unsloth.
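
You can also query it programmatically. Below is a hedged sketch (my addition, not from the original card) that calls Ollama's local REST API with `requests`; it assumes the default port 11434 and the `mycustomai` name used above:

```python
# Query the newly created model through Ollama's local REST API.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "mycustomai",
        "prompt": "Explain in one sentence what a GGUF file is.",
        "stream": False,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])
```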


# NOTE: DISCLAIMER

Please note that this is not intended for production use, but rather the result of fine-tuning as a self-learning exercise.
This is my fine-tuning pass with a personalized, customized dataset.
Please feel free to customize the Modelfile, and if you do get a better response than mine, please share!!

If you would like to know how I started creating my dataset, you can check this link:
[Crafting GPT2 for Personalized AI-Preparing Data the Long Way (Part1)](https://medium.com/@deeokay/the-soul-in-the-machine-crafting-gpt2-for-personalized-ai-9d38be3f635f)

## The training data uses the following template:

```python
# Special tokens used to structure each training example; note that both
# BOS and EOS are mapped to the same <|STOP|> marker.
special_tokens_dict = {
    'eos_token': '<|STOP|>',
    'bos_token': '<|STOP|>',
    'pad_token': '<|PAD|>',
    'additional_special_tokens': ['<|BEGIN_QUERY|>', '<|END_QUERY|>',
                                  '<|BEGIN_ANALYSIS|>', '<|END_ANALYSIS|>',
                                  '<|BEGIN_RESPONSE|>', '<|END_RESPONSE|>',
                                  '<|BEGIN_SENTIMENT|>', '<|END_SENTIMENT|>',
                                  '<|BEGIN_CLASSIFICATION|>', '<|END_CLASSIFICATION|>']
}

# Register the new tokens and grow the embedding matrix to match.
tokenizer.add_special_tokens(special_tokens_dict)
model.resize_token_embeddings(len(tokenizer))

# Point the tokenizer's special-token IDs at the new markers.
tokenizer.eos_token_id = tokenizer.convert_tokens_to_ids('<|STOP|>')
tokenizer.bos_token_id = tokenizer.convert_tokens_to_ids('<|STOP|>')
tokenizer.pad_token_id = tokenizer.convert_tokens_to_ids('<|PAD|>')

```
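
As a quick sanity check (my addition, not part of the original training code), you can confirm that each marker is registered as a single token rather than being split into sub-tokens; this sketch reuses `special_tokens_dict` from the block above and the base tokenizer named in this card:

```python
# Verify each special marker encodes to exactly one token ID.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("unsloth/mistral-7b-v0.3-bnb-4bit")
tokenizer.add_special_tokens(special_tokens_dict)  # dict from the block above

for marker in ['<|STOP|>', '<|PAD|>', '<|BEGIN_QUERY|>', '<|END_QUERY|>',
               '<|BEGIN_RESPONSE|>', '<|END_RESPONSE|>']:
    ids = tokenizer.encode(marker, add_special_tokens=False)
    print(marker, '->', ids)
    assert len(ids) == 1, f"{marker} was split into {len(ids)} tokens"
```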

## The data is in the following format: 

```python
def combine_text(user_prompt, analysis, sentiment, new_response, classification):
    # Wrap each field in its BEGIN_/END_ markers and bracket the whole record
    # with <|STOP|>, which doubles as BOS and EOS (see above).
    user_q = f"<|STOP|><|BEGIN_QUERY|>{user_prompt}<|END_QUERY|>"
    analysis = f"<|BEGIN_ANALYSIS|>{analysis}<|END_ANALYSIS|>"
    new_response = f"<|BEGIN_RESPONSE|>{new_response}<|END_RESPONSE|>"
    classification = f"<|BEGIN_CLASSIFICATION|>{classification}<|END_CLASSIFICATION|>"
    sentiment = f"<|BEGIN_SENTIMENT|>Sentiment: {sentiment}<|END_SENTIMENT|><|STOP|>"
    return user_q + analysis + new_response + classification + sentiment
```
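
Continuing from the function above, here is an illustrative call with made-up values (not from the actual dataset) showing what one combined training record looks like:

```python
# Illustrative only: the field values below are invented for demonstration.
record = combine_text(
    user_prompt="What is the capital of Canada?",
    analysis="The user is asking a factual geography question.",
    sentiment="Neutral",
    new_response="The capital of Canada is Ottawa.",
    classification="question_answering",
)
print(record)
# Output (wrapped here for readability; the real string is one line):
# <|STOP|><|BEGIN_QUERY|>What is the capital of Canada?<|END_QUERY|>
# <|BEGIN_ANALYSIS|>The user is asking a factual geography question.<|END_ANALYSIS|>
# <|BEGIN_RESPONSE|>The capital of Canada is Ottawa.<|END_RESPONSE|>
# <|BEGIN_CLASSIFICATION|>question_answering<|END_CLASSIFICATION|>
# <|BEGIN_SENTIMENT|>Sentiment: Neutral<|END_SENTIMENT|><|STOP|>
```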