---
base_model: google/gemma-2-9b-it
datasets:
- DiTy/function-calling
language:
- en
library_name: transformers
license: apache-2.0
pipeline_tag: text-generation
tags:
- conversational
- gemma2
- function-calling
- trl
---

# DiTy/gemma-2-9b-it-function-calling-GGUF

This model is a fine-tuned version of [google/gemma-2-9b-it](https://huggingface.co/google/gemma-2-9b-it) for the **Function Calling** task.
It was trained on the English version of the <ins>*DiTy/function-calling*</ins> dataset, which consists of non-synthetic data fully annotated by humans.

> [!NOTE]
> NB: This model is of fairly high quality, but you may also want to try the larger [DiTy/gemma-2-27b-it-function-calling-GGUF](https://huggingface.co/DiTy/gemma-2-27b-it-function-calling-GGUF).

In addition to **safetensors**, the model is available in **GGUF** format. In that case, you only need to download a single file (see *[how to run inference with a GGUF model](https://github.com/abetlen/llama-cpp-python?tab=readme-ov-file#high-level-api)*):

| Filename | Quant type | File Size | Description |
| -------- | ---------- | --------- | ----------- |
| [gemma-2-9B-it-function-calling-F16.gguf](https://huggingface.co/DiTy/gemma-2-9b-it-function-calling-GGUF/blob/main/gemma-2-9B-it-function-calling-F16.gguf) | F16 | 18.5GB | Base model with float16 |
| [gemma-2-9B-it-function-calling-Q8_0.gguf](https://huggingface.co/DiTy/gemma-2-9b-it-function-calling-GGUF/blob/main/gemma-2-9B-it-function-calling-Q8_0.gguf) | Q8_0 | 9.83GB | Extremely high quality, generally unneeded but max available quant. |
| [gemma-2-9B-it-function-calling-Q6_K.gguf](https://huggingface.co/DiTy/gemma-2-9b-it-function-calling-GGUF/blob/main/gemma-2-9B-it-function-calling-Q6_K.gguf) | Q6_K | 7.59GB | Very high quality, near perfect, *recommended*. |
| [gemma-2-9B-it-function-calling-Q5_K_M.gguf](https://huggingface.co/DiTy/gemma-2-9b-it-function-calling-GGUF/blob/main/gemma-2-9B-it-function-calling-Q5_K_M.gguf) | Q5_K_M | 6.65GB | High quality, very usable. |
| [gemma-2-9B-it-function-calling-Q5_K_S.gguf](https://huggingface.co/DiTy/gemma-2-9b-it-function-calling-GGUF/blob/main/gemma-2-9B-it-function-calling-Q5_K_S.gguf) | Q5_K_S | 6.48GB | High quality, very usable. |
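
As a rough aid for choosing a file from the table above, the sketch below picks the largest GGUF file that fits a given memory budget. The helper name `pick_gguf_file` and the 1 GB `headroom_gb` default are assumptions made for illustration. Real memory use also depends on context length and the runtime, so treat the result as a starting point only.

```python
# File names and sizes (in GB) copied from the table above.
GGUF_FILES = {
    "F16": ("gemma-2-9B-it-function-calling-F16.gguf", 18.5),
    "Q8_0": ("gemma-2-9B-it-function-calling-Q8_0.gguf", 9.83),
    "Q6_K": ("gemma-2-9B-it-function-calling-Q6_K.gguf", 7.59),
    "Q5_K_M": ("gemma-2-9B-it-function-calling-Q5_K_M.gguf", 6.65),
    "Q5_K_S": ("gemma-2-9B-it-function-calling-Q5_K_S.gguf", 6.48),
}


def pick_gguf_file(budget_gb: float, headroom_gb: float = 1.0):
    """Return (quant_type, filename) of the largest file that fits, or None.

    `headroom_gb` is an arbitrary placeholder for memory needed beyond the weights.
    """
    usable_gb = budget_gb - headroom_gb
    fitting = [
        (size, quant, name)
        for quant, (name, size) in GGUF_FILES.items()
        if size <= usable_gb
    ]
    if not fitting:
        return None
    _, quant, name = max(fitting)  # largest file that still fits
    return quant, name


print(pick_gguf_file(12.0))
print(pick_gguf_file(5.0))
```

With a 12 GB budget the helper selects the Q8_0 file, and with 5 GB it returns `None` because no listed file fits.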


## Model card tree

* [How to prepare your functions (tools) for *Function Calling*](#prepare_func_call)
* [Just use chat template for generation](#just_chat_template)
* [Prompt structure and expected content](#roles)
* [Evaluation of function calling models](#eval)

## Usage (HuggingFace Transformers)

Below are some code snippets to help you get started quickly with the model. First, install the Transformers library:
```bash
pip install -U transformers
```

### <a name="prepare_func_call"></a>Prepare your functions for *Function Calling*

You should write the functions (tools) used by the model in *Python code* and make sure to add *Python docstrings* as in the example below:
```python
def get_weather(city: str):
    """
    A function that returns the weather in a given city.
    
    Args:
        city: The city to get the weather for.
    """
    import random
    
    return "sunny" if random.random() > 0.5 else "rainy"


def get_sunrise_sunset_times(city: str):
    """
    A function that returns the time of sunrise and sunset at the present moment, for a given city, in the form of a list: [sunrise_time, sunset_time].
    
    Args:
        city: The city to get the sunrise and sunset times for.
    """
    
    return ["6:00 AM", "6:00 PM"]
```
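
The type hints and the `Args:` section matter because the chat template presents each tool to the model as a JSON schema with a name, a description, and typed parameters. The following standard-library sketch illustrates how such a schema could be derived from a function. It is a simplified approximation, not the code that Transformers runs: `sketch_tool_schema` is a hypothetical helper, it only understands a few primitive types, and it assumes the docstring layout used above.

```python
import inspect
import json
import re

# Minimal mapping from Python annotations to JSON Schema type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def sketch_tool_schema(func):
    """Build a JSON-schema-like description of `func` from its signature and docstring."""
    doc = inspect.getdoc(func) or ""
    summary, _, args_section = doc.partition("Args:")

    # Lines such as "city: The city to get the weather for." become parameter descriptions.
    arg_docs = dict(re.findall(r"^\s*(\w+):\s*(.+)$", args_section, flags=re.MULTILINE))

    properties, required = {}, []
    for name, param in inspect.signature(func).parameters.items():
        properties[name] = {
            # Unknown annotations fall back to "string" in this sketch.
            "type": _JSON_TYPES.get(param.annotation, "string"),
            "description": arg_docs.get(name, ""),
        }
        if param.default is inspect.Parameter.empty:
            required.append(name)

    return {
        "name": func.__name__,
        "description": " ".join(summary.split()),
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


def get_weather(city: str):
    """
    A function that returns the weather in a given city.

    Args:
        city: The city to get the weather for.
    """
    return "sunny"  # deterministic stand-in for the example above


print(json.dumps(sketch_tool_schema(get_weather), indent=4))
```

A function without a docstring or without type hints would produce empty descriptions or guessed types here, which is why documenting every argument is worthwhile.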

### <a name="just_chat_template"></a>Just use chat template

Next, load the model and tokenizer (the weights are downloaded on first use):
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

PATH_TO_MODEL_DIR = None  # optional: a local cache directory; None uses the default Hugging Face cache

model = AutoModelForCausalLM.from_pretrained(
    "DiTy/gemma-2-9b-it-function-calling-GGUF",
    device_map="auto",
    torch_dtype=torch.bfloat16,  # use float16 or float32 if bfloat16 is not available to you.
    cache_dir=PATH_TO_MODEL_DIR,  # optional
)
tokenizer = AutoTokenizer.from_pretrained(
    "DiTy/gemma-2-9b-it-function-calling-GGUF",
    cache_dir=PATH_TO_MODEL_DIR,  # optional
)
```

To build the prompt for generation, just use `apply_chat_template`. To make the model aware of our functions (tools),
pass them as a list through the `tools` argument and also set `add_generation_prompt=True`.
```python
history_messages = [
    {"role": "system", "content": "You are a helpful assistant with access to the following functions. Use them if required - "},
    {"role": "user", "content": "Hi, can you tell me the time of sunrise in Los Angeles?"},
]

inputs = tokenizer.apply_chat_template(
    history_messages,
    tokenize=False,
    add_generation_prompt=True,  # adding prompt for generation
    tools=[get_weather, get_sunrise_sunset_times],  # our functions (tools)
)

print(inputs)
```

Then our `inputs` will look like this:
```
<bos><start_of_turn>user
You are a helpful assistant with access to the following functions. Use them if required - {
    "name": "get_weather",
    "description": "A function that returns the weather in a given city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {
                "type": "string",
                "description": "The city to get the weather for."
            }
        },
        "required": [
            "city"
        ]
    }
},
{
    "name": "get_sunrise_sunset_times",
    "description": "A function that returns the time of sunrise and sunset at the present moment, for a given city, in the form of a list: [sunrise_time, sunset_time].",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {
                "type": "string",
                "description": "The city to get the sunrise and sunset times for."
            }
        },
        "required": [
            "city"
        ]
    }
}

Hi, can you tell me the time of sunrise in Los Angeles?<end_of_turn>
<start_of_turn>model

```

Now we can generate the model's response.
Note that `apply_chat_template` has already inserted the special tokens, so they must not be added again during tokenization. Use `add_special_tokens=False`:
```python
terminator_ids = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<end_of_turn>"),
]

prompt_ids = tokenizer.encode(inputs, add_special_tokens=False, return_tensors='pt').to(model.device)
generated_ids = model.generate(
    prompt_ids,
    max_new_tokens=512,
    eos_token_id=terminator_ids,
    bos_token_id=tokenizer.bos_token_id,
)
generated_response = tokenizer.decode(generated_ids[0][prompt_ids.shape[-1]:], skip_special_tokens=False)  # `skip_special_tokens=False` for debug

print(generated_response)
```

The model responds with a function call:
```
Function call: {"name": "get_sunrise_sunset_times", "arguments": {"city": "Los Angeles"}}<end_of_turn>
```
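
One possible way to turn that generated string into an actual Python call is sketched below. The helper names `parse_function_call` and `run_function_call`, the `result_key` parameter, and the tool registry dictionary are illustrative assumptions and not part of the model or of Transformers. The sketch only relies on the output format shown above.

```python
import json

FUNCTION_CALL_PREFIX = "Function call: "
END_OF_TURN = "<end_of_turn>"


def parse_function_call(generated_response: str):
    """Return the call as a dict, or None if the model answered in plain text."""
    text = generated_response.strip()
    if text.endswith(END_OF_TURN):
        text = text[: -len(END_OF_TURN)].strip()
    if not text.startswith(FUNCTION_CALL_PREFIX):
        return None
    return json.loads(text[len(FUNCTION_CALL_PREFIX):])


def run_function_call(call: dict, available_tools: dict, result_key: str = "result") -> str:
    """Execute the requested tool and return its result as a JSON string."""
    tool = available_tools[call["name"]]  # raises KeyError for an unknown tool name
    result = tool(**call.get("arguments", {}))
    return json.dumps({result_key: result})


def get_sunrise_sunset_times(city: str):
    return ["6:00 AM", "6:00 PM"]  # stand-in for the tool defined earlier


generated_response = (
    'Function call: {"name": "get_sunrise_sunset_times", '
    '"arguments": {"city": "Los Angeles"}}<end_of_turn>'
)
call = parse_function_call(generated_response)
function_response = run_function_call(
    call,
    {"get_sunrise_sunset_times": get_sunrise_sunset_times},
    result_key="times_list",
)
print(function_response)
```

Returning `None` for plain-text answers lets the calling code distinguish a final answer from a tool request without raising an exception.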

Now we can execute the *called function* ourselves and then pass the *function's response* back to the model:
```python
history_messages = [
    {"role": "system", "content": "You are a helpful assistant with access to the following functions. Use them if required - "},
    {"role": "user", "content": "Hi, can you tell me the time of sunrise in Los Angeles?"},
    {"role": "function-call", "content": '{"name": "get_sunrise_sunset_times", "arguments": {"city": "Los Angeles"}}'},
    {"role": "function-response", "content": '{"times_list": ["6:00 AM", "6:00 PM"]}'},  # a hypothetical response from our function
]

inputs = tokenizer.apply_chat_template(
    history_messages,
    tokenize=False,
    add_generation_prompt=True,  # adding prompt for generation
    tools=[get_weather, get_sunrise_sunset_times],  # our functions (tools)
)

print(inputs)
```

Let's make sure the `inputs` are correct:
```
<bos><start_of_turn>user
You are a helpful assistant with access to the following functions. Use them if required - {
    "name": "get_weather",
    "description": "A function that returns the weather in a given city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {
                "type": "string",
                "description": "The city to get the weather for."
            }
        },
        "required": [
            "city"
        ]
    }
},
{
    "name": "get_sunrise_sunset_times",
    "description": "A function that returns the time of sunrise and sunset at the present moment, for a given city, in the form of a list: [sunrise_time, sunset_time].",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {
                "type": "string",
                "description": "The city to get the sunrise and sunset times for."
            }
        },
        "required": [
            "city"
        ]
    }
}

Hi, can you tell me the time of sunrise in Los Angeles?<end_of_turn>
<start_of_turn>model
Function call: {"name": "get_sunrise_sunset_times", "arguments": {"city": "Los Angeles"}}<end_of_turn>
<start_of_turn>user
Function response: {"times_list": ["6:00 AM", "6:00 PM"]}<end_of_turn>
<start_of_turn>model

```

Similarly, we generate a response from the model:
```python
prompt_ids = tokenizer.encode(inputs, add_special_tokens=False, return_tensors='pt').to(model.device)
generated_ids = model.generate(
    prompt_ids,
    max_new_tokens=512,
    eos_token_id=terminator_ids,
    bos_token_id=tokenizer.bos_token_id,
)
generated_response = tokenizer.decode(generated_ids[0][prompt_ids.shape[-1]:], skip_special_tokens=False)  # `skip_special_tokens=False` for debug

print(generated_response)
```

As a result, we get the model's response:
```
The sunrise time in Los Angeles is 6:00 AM.<end_of_turn>
```

## Usage via transformers `pipeline`

<details>
  <summary>
  Generation via pipeline
  </summary>

```python
import torch
from transformers import pipeline

PATH_TO_MODEL_DIR = None  # optional: a local cache directory; None uses the default Hugging Face cache

generation_pipeline = pipeline(
    "text-generation",
    model="DiTy/gemma-2-9b-it-function-calling-GGUF",
    model_kwargs={
        "torch_dtype": torch.bfloat16,  # use float16 or float32 if bfloat16 is not supported on your hardware
        "cache_dir": PATH_TO_MODEL_DIR,  # optional
    },
    device_map="auto",
)

history_messages = [
    {"role": "system", "content": "You are a helpful assistant with access to the following functions. Use them if required - "},
    {"role": "user", "content": "Hi, can you tell me the time of sunrise in Los Angeles?"},
    {"role": "function-call", "content": '{"name": "get_sunrise_sunset_times", "arguments": {"city": "Los Angeles"}}'},
    {"role": "function-response", "content": '{"times_list": ["6:00 AM", "6:00 PM"]}'},
]

inputs = generation_pipeline.tokenizer.apply_chat_template(
    history_messages,
    tokenize=False,
    add_generation_prompt=True,
    tools=[get_weather, get_sunrise_sunset_times],
)

terminator_ids = [
    generation_pipeline.tokenizer.eos_token_id,
    generation_pipeline.tokenizer.convert_tokens_to_ids("<end_of_turn>")
]

outputs = generation_pipeline(
    inputs,
    max_new_tokens=512,
    eos_token_id=terminator_ids,
)

print(outputs[0]["generated_text"][len(inputs):])
```
  
</details>

## <a name="roles"></a>Prompt structure and expected content

The model works best when prompts are built with `apply_chat_template`.
The message history must be passed in the following format:
```python
history_messages = [
    {"role": "...", "content": "..."},
    ...
]
```

The following roles are available for use:

* `system` - an optional role. Its content is always placed at the very beginning of the prompt, before the list of functions (tools) available to the model.
You can always use the standard prompt that was used during training: ***"You are a helpful assistant with access to the following functions. Use them if required - "***
* `user` - carries the user's request.
* `function-call` - carries the body of the function call.
Although the model is trained to generate a function call in the form ***"Function call: {...}\<end_of_turn\>"***, you should pass only the body ***"{...}"***
in the *"content"* field, because `apply_chat_template` adds the ***"Function call: "*** prefix and the ***\<end_of_turn\>*** token automatically.
* `function-response` - carries the response of your function, passed in the *"content"* field as a dictionary ***'{"name_returnable_value": value}'***.
* `model` - content under this role is treated as text generated by the model.
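
The role conventions above can be captured in a small helper that serializes a call and its result into the two extra turns. This is a minimal sketch: `append_function_turns` and `ALLOWED_ROLES` are hypothetical names introduced here, and the only assumption is that both payloads are JSON-serializable.

```python
import json

# The roles listed above.
ALLOWED_ROLES = {"system", "user", "function-call", "function-response", "model"}


def append_function_turns(history_messages, call: dict, response: dict):
    """Return a new history with one function-call and one function-response turn added."""
    for message in history_messages:
        if message["role"] not in ALLOWED_ROLES:
            raise ValueError(f"unsupported role: {message['role']}")
    return history_messages + [
        # Only the JSON body goes into "content"; the chat template adds the prefix.
        {"role": "function-call", "content": json.dumps(call)},
        {"role": "function-response", "content": json.dumps(response)},
    ]


history_messages = [
    {"role": "system", "content": "You are a helpful assistant with access to the following functions. Use them if required - "},
    {"role": "user", "content": "Hi, can you tell me the time of sunrise in Los Angeles?"},
]
history_messages = append_function_turns(
    history_messages,
    call={"name": "get_sunrise_sunset_times", "arguments": {"city": "Los Angeles"}},
    response={"times_list": ["6:00 AM", "6:00 PM"]},
)
print(history_messages[-2]["content"])
print(history_messages[-1]["content"])
```

Building the strings with `json.dumps` avoids hand-written quoting mistakes in the *"content"* field.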

### Chat history with *Function Calling*

```
[
    {"role": "system", "content": "You are a helpful assistant with access to the following functions. Use them if required - "},
    {"role": "user", "content": "Hi, can you tell me the time of sunrise in Los Angeles?"},
    {"role": "function-call", "content": '{"name": "get_sunrise_sunset_times", "arguments": {"city": "Los Angeles"}}'},
    {"role": "function-response", "content": '{"times_list": ["6:00 AM", "6:00 PM"]}'},
]
```

After the chat template is applied (with the two example tools passed via `tools`), this history is rendered as:
```
<bos><start_of_turn>user
You are a helpful assistant with access to the following functions. Use them if required - {
    "name": "get_weather",
    "description": "A function that returns the weather in a given city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {
                "type": "string",
                "description": "The city to get the weather for."
            }
        },
        "required": [
            "city"
        ]
    }
},
{
    "name": "get_sunrise_sunset_times",
    "description": "A function that returns the time of sunrise and sunset at the present moment, for a given city, in the form of a list: [sunrise_time, sunset_time].",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {
                "type": "string",
                "description": "The city to get the sunrise and sunset times for."
            }
        },
        "required": [
            "city"
        ]
    }
}

Hi, can you tell me the time of sunrise in Los Angeles?<end_of_turn>
<start_of_turn>model
Function call: {"name": "get_sunrise_sunset_times", "arguments": {"city": "Los Angeles"}}<end_of_turn>
<start_of_turn>user
Function response: {"times_list": ["6:00 AM", "6:00 PM"]}<end_of_turn>
```


### Chat history with a standard user-model template

```
[
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Tell me about California"},
]
```

After the chat template is applied, this history is rendered as:
```
<bos><start_of_turn>user
You are a helpful assistant

Tell me about California<end_of_turn>
```
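
To make the mapping from roles to turns explicit, here is a rough pure-Python approximation of the renderings shown in this section. It is only a sketch written from those examples: `sketch_render` is a hypothetical helper, it ignores `tools` and the generation prompt, and the tokenizer's real chat template remains the source of truth.

```python
def sketch_render(history_messages):
    """Approximate the rendered prompt for a history without tools."""
    turns, pending_system = [], None
    for message in history_messages:
        role, content = message["role"], message["content"]
        if role == "system":
            pending_system = content  # merged into the next user turn
        elif role == "user":
            if pending_system is not None:
                content = f"{pending_system}\n\n{content}"
                pending_system = None
            turns.append(("user", content))
        elif role == "function-call":
            turns.append(("model", f"Function call: {content}"))
        elif role == "function-response":
            turns.append(("user", f"Function response: {content}"))
        elif role == "model":
            turns.append(("model", content))
        else:
            raise ValueError(f"unknown role: {role}")
    body = "".join(
        f"<start_of_turn>{speaker}\n{text}<end_of_turn>\n" for speaker, text in turns
    )
    return "<bos>" + body


rendered = sketch_render([
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Tell me about California"},
])
print(rendered)
```

The sketch shows that there is no separate system turn: the system text is folded into the first `user` turn, while function calls and function responses are rendered as `model` and `user` turns respectively.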

## <a name="eval"></a>Evaluation

During training, the validation loss leveled off at approximately the following values:

| **Model** | **Generation Language** | **Approximate Validation Loss** |
| :-----: | :-----: | :-----: |
| [DiTy/gemma-2-27b-it-function-calling-GGUF](https://huggingface.co/DiTy/gemma-2-27b-it-function-calling-GGUF) | EN | 0.47 |
| [DiTy/gemma-2-9b-it-russian-function-calling-GGUF](https://huggingface.co/DiTy/gemma-2-9b-it-russian-function-calling-GGUF) | RU | 0.57 |
| [**DiTy/gemma-2-9b-it-function-calling-GGUF**](https://huggingface.co/DiTy/gemma-2-9b-it-function-calling-GGUF) | **EN** | **0.5** |
| [DiTy/gemma-2-2b-it-function-calling](https://huggingface.co/DiTy/gemma-2-2b-it-function-calling) | EN | 0.66 |

## Citation

```none
@article{gemma_2024,
    title={Gemma},
    url={https://www.kaggle.com/m/3301},
    DOI={10.34740/KAGGLE/M/3301},
    publisher={Kaggle},
    author={Gemma Team},
    year={2024}
}
```