minor syntax fixing
README.md CHANGED
@@ -19,7 +19,9 @@ tags:
## Model Details

**Base Model (and tokenizer)**: [meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct)

**Context Window/Max Length**: 16384 tokens

**Usage**: Instruction model fine-tuned to generate a title and a summary and to extract keywords from articles/blogs/posts in one shot. Ideal for high-volume backend content processing. I would NOT recommend it for chat.
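For orientation, all three outputs come back together in a single JSON object. A hypothetical sketch of the shape (the exact field names and schema are set by the training prompt below, not by this example):

```python
# Hypothetical digest output; the actual schema comes from the training prompt.
example_digest = {
    "title": "A Short Regenerated Title",
    "summary": "A roughly 128-word summary of the article...",
    "keywords": ["keyword one", "keyword two", "keyword three"],
}
```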
### Input Prompt

I used the following prompt to train it, so if you want the output to be similar, use this prompt.
@@ -66,13 +68,15 @@ For an average of 1536 - 2048 input tokens it produces roughly 200 tokens (high
| Model | Quality and adherence rate |
| ---------------------------- | -------------------------- |
| Merged model or LoRA adapter | High-quality content generation but a lower adherence rate than the lower-precision quantized models; 7-8 out of 2500 inputs produce non-JSON output |
| Q8_0 | Same quality as the merged model; better adherence rate to the response format (1 out of 3000 inputs is non-JSON) |
| Q5_K_M | High quality, recommended. Similar to the Q4 model; no visible difference |
| Q4_K_M | High quality, recommended. Better adherence rate to the response format (1 out of ~4000 inputs is non-JSON) but a shorter summary (~100 words as opposed to 128 words) |
| Q2_K | Straight-up trash. Don't use it |
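Whichever variant you run, a small fraction of outputs will not be valid JSON, so bulk pipelines should guard the parse. A minimal sketch (the helper name and the skip-on-failure policy are my assumptions, not from this card):

```python
import json
from typing import Optional

def parse_digest(resp: str) -> Optional[dict]:
    """Pull the JSON object out of a raw completion; return None if the model drifted."""
    start, end = resp.find('{'), resp.rfind('}')
    if start == -1 or end == -1:
        return None  # no JSON object at all: skip or re-queue this input
    try:
        return json.loads(resp[start:end + 1])
    except json.JSONDecodeError:
        return None  # malformed JSON: treat the same as a non-JSON output
```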
## Training Details

**Dataset**: [soumitsr/article-digests](https://huggingface.co/datasets/soumitsr/article-digests/viewer/default/train?p=255&row=25536). This was generated by feeding real news articles, blogs, Reddit posts and YC Hacker News posts into GPT-4o-mini for responses.

Trained using Kaggle's free T4 GPU and Unsloth. Here is the [Notebook](https://www.kaggle.com/code/soumitsalman/finetuning-llama-3-2-1b). On that note, [Unsloth](https://unsloth.ai/) will change your life. To the creators of Unsloth: You are AWESOME! THANK YOU!
## Sample Code

### Prompt

```python
@@ -110,7 +114,7 @@ resp = tokenizer.decode(outputs[0], skip_special_tokens=True)
response_json = json.loads(resp[resp.find('{'):resp.rfind('}')+1])
```
### Using Llama.CPP (No GPU)

Download one of the GGUF files to a local directory and use that as the model path.
```python
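# A minimal sketch assuming the llama-cpp-python bindings; the GGUF filename
# and the prompt string are placeholder assumptions, not the author's code.
import json

from llama_cpp import Llama

# Load the downloaded GGUF from a local path, sized to the model's context window.
llm = Llama(model_path="./llama-3.2-1b-q4_k_m.gguf", n_ctx=16384)

# Placeholder; in practice, use the training prompt from earlier in the card
# with the article text substituted in.
prompt = "<training prompt with the article text>"

# Low temperature keeps the JSON response format stable.
output = llm(prompt, max_tokens=512, temperature=0.1)
resp = output["choices"][0]["text"]

# Same JSON extraction as in the transformers example above.
response_json = json.loads(resp[resp.find('{'):resp.rfind('}')+1])
```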