soumitsr committed · commit 36de3b1 · verified · parent: 56eb3b8

minor syntax fixing

Files changed (1): README.md (+7 -3)
## Model Details
**Base Model (and tokenizer)**: [meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct)

**Context Window/Max Length**: 16384 tokens

**Usage**: Instruction model fine-tuned to generate a title and summary and extract keywords from articles/blogs/posts in one shot. Ideal for high-volume backend processing of content. I would NOT recommend it for chat.

### Input Prompt
I used the following prompt to train the model, so use it if you want similar output.
For an average of 1536-2048 input tokens it produces roughly 200 tokens.

| Model | Quality and adherence rate |
| ---------------------------- | -------------------------- |
| Merged model or LoRA adapter | High-quality content generation, but a lower adherence rate than the lower-precision quantized models: 7-8 out of 2500 inputs produce non-JSON output. |
| Q8_0 | Same quality as the merged model, with better adherence to the response format (1 out of 3000 inputs is non-JSON). |
| Q5_K_M | High quality, recommended. Similar to the Q4 model; no visible difference. |
| Q4_K_M | High quality, recommended. Better adherence to the response format (1 out of ~4000 inputs is non-JSON) but a shorter summary (~100 words as opposed to 128). |
| Q2_K | Straight-up trash. Don't use it. |
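As a rough comparison, the failure counts in the table translate into percentages like so (taking 7.5 as the midpoint of "7-8 out of 2500" is my assumption):

```python
# Convert the table's non-JSON failure counts to percentages for comparison.
rates = {
    "merged/LoRA": 7.5 / 2500,  # midpoint of "7-8 out of 2500"
    "Q8_0": 1 / 3000,
    "Q4_K_M": 1 / 4000,
}
for name, rate in rates.items():
    print(f"{name}: {rate:.3%} non-JSON")  # e.g. merged/LoRA: 0.300% non-JSON
```

Even the worst case is well under half a percent; the quantized models roughly cut it by an order of magnitude.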
## Training Details
**Dataset**: [soumitsr/article-digests](https://huggingface.co/datasets/soumitsr/article-digests/viewer/default/train?p=255&row=25536). This was generated by feeding real news articles, blogs, Reddit posts, and YC Hacker News posts into GPT-4o-mini.

Trained using Kaggle's free T4 GPU and Unsloth. Here is the [Notebook](https://www.kaggle.com/code/soumitsalman/finetuning-llama-3-2-1b). On that note, [Unsloth](https://unsloth.ai/) will change your life. To the creators of Unsloth: you are AWESOME! THANK YOU!

## Sample Code
### Prompt
```python
# ... (prompt construction and model generation elided in this diff)
resp = tokenizer.decode(outputs[0], skip_special_tokens=True)
response_json = json.loads(resp[resp.find('{'):resp.rfind('}')+1])
```
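Since the occasional input still yields non-JSON output (see the adherence table), the slicing trick above can be wrapped in a small guard so failures degrade gracefully; `extract_json` is a hypothetical helper, not part of the original README:

```python
import json

def extract_json(resp: str):
    """Pull the outermost JSON object out of raw model text.
    (Hypothetical helper; not part of the original sample code.)"""
    start, end = resp.find('{'), resp.rfind('}')
    if start == -1 or end <= start:
        return None  # model failed to emit JSON (the "non-JSON" case)
    try:
        return json.loads(resp[start:end + 1])
    except json.JSONDecodeError:
        return None

raw = 'Here you go: {"title": "Example", "keywords": ["a", "b"]} Done.'
print(extract_json(raw)["title"])  # → Example
```

Returning `None` instead of raising lets a batch pipeline log and skip the rare malformed response.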
### Using Llama.CPP (No GPU)

Download one of the GGUFs to a local directory and use that as the model path. The original snippet is truncated here; a minimal sketch using `llama-cpp-python`, where the file name, `prompt`, and sampling parameters are illustrative:

```python
import json
from llama_cpp import Llama

# n_ctx matches the model's 16384-token context window
llm = Llama(model_path="./model.Q4_K_M.gguf", n_ctx=16384)
resp = llm(prompt, max_tokens=512)["choices"][0]["text"]
response_json = json.loads(resp[resp.find('{'):resp.rfind('}')+1])
```
 