soumitsr committed · Commit b03f95b · verified · Parent: 36de3b1

minor syntax tweak

Files changed (1): README.md (+12 -6)

README.md CHANGED: the base_model metadata and the "Finetuned from model" line now point to meta-llama/Llama-3.2-1B-Instruct instead of unsloth/llama-3.2-1b-instruct-bnb-4bit, the Model Details entries are formatted as a list, and a few blank lines were added. The updated content follows.

---
base_model: meta-llama/Llama-3.2-1B-Instruct
language:
- en
license: apache-2.0

(The rest of the metadata block, including tags, is collapsed in this diff view.)

- **Developed by:** soumitsr
- **License:** apache-2.0
- **Finetuned from model:** meta-llama/Llama-3.2-1B-Instruct

## Model Details
- **Base Model (and tokenizer)**: [meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct)
- **Context Window/Max Length**: 16384 tokens (see the truncation sketch below)
- **Usage**: Instruction model fine-tuned to generate a title and summary and extract keywords from articles/blogs/posts in one shot. Ideal for high-volume backend processing of content. I would NOT recommend it for chat.
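
The 16384-token window matters mostly when you push long articles through the model in bulk. Here is a minimal sketch (mine, not from this card) that pre-truncates the article text with the stated base tokenizer before the prompt is built; the 14000-token budget is an arbitrary headroom assumption:

```python
# Sketch: keep the article comfortably inside the 16384-token context window.
# The 14000 budget is an assumed headroom value, not a number from this card.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")

def truncate_to_fit(article: str, max_input_tokens: int = 14000) -> str:
    ids = tokenizer(article, truncation=True, max_length=max_input_tokens)["input_ids"]
    return tokenizer.decode(ids, skip_special_tokens=True)
```
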
### Input Prompt
I used the following prompt to train it, so if you want the output to look the same, use this prompt (only the tail of the template is visible in this excerpt).
```python
INPUT:
{text}
<|eot_id|><|start_header_id|>assistant<|end_header_id|>"""
```
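
The template ends in a Python triple-quoted string with a single `{text}` placeholder, so building a prompt is just a substitution. A small sketch of my own, where `PROMPT_TEMPLATE` stands in for the full template string from the card:

```python
# PROMPT_TEMPLATE stands in for the card's full training prompt (only its tail is
# shown above); it contains a single {text} placeholder for the article body.
PROMPT_TEMPLATE = "...INPUT:\n{text}\n<|eot_id|><|start_header_id|>assistant<|end_header_id|>"

def build_prompt(article: str) -> str:
    # plain replace avoids str.format tripping over any literal braces
    # the full template might contain
    return PROMPT_TEMPLATE.replace("{text}", article)
```
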
 
### Response Format
The output will be a JSON object with no additional text or delimiters around it. (The JSON schema and the example response are collapsed in this diff view.)

The training dataset was designed to force the model to produce a bare JSON structure without any additional text or delimiters, so LangChain's JSON parser will likely die on it, because that parser looks for JSON wrapped in a delimiter.
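
Since the completion is bare JSON, and (per the quantization notes below) a rare completion is not valid JSON at all, a tolerant slice-and-parse step is safer than a delimiter-based parser. A minimal sketch using the same first-`{` / last-`}` trick as the sample code at the bottom of this card:

```python
import json
from typing import Optional

def parse_digest(raw: str) -> Optional[dict]:
    """Best-effort parse of the model's bare-JSON completion."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        return None  # the rare non-JSON completion (see the quantization table)
    try:
        return json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return None
```
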
 
## Model Paths:
- LoRA adapter (for Llama-3.2-1B-Instruct): https://huggingface.co/soumitsr/llama-v3p2-article-digestor-lora (a plain-peft loading sketch follows this list)
- Merged 16-bit model: https://huggingface.co/soumitsr/llama-v3p2-article-digestor
- GGUFs for llama.cpp: https://huggingface.co/soumitsr/llama-v3p2-article-digestor-gguf
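
If you would rather not use Unsloth, the adapter repo above can also be attached to the stock base model with plain transformers + peft. A rough sketch under that assumption (not the card's own sample code; requires a GPU):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-3.2-1B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16, device_map="auto")

# attach the LoRA weights from the adapter repo listed above
model = PeftModel.from_pretrained(base, "soumitsr/llama-v3p2-article-digestor-lora")
model.eval()
```
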
 
### Performance:
For an average input of 1536-2048 tokens it produces roughly 200 output tokens (more with the LoRA adapter, fewer with Q4_K_M).
- T4, LoRA adapter in 4-bit: ~3.8 seconds

(Other timings are collapsed in this diff view.)

| Quantization | Notes |
|---|---|
| Q5_K_M | High quality, recommended. Similar to the Q4 model; no visible difference. |
| Q4_K_M | High quality, recommended. Better adherence to the response format (about 1 in ~4,000 inputs comes back non-JSON) but a shorter summary (~100 words as opposed to ~128 words). |
| Q2_K | Straight up trash. Don't use it. |
 
## Training Details
**Dataset**: [soumitsr/article-digests](https://huggingface.co/datasets/soumitsr/article-digests/viewer/default/train?p=255&row=25536). This was generated by feeding real news articles, blogs, Reddit posts and YC Hacker News posts into GPT-4o-mini for responses.

Trained using Kaggle's free T4 GPU and Unsloth. Here is the [Notebook](https://www.kaggle.com/code/soumitsalman/finetuning-llama-3-2-1b). On that note, [Unsloth](https://unsloth.ai/) will change your life. To the creators of Unsloth: You are AWESOME! THANK YOU!
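
For orientation, an Unsloth LoRA run on this kind of setup generally looks like the skeleton below. This is a sketch, not the notebook's code: the hyperparameters are placeholders, it assumes the dataset rows are rendered into a single "text" column using the prompt template above, and it uses the older trl-style SFTTrainer arguments.

```python
from unsloth import FastLanguageModel
from datasets import load_dataset
from trl import SFTTrainer
from transformers import TrainingArguments

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="meta-llama/Llama-3.2-1B-Instruct",
    max_seq_length=16384,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                      # placeholder; the real values are in the notebook
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

dataset = load_dataset("soumitsr/article-digests", split="train")
# assumption: each row is the prompt template filled with the article plus the
# expected JSON answer, stored in a single "text" column

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=16384,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        learning_rate=2e-4,
        num_train_epochs=1,
        fp16=True,
        output_dir="outputs",
    ),
)
trainer.train()
```
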

## Sample Code

### Prompt
```python
# this was the prompt template the model was trained with (see the Input Prompt section above)

input_text = "whatever article, blog, post or novella you want to digest"
```
 
### Using LoRA Adapter (Requires GPU)
```python
from unsloth import FastLanguageModel
# ... (the rest of the sample code is collapsed in this diff view; only the final JSON-parsing line survives)

response_json = json.loads(resp[resp.find('{'):resp.rfind('}')+1])
```
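
Most of the LoRA example above is hidden in this excerpt, so here is a rough sketch of what Unsloth inference with the adapter typically looks like; the generation settings and the prompt-building step are my assumptions, not the card's exact code:

```python
import json
from unsloth import FastLanguageModel

# load the adapter repo from Model Paths on top of the base model, in 4-bit
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="soumitsr/llama-v3p2-article-digestor-lora",
    max_seq_length=16384,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # switch to Unsloth's faster inference path

prompt = PROMPT_TEMPLATE.replace("{text}", input_text)  # template + article, see the Prompt section
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
resp = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

response_json = json.loads(resp[resp.find('{'):resp.rfind('}') + 1])
```
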
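
The GGUF/llama.cpp path from Model Paths is likewise not visible in this excerpt; below is a minimal llama-cpp-python sketch, with the local file name and generation settings as assumptions:

```python
import json
from llama_cpp import Llama

# the .gguf file name is illustrative; check the GGUF repo for the actual Q4_K_M/Q5_K_M files
llm = Llama(model_path="llama-v3p2-article-digestor-Q4_K_M.gguf", n_ctx=16384)

prompt = PROMPT_TEMPLATE.replace("{text}", input_text)  # template + article, see the Prompt section
out = llm.create_completion(prompt=prompt, max_tokens=512, temperature=0.1)
resp = out["choices"][0]["text"]

response_json = json.loads(resp[resp.find('{'):resp.rfind('}') + 1])
```
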
 
## Appendix - Purpose of this model
I wanted a token-efficient and cheap way to get a quality summary, title, and named entities. The initial aim was to parse through volumes of click-bait garbage articles and blogs. For simpler tasks that involve processing a given text, ChatGPT is incredibly good at adhering to the given instruction and response format. Llama-3.2-1B is a powerful base model, but it is inconsistent about sticking to the response format, and when it does stick to it, it produces super generic content, e.g. a title that doesn't mean anything and a summary that is one line of BS. So I wanted to create something that would give me ChatGPT-level quality and consistency for basic tasks like summary, title and tag generation. Et voilà.
 