---
base_model: meta-llama/Llama-3.2-1B-Instruct
language:
- en
license: apache-2.0
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- gguf
---
# Uploaded model
- **Developed by:** soumitsr
- **License:** apache-2.0
- **Finetuned from model:** meta-llama/Llama-3.2-1B-Instruct
## Model Details
- **Base Model (and tokenizer)**: [meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct)
- **Context Window/Max Length**: 16384 tokens
- **Usage**: Instruction model fine-tuned to generate a title and summary and extract keywords from articles/blogs/posts in one shot. Ideal for backend bulk processing of content. I would NOT recommend it for chat.
### Input Prompt
I used the following prompt to train the model, so if you want similar output, use this prompt.
```python
prompt_template = """<|begin_of_text|><|start_header_id|>system<|end_header_id|>
response_format:json_object
<|eot_id|><|start_header_id|>user<|end_header_id|>
TASK: create title, summary and tags (e.g. company, organization, person, catastrophic event, product, process, security vulnerability, stock ticker symbol, geographic location). title should be 10 - 20 words, summary should be 100 - 200 words and tags (entities) should a string of comma separated phrases.
INPUT:
{text}
<|eot_id|><|start_header_id|>assistant<|end_header_id|>"""
```
### Response Format
The output will be a bare JSON object with no additional text or delimiters:
```json
{
"title": "some 10 - 20 words title",
"summary": "some 100 - 180 word summary",
"tags": "comma separated list of named entities"
}
```
For example:
```json
{
"title": "The Future of Space Missions: How 3D Printing is Revolutionizing Astronaut Logistics",
"summary": "The 3D printing market is poised for significant growth, with an estimated value of US$95 billion by 2032, according to BCG. While it may never replace traditional manufacturing on Earth, its potential in space is transformative. Astronauts aboard the International Space Station (ISS) manage complex logistics, relying on substantial deliveries of spare parts—over 7,000 pounds annually—with additional supplies stored on Earth and the ISS itself. However, this model is unsustainable for future manned missions to Mars and the Moon, where astronauts will face isolation and the need for adaptability. 3D printing offers a viable solution, enabling the in-situ production of parts and tools as needed, thus facilitating a new era of space exploration where self-sufficiency becomes essential for survival and success.",
"tags": "3D printing, space exploration, International Space Station, manufacturing, Mars, Moon, logistics, astronauts, spare parts, BCG"
}
```
The training dataset was designed to force the model to produce a bare JSON structure without any additional text or delimiters. This means LangChain's JSON parser will likely fail, because it looks for JSON inside a delimiter such as a code fence.
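Since the output is a bare JSON object, a minimal parser along these lines (the `parse_digest` helper name is hypothetical) handles it directly, without any delimiter-based machinery:

```python
import json

def parse_digest(raw: str) -> dict:
    # The model emits a bare JSON object with no surrounding fence,
    # so slice from the first '{' to the last '}' before parsing.
    start, end = raw.find('{'), raw.rfind('}')
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")
    return json.loads(raw[start:end + 1])

# stand-in completion with stray leading whitespace
raw = ' {"title": "t", "summary": "s", "tags": "a, b"}'
digest = parse_digest(raw)
```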
## Model Paths:
- Lora adapter (for Llama-3.2-1B-Instruct): https://huggingface.co/soumitsr/llama-v3p2-article-digestor-lora
- Merged 16-bit model: https://huggingface.co/soumitsr/llama-v3p2-article-digestor
- GGUFs for llama.cpp: https://huggingface.co/soumitsr/llama-v3p2-article-digestor-gguf
### Performance:
For an average input of 1,536 - 2,048 tokens it produces roughly 200 output tokens (more with the LoRA adapter, fewer with Q4_K_M).
- T4 using lora adapter in 4-bit: ~3.8 seconds
- T4 using merged 16-bit model: ~5.2 seconds
- A100 using lora adapter: <0.4 seconds
- CPU (4 cores) using Q4_K_M: 38-40 seconds
| Model | Quality and adherence rate |
| ---------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Merged model or Lora adapter | High quality content generation but lower adherence rate compared to the lower precision quantized models. 7-8 out of 2500 inputs will produce non-JSON output |
| Q8_0 | Same quality as the merged model. Better adherence rate to response format (1 out of 3000 inputs are non-JSON) |
| Q5_K_M | High quality, recommended. Similar to Q4 model. No visible difference. |
| Q4_K_M | High quality, recommended. Better adherence rate to response format (1 out of ~4000 inputs are non-JSON) but smaller summary (~100 words as opposed to 128 words) |
| Q2_K | Straight up trash. Don't use it. |
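Given the small but non-zero non-JSON rate in the table above, a simple retry wrapper is usually enough for bulk pipelines. This is a sketch; `generate` stands in for whichever inference call you use (Unsloth, llama.cpp, etc.):

```python
import json

def digest_with_retry(generate, text, max_attempts=3):
    # generate: any callable returning the raw model completion for an article
    for _ in range(max_attempts):
        raw = generate(text)
        start, end = raw.find('{'), raw.rfind('}')
        if start != -1 and end > start:
            try:
                return json.loads(raw[start:end + 1])
            except json.JSONDecodeError:
                continue  # malformed JSON; retry
    raise RuntimeError("model did not return valid JSON after retries")

# usage with a stand-in generator that fails once, then succeeds
responses = iter(['not json at all', '{"title": "t", "summary": "s", "tags": "x"}'])
result = digest_with_retry(lambda text: next(responses), "some article")
```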
## Training Details
**Dataset**: [soumitsr/article-digests](https://huggingface.co/datasets/soumitsr/article-digests/viewer/default/train?p=255&row=25536). Generated by feeding real news articles, blogs, Reddit posts, and YC Hacker News posts into GPT-4o-mini for responses.
Trained using Kaggle's free T4 GPU and Unsloth. Here is the [Notebook](https://www.kaggle.com/code/soumitsalman/finetuning-llama-3-2-1b). On that note, [Unsloth](https://unsloth.ai/) will change your life. To the creators of Unsloth: you are AWESOME! THANK YOU!
## Sample Code
### Prompt
```python
# this was the prompt template the model was trained with
prompt_template = """<|begin_of_text|><|start_header_id|>system<|end_header_id|>
response_format:json_object
<|eot_id|><|start_header_id|>user<|end_header_id|>
TASK: create title, summary and tags (e.g. company, organization, person, catastrophic event, product, process, security vulnerability, stock ticker symbol, geographic location). title should be 10 - 20 words, summary should be 100 - 200 words and tags (entities) should a string of comma separated phrases.
INPUT:
{text}
<|eot_id|><|start_header_id|>assistant<|end_header_id|>"""
input_text = "whatever article, blog, post or novela you want to digest"
```
### Using Lora Adapter (Requires GPU)
```python
import json

from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "soumitsr/llama-v3p2-article-digestor-lora",
    max_seq_length = 16384
)
FastLanguageModel.for_inference(model)  # enable native 2x faster inference
inputs = tokenizer(prompt_template.format(text=input_text), return_tensors="pt")
# feel free to play with max_new_tokens and temperature
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.1
)
resp = tokenizer.decode(outputs[0], skip_special_tokens=True)
# the model emits a bare JSON object; slice it out before parsing
response_json = json.loads(resp[resp.find('{'):resp.rfind('}')+1])
```
### Using Llama.CPP (No GPU)
Download one of the GGUFs to a local directory and use its path as the model path:
```python
import json
import os

from llama_cpp import Llama

model = Llama(model_path=model_file_path, n_ctx=16384, n_threads=os.cpu_count(), embedding=False, verbose=False)
resp = model.create_completion(
    prompt=prompt_template.format(text=input_text),
    max_tokens=384,
    frequency_penalty=0.3,  # feel free to play with these numbers
    temperature=0.2
)['choices'][0]['text']
# the model emits a bare JSON object; slice it out before parsing
response_json = json.loads(resp[resp.find('{'):resp.rfind('}')+1])
```
## Appendix - Purpose of this model
I wanted a token-efficient and cheap way to get a quality summary, title, and named entities. The initial aim was to parse through volumes of click-bait garbage articles and blogs. For simple tasks that process a given text, ChatGPT is incredibly good at adhering to the given instructions and response format. Llama-3.2-1B is a powerful base model, but it is inconsistent about sticking to the response format, and even when it does, it produces super-generic content, e.g. a title that doesn't mean anything and a one-line BS summary. So I wanted to create something that gives me ChatGPT-level quality and consistency for basic tasks like summary, title, and tag generation. Et voilà.