dahara1 committed 0a910cf (verified, parent c7f1884): Create README.md

README.md added (+126, -0)
---
language:
- ja
---

## About this model

This is a GGUF version of [Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct), quantized with an [importance matrix (iMatrix) built from text containing a lot of Japanese](dahara1/imatrix-jpn-test) and set up to summarize long inputs (128K-token context). We hope this retains more of the model's Japanese-language ability.

At least `Qwen2.5-3B-Instruct-gguf-japanese-imatrix-128K/Qwen2.5-3B-Instruct-Q8_0-f16.gguf` has been confirmed to correctly summarize very long texts of more than 32K tokens.

The 128K context extension follows the notes given in `unsloth/Qwen2.5-Coder-32B-Instruct-128K-GGUF`. Thank you.


## For ollama users

If you use ollama, raise the context window size (`num_ctx`) as described in the [FAQ](https://github.com/ollama/ollama/blob/main/docs/faq.md). In an interactive `ollama run` session:

```
/set parameter num_ctx 40960
```

Or, through the API (replace `llama3.2`, the model name taken from the FAQ example, with the name you gave this model in ollama):

```
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?",
  "options": {
    "num_ctx": 40960
  }
}'
```
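
For scripts, the same request can be sent from Python. The following is a minimal sketch using only the standard library. It assumes ollama is listening on its default port 11434 and that you imported this model under a name of your choosing (`qwen2.5-3b-jpn` in the usage line is hypothetical). Setting `"stream": False` makes ollama return a single JSON object whose `response` field holds the generated text.

```python
import json
import urllib.request

# Default local ollama endpoint (adjust if your server runs elsewhere).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_payload(model: str, prompt: str, num_ctx: int = 40960) -> dict:
    """Build an /api/generate request body that overrides the context window."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,  # one JSON object instead of a stream of chunks
        "options": {"num_ctx": num_ctx},
    }


def generate(model: str, prompt: str, num_ctx: int = 40960, timeout: float = 600.0) -> str:
    """POST the payload to a running ollama server and return the generated text."""
    body = json.dumps(build_generate_payload(model, prompt, num_ctx)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["response"]
```

Usage: `print(generate("qwen2.5-3b-jpn", "Why is the sky blue?"))`. The long timeout is deliberate, since summarizing a 40K-token prompt on a small machine can take minutes.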

If you use a different tool, check its manual and increase the context window size there as well. Do not set it larger than you need, though: a larger context window slows the model down. In theory this model can be set to the maximum of 131072 tokens, but such a setting may affect both speed and output quality.
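
One way to avoid an oversized window is to count the prompt tokens and add room for the reply. The sketch below is one possible heuristic, not a rule from this model card: the 1024-token rounding step is an arbitrary choice, and the 512-token reply budget mirrors `n_predict` in the llama.cpp client script. Token counts can come from the Qwen tokenizer, e.g. `len(tokenizer.encode(prompt))`.

```python
MAX_CTX = 131072  # theoretical maximum context length stated for this model


def pick_num_ctx(prompt_tokens: int, max_new_tokens: int = 512, step: int = 1024) -> int:
    """Return the smallest multiple of `step` that fits prompt + reply, capped at MAX_CTX."""
    needed = prompt_tokens + max_new_tokens
    if needed > MAX_CTX:
        raise ValueError(f"{needed} tokens needed, but the model supports at most {MAX_CTX}")
    # Ceiling division rounds up to the next multiple of `step`.
    return min(-(-needed // step) * step, MAX_CTX)
```

For example, a 35000-token prompt with a 512-token reply gives `pick_num_ctx(35000) == 35840`, noticeably smaller than 40960.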


## Sample llama.cpp script

The sample below fetches a Japanese Wikipedia article of about 50,000 characters and summarizes it.

Start the llama.cpp server with a 40960-token context (`-c 40960`). The path below is for a Windows build; adjust it for your platform.
```
./llama.cpp/build/bin/Release/llama-server.exe -m ./Qwen2.5-3B-Instruct-Q8_0-f16.gguf -c 40960
```
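
Because the server is started with `-c 40960`, the prompt and the generated tokens must fit in 40960 tokens together. Cutting the article by character count is only approximate. The sketch below instead finds the longest prefix that fits a token budget by binary search. The function name is hypothetical, and it assumes the token count does not shrink as the prefix grows, which holds approximately for BPE tokenizers such as Qwen's.

```python
from typing import Callable


def truncate_to_token_budget(text: str, budget: int, count_tokens: Callable[[str], int]) -> str:
    """Return the longest prefix of `text` whose token count is <= `budget`.

    Binary search over the prefix length calls `count_tokens` only
    O(log len(text)) times instead of once per character.
    """
    if count_tokens(text) <= budget:
        return text
    lo, hi = 0, len(text)  # invariant: text[:lo] fits, text[:hi] does not
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if count_tokens(text[:mid]) <= budget:
            lo = mid
        else:
            hi = mid
    return text[:lo]
```

Usage: `text = truncate_to_token_budget(text, 40960 - 512 - 256, lambda s: len(tokenizer.encode(s)))`, where 512 is the reply budget and 256 is a hypothetical margin for the chat template and instruction tokens.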


Client script (requires `requests`, `beautifulsoup4`, and `transformers`):
```
import requests
from bs4 import BeautifulSoup
from transformers import AutoTokenizer

# The tokenizer is only used to render Qwen's chat template (and count tokens).
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-3B-Instruct")

# Japanese Wikipedia article used as the long input (about 50,000 characters).
WIKI_URL = "https://ja.wikipedia.org/wiki/%E7%94%B7%E3%81%AE%E5%A8%98"
# Default address of llama-server's completion endpoint.
SERVER_URL = "http://localhost:8080/completion"


def get_wikipedia_text(url):
    """Download a Wikipedia page and return the text of its <p> paragraphs."""
    # Wikimedia asks clients to send a descriptive User-Agent (example value).
    headers = {"User-Agent": "qwen-long-summary-sample/0.1"}
    response = requests.get(url, headers=headers, timeout=30)
    if response.status_code != 200:
        raise RuntimeError(f"Failed to fetch the article. Status code: {response.status_code}")
    soup = BeautifulSoup(response.text, "html.parser")
    return "\n".join(p.get_text() for p in soup.find_all("p"))


if __name__ == "__main__":
    article_text = get_wikipedia_text(WIKI_URL)
    # Uncomment to shorten the input if your context window is smaller.
    # article_text = article_text[:40000]

    # "### Instruction: Summarize the text above in three lines, in Japanese."
    instruct = "### 指示\n\n上記の文章を日本語で3行で要約してください"

    # Variant 1: instruction placed before the article.
    instruct_first = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": instruct + "\n\n" + article_text},
    ]
    # Variant 2: instruction placed after the article.
    instruct_last = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": article_text + "\n\n" + instruct},
    ]
    messages = instruct_last  # switch to instruct_first to compare

    # Render the chat-formatted prompt as text; llama-server receives raw text.
    prompt = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=False,
    )
    print(prompt)
    # Prompt tokens plus n_predict must fit in the server's -c 40960 window.
    print(f"prompt tokens: {len(tokenizer.encode(prompt))}")

    payload = {
        "prompt": prompt,
        "n_predict": 512,  # maximum number of tokens to generate
    }
    response = requests.post(SERVER_URL, json=payload, timeout=600)
    if response.status_code != 200:
        # Stop here instead of trying to parse an error body as a result.
        raise SystemExit(f"Error: {response.text}")

    print(response.json().get("content", "").strip())
```
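
The script uses `transformers` only to render Qwen's chat template. If you would rather not install it, the prompt for plain system/user turns can be built by hand in ChatML, the format Qwen2.5's template emits. This is a minimal sketch with a hypothetical function name: it does not add Qwen's default system message when none is given and does not handle tool calls, so compare its output with `tokenizer.apply_chat_template(..., tokenize=False)` once before relying on it.

```python
def build_chatml_prompt(messages: list[dict[str, str]]) -> str:
    """Render system/user/assistant messages as ChatML and open an assistant turn."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    # Equivalent of add_generation_prompt=True: leave an open assistant turn.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)
```

Usage: `prompt = build_chatml_prompt(messages)`, then send `prompt` in the same `/completion` payload.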

#### Output sample
