dahara1/Qwen2.5-3B-Instruct-gguf-japanese-imatrix-128K

dahara1 committed 11 days ago

Commit 413d3ce • 1 Parent(s): 0a910cf

Update README.md

README.md +11 -11

@@ -4,14 +4,14 @@ language:
 ---
 ## 本モデルについて about this model.
-[Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct)を[日本語が多く含まれる重要度行列(iMatrix)](dahara1/imatrix-jpn-test)を使って量子化し、長文(128K)要約を可能にしたgguf版です。日本語対応能力が多めに保持されている事を期待しています。
-This is a gguf version of [Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct) that has been quantized using [importance matrix (iMatrix) that contains a lot of Japanese](dahara1/imatrix-jpn-test) to enable summarization of long texts (128K). We hope that it retains a large amount of Japanese support.
 少なくともQwen2.5-3B-Instruct-gguf-japanese-imatrix-128K/Qwen2.5-3B-Instruct-Q8_0-f16.ggufが32Kトークンを超える超長文を正しく要約できる事を確認済です。
 It has been confirmed that at least Qwen2.5-3B-Instruct-gguf-japanese-imatrix-128K/Qwen2.5-3B-Instruct-Q8_0-f16.gguf can correctly summarize extremely long texts exceeding 32K tokens.
-128Kコンテキスト延長については[unsloth/Qwen2.5-Coder-32B-Instruct-128K-GGUF]の指摘を参考にしています。ありがとう。
-Regarding the 128K context extension, I have taken note of the suggestion made by [unsloth/Qwen2.5-Coder-32B-Instruct-128K-GGUF]. Thank you.
 ## For ollama users
@@ -23,9 +23,9 @@ If you use ollama, check [FAQ](https://github.com/ollama/ollama/blob/main/docs/f
 ```
 or API
 ```
-curl http://localhost:11434/api/generate -d '{
-  "model": "llama3.2",
-  "prompt": "Why is the sky blue?",
   "options": {
     "num_ctx": 40960
   }
@@ -34,17 +34,17 @@ curl http://localhost:11434/api/generate -d '{
 あなたが他のツールを使っている場合、同様にあなたの使っているツールのマニュアルを調べて、コンテキストウインドウサイズを延長する事を忘れないでください
 ただし、コンテキストサイズを必要以上に大きくするとモデルの実行速度が低下するので注意してください
-本モデルは理論上、最大値131072に設定できますが、実行速度と品質に影響が出る事が考えられます
 If you are using other tools, be sure to extend the context window size as well, by consulting the manual of your tool.
 But please note that increasing the context window size more than necessary will slow down the model's execution speed.
-In theory, this model can be set to the maximum value of 131072, but this may affect execution speed and quality.
 ## Sample llama.cpp script
-以下は、Wikipediaの約50000文字の記事を取得して内容を要約するサンプルです
-Below is a sample that retrieves a Wikipedia article of about 50,000 characters and summarizes its contents.
 llama.cpp server command sample.

 ---
 ## 本モデルについて about this model.
+[Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct)を[日本語が多く含まれる重要度行列(iMatrix)](dahara1/imatrix-jpn-test)を使って量子化し、超長文(32K以上)要約を可能にしたgguf版です。日本語対応能力が多めに保持されている事を期待しています。
+This is a gguf version of [Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct) that has been quantized using [importance matrix (iMatrix) that contains a lot of Japanese](dahara1/imatrix-jpn-test) to enable summarization of long texts (over 32K). We hope that it retains a large amount of Japanese support.
 少なくともQwen2.5-3B-Instruct-gguf-japanese-imatrix-128K/Qwen2.5-3B-Instruct-Q8_0-f16.ggufが32Kトークンを超える超長文を正しく要約できる事を確認済です。
 It has been confirmed that at least Qwen2.5-3B-Instruct-gguf-japanese-imatrix-128K/Qwen2.5-3B-Instruct-Q8_0-f16.gguf can correctly summarize extremely long texts exceeding 32K tokens.
+128Kコンテキスト延長については[unsloth/Qwen2.5-Coder-32B-Instruct-128K-GGUF](https://huggingface.co/unsloth/Qwen2.5-Coder-32B-Instruct-128K-GGUF)の指摘を参考にしています。ありがとう。
+Regarding the 128K context extension, I have taken note of the suggestion made by [unsloth/Qwen2.5-Coder-32B-Instruct-128K-GGUF](https://huggingface.co/unsloth/Qwen2.5-Coder-32B-Instruct-128K-GGUF). Thank you.
 ## For ollama users
 ```
 or API
 ```
+curl http://..../api/generate -d '{
+  "model": ".....",
+  "prompt": "......",
   "options": {
     "num_ctx": 40960
   }
 あなたが他のツールを使っている場合、同様にあなたの使っているツールのマニュアルを調べて、コンテキストウインドウサイズを延長する事を忘れないでください
 ただし、コンテキストサイズを必要以上に大きくするとモデルの実行速度が低下するので注意してください
+本モデルは理論上、最大値128K(131072)に設定できますが、実行速度と品質に影響が出る事が考えられます
 If you are using other tools, be sure to extend the context window size as well, by consulting the manual of your tool.
 But please note that increasing the context window size more than necessary will slow down the model's execution speed.
+In theory, this model can be set to the maximum value of 128K(131072), but this may affect execution speed and quality.
 ## Sample llama.cpp script
+以下は、Wikipediaの約50,000文字(34.8Kトークン)の記事を取得して内容を要約するサンプルです
+Below is a sample that retrieves a Wikipedia article of about 50,000 Japanese characters(34.8K tokens) and summarizes its contents.
 llama.cpp server command sample.
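
The llama.cpp sample itself is not part of this diff view. As a separate illustration of the context-size guidance in the Ollama section above, here is a minimal Python sketch that estimates `num_ctx` for a given text length and builds the JSON body for `/api/generate`. The helper names (`pick_num_ctx`, `build_generate_payload`), the reply headroom, the rounding step, and the model name are assumptions made for illustration. The tokens-per-character ratio is taken from the README's single Wikipedia example (about 50,000 Japanese characters for 34.8K tokens) and will vary with the text.

```python
import json
import math

MAX_CTX = 131072  # 128K, the theoretical maximum stated in the README


def pick_num_ctx(n_chars, reply_tokens=2048, step=8192):
    """Estimate a context window size for a prompt of n_chars characters.

    Assumption: roughly 34,800 tokens per 50,000 Japanese characters,
    the ratio from the README's example. reply_tokens and step are
    illustrative defaults, not values from the README.
    """
    prompt_tokens = math.ceil(n_chars * 34_800 / 50_000)
    needed = prompt_tokens + reply_tokens
    # Round up to a multiple of `step`, then cap at the model's maximum.
    rounded = math.ceil(needed / step) * step
    return min(rounded, MAX_CTX)


def build_generate_payload(model, prompt, num_ctx):
    """Return the JSON body for an Ollama /api/generate request."""
    body = {"model": model, "prompt": prompt, "options": {"num_ctx": num_ctx}}
    return json.dumps(body, ensure_ascii=False)


if __name__ == "__main__":
    # "your-model-name" is a placeholder, not a real model tag.
    num_ctx = pick_num_ctx(50_000)
    print(build_generate_payload("your-model-name", "...", num_ctx))
```

With these assumed defaults, a 50,000-character article yields `num_ctx` of 40960, the same value as in the README's API example, and very long inputs are capped at 131072. Keeping the value close to what the prompt needs follows the README's warning that a larger context window slows execution.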