tokutsu committed
Commit 8ef01ac
Parent: 918ffed

Update README

Files changed (1):
  1. README.md +6 -7
README.md CHANGED
@@ -21,7 +21,7 @@ datasets:
 
 ## Overview
 
-This is a fine-tuned [`llm-jp-3-13b-it`](https://huggingface.co/tokutsu/llm-jp-3-13b-it) model for [ELYZA-tasks-100](https://huggingface.co/datasets/elyza/ELYZA-tasks-100). The model was trained on ELYZA-tasks-100 and the [ichikara-instruction dataset](https://liat-aip.sakura.ne.jp/wp/llm%E3%81%AE%E3%81%9F%E3%82%81%E3%81%AE%E6%97%A5%E6%9C%AC%E8%AA%9E%E3%82%A4%E3%83%B3%E3%82%B9%E3%83%88%E3%83%A9%E3%82%AF%E3%82%B7%E3%83%A7%E3%83%B3%E3%83%87%E3%83%BC%E3%82%BF%E4%BD%9C%E6%88%90/).
+This is a fine-tuned [llm-jp-3-13b-it](https://huggingface.co/tokutsu/llm-jp-3-13b-it) model for [ELYZA-tasks-100](https://huggingface.co/datasets/elyza/ELYZA-tasks-100). The model was trained on ELYZA-tasks-100 and the [ichikara-instruction dataset](https://liat-aip.sakura.ne.jp/wp/llm%E3%81%AE%E3%81%9F%E3%82%81%E3%81%AE%E6%97%A5%E6%9C%AC%E8%AA%9E%E3%82%A4%E3%83%B3%E3%82%B9%E3%83%88%E3%83%A9%E3%82%AF%E3%82%B7%E3%83%A7%E3%83%B3%E3%83%87%E3%83%BC%E3%82%BF%E4%BD%9C%E6%88%90/).
 
 ## Usage
 
@@ -36,7 +36,6 @@ model, tokenizer = FastLanguageModel.from_pretrained(
     model_name=model_id,
     dtype=None,
     load_in_4bit=True,
-    trust_remote_code=True,
 )
 FastLanguageModel.for_inference(model)
 
@@ -47,11 +46,11 @@ prompt = """### 指示
 """
 
 inputs = tokenizer([prompt], return_tensors="pt").to(model.device)
-outputs = model(**inputs,
-                max_new_tokens=512,
-                use_cache=True,
-                do_sample=False,
-                repetition_penalty=1.2)
+outputs = model.generate(**inputs,
+                         max_new_tokens=512,
+                         use_cache=True,
+                         do_sample=False,
+                         repetition_penalty=1.2)
 prediction = tokenizer.decode(outputs[0], skip_special_tokens=True).split('\n### 回答')[-1]
 ```
 
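For reference, the substantive fix in the last hunk is the switch from `model(**inputs, ...)` to `model.generate(**inputs, ...)`: calling the model directly runs a single forward pass and returns logits, whereas `generate` performs autoregressive decoding and returns token ids that `tokenizer.decode` can consume. Below is a minimal sketch of the full post-commit usage; the repo id and the instruction text are assumptions, since the hunks show neither (only the `### 指示` / `### 回答` template markers appear in the diff).

```python
from unsloth import FastLanguageModel

model_id = "tokutsu/llm-jp-3-13b-it"  # assumed repo id; not visible in the hunks

# Load the 4-bit quantized model and its tokenizer.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_id,
    dtype=None,         # let unsloth auto-detect the compute dtype
    load_in_4bit=True,  # 4-bit quantized weights to fit on smaller GPUs
)
FastLanguageModel.for_inference(model)  # enable unsloth's fast inference path

instruction = "..."  # placeholder; the README's example instruction is elided in the diff
prompt = f"""### 指示
{instruction}
### 回答
"""

inputs = tokenizer([prompt], return_tensors="pt").to(model.device)
outputs = model.generate(**inputs,           # generate, not a bare forward pass
                         max_new_tokens=512,
                         use_cache=True,
                         do_sample=False,    # greedy decoding
                         repetition_penalty=1.2)

# Keep only the text after the 回答 (answer) marker.
prediction = tokenizer.decode(outputs[0], skip_special_tokens=True).split('\n### 回答')[-1]
print(prediction)
```

The second hunk also drops `trust_remote_code=True` from `from_pretrained`, so the sketch omits it as well.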