---
library_name: transformers
datasets:
- elyza/ELYZA-tasks-100
license: apache-2.0
language:
- ja
base_model:
- llm-jp/llm-jp-3-13b-instruct
---
# Model Card for Model ID
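A Japanese language model based on [llm-jp/llm-jp-3-13b-instruct](https://huggingface.co/llm-jp/llm-jp-3-13b-instruct), instruction-tuned on [elyza/ELYZA-tasks-100](https://huggingface.co/datasets/elyza/ELYZA-tasks-100).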
## Required Libraries and Their Versions
- trl==0.12.2
- transformers<4.47.0
- tokenizers==0.21.0
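
Before running the model, it can help to confirm that the installed packages match the pins listed above. The following is a minimal sketch using only the standard library; the helper names (`as_tuple`, `satisfies`, `check_pins`) are hypothetical, and the version parsing is deliberately simple (it ignores suffixes such as `rc1` or `dev0`), so treat the report as a hint rather than a strict check.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the list above: (package, operator, version).
PINS = [
    ("trl", "==", "0.12.2"),
    ("transformers", "<", "4.47.0"),
    ("tokenizers", "==", "0.21.0"),
]


def as_tuple(v):
    """Turn '4.46.3' into (4, 46, 3); non-numeric parts are ignored."""
    return tuple(int(part) for part in v.split(".") if part.isdigit())


def satisfies(installed, op, pinned):
    """Check an installed version against an '==' or '<' pin."""
    a, b = as_tuple(installed), as_tuple(pinned)
    return a == b if op == "==" else a < b


def check_pins(pins=PINS):
    """Return a {package: status} report for every pinned package."""
    report = {}
    for name, op, pinned in pins:
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
            continue
        if satisfies(installed, op, pinned):
            report[name] = f"{installed} (ok)"
        else:
            report[name] = f"{installed} (expected {op}{pinned})"
    return report


if __name__ == "__main__":
    for name, status in check_pins().items():
        print(f"{name}: {status}")
```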
## Usage
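```py
import torch
from tqdm import tqdm

# Assumes `model`, `tokenizer`, and `datasets` (an iterable of dicts with
# "task_id" and "input" keys) have already been loaded.
results = []
# System prompt (Japanese): "The following is an instruction describing a task.
# Write a response that appropriately satisfies the request, **concisely**."
system_text = "以下は、タスクを説明する指示です。要求を適切に満たす回答を**簡潔に**書きなさい。"
for data in tqdm(datasets):
    input_text = data["input"]
    # Prompt template: "### 指示" marks the instruction, "### 応答" the response.
    prompt = f"""
{system_text}
### 指示
{input_text}
### 応答
"""
    tokenized_input = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt").to(model.device)
    attention_mask = torch.ones_like(tokenized_input)
    with torch.no_grad():
        # Greedy decoding with a repetition penalty; keep the first sequence.
        outputs = model.generate(
            tokenized_input,
            attention_mask=attention_mask,
            max_new_tokens=100,
            do_sample=False,
            repetition_penalty=1.2,
            pad_token_id=tokenizer.eos_token_id
        )[0]
    # Decode only the newly generated tokens, skipping the prompt.
    output = tokenizer.decode(outputs[tokenized_input.size(1):], skip_special_tokens=True)
    results.append({"task_id": data["task_id"], "input": input_text, "output": output})
```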
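
The loop above collects `results` as a list of dictionaries with `task_id`, `input`, and `output` keys. One way to persist them is as JSON Lines; the sketch below is an assumption about a convenient output format, not something this model card prescribes, and the helper names `save_results` and `load_results` as well as the file name are hypothetical. Passing `ensure_ascii=False` keeps the Japanese text readable in the file.

```python
import json
from pathlib import Path


def save_results(results, path):
    """Write one JSON object per line, keeping Japanese text unescaped."""
    path = Path(path)
    with path.open("w", encoding="utf-8") as f:
        for row in results:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")
    return path


def load_results(path):
    """Read a JSON Lines file back into a list of dicts."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

For example, `save_results(results, "results.jsonl")` would write the generated answers next to the script, and `load_results("results.jsonl")` would read them back for inspection.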
## Model Details
- **Model type:** Transformer-based Language Model
## Datasets
### Instruction tuning
| Language | Dataset | Description |
|:---|:---|:---|
|Japanese|[elyza/ELYZA-tasks-100](https://huggingface.co/datasets/elyza/ELYZA-tasks-100)| A manually constructed instruction dataset |
## License
[Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0)