|
--- |
|
license: cc-by-sa-4.0 |
|
datasets: |
|
- izumi-lab/llm-japanese-dataset-vanilla |
|
language: |
|
- ja |
|
tags: |
|
- gpt_neox |
|
- japanese |
|
- causal-lm |
|
--- |
|
|
|
This repo contains a low-rank adapter (LoRA) for [CALM](https://huggingface.co/cyberagent/open-calm-7b),

fine-tuned on a dataset specially extracted from [llm-japanese-dataset](https://github.com/masanorihirano/llm-japanese-dataset).
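The extracted data is published as `izumi-lab/llm-japanese-dataset-vanilla` (listed in the metadata above). A minimal sketch for inspecting it with the `datasets` library:

```python
from datasets import load_dataset

# Load the instruction data used to fit this adapter.
dataset = load_dataset("izumi-lab/llm-japanese-dataset-vanilla")
print(dataset)  # shows the available splits and example counts
```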
|
|
|
You can test this model at [izumi-lab/stormy-7b-10ep](https://huggingface.co/spaces/izumi-lab/stormy-7b-10ep).
|
|
|
This version of the weights was trained with the following hyperparameters (a configuration sketch follows the list):
|
|
|
- Epochs: 10 |
|
- Batch size: 128 |
|
- Cutoff length: 300 |
|
- Learning rate: 3e-4 |
|
- LoRA _r_: 4

- LoRA target modules: query_key_value
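For reference, these settings map roughly onto a `peft` `LoraConfig`. Note that `lora_alpha` and `lora_dropout` are not stated in this card, so the values below are placeholder assumptions:

```python
from peft import LoraConfig

# Rough reconstruction of the adapter config from the hyperparameters above.
# lora_alpha and lora_dropout are assumptions: they are not given in this card.
lora_config = LoraConfig(
    r=4,                                 # LoRA rank, from the list above
    target_modules=["query_key_value"],  # GPT-NeoX fused attention projection
    lora_alpha=16,                       # assumption, not stated in the card
    lora_dropout=0.05,                   # assumption, not stated in the card
    task_type="CAUSAL_LM",
)
```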
|
|
|
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model = "cyberagent/open-calm-7b"

# Load the base CALM model in float16 along with its tokenizer.
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Attach the LoRA adapter weights from this repo on top of the base model.
model = PeftModel.from_pretrained(
    model,
    "izumi-lab/stormy-7b-10ep",
    torch_dtype=torch.float16,
)
```
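Once the adapter is loaded, generation goes through the usual `generate` API. A minimal sketch, assuming a GPU is available; the prompt and decoding settings are illustrative, not recommendations from this card:

```python
# Move to GPU if available; float16 inference on CPU may not be supported.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
model.eval()

# Prompt: "What is the highest mountain in Japan?"
inputs = tokenizer("日本で一番高い山は何ですか?", return_tensors="pt").to(device)
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=64,
        do_sample=True,
        temperature=0.7,
    )
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```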
|
|
|
For the latest information, please visit [llm.msuzuki.me](https://llm.msuzuki.me).
|
|
|
## Details |
|
|
|
- Japanese Paper: |
|
- English Paper: |
|
- Website: [llm.msuzuki.me](https://llm.msuzuki.me). |
|
|
|
Citation: TBD |
|
|
|
If you have any inquiries, such as joint research, data provision, or various types of support, please email izumi-llm@socsim.org.