---
language:
- ja
- en
license: other
library_name: transformers
tags:
- llama
- llama-2
- steerlm
datasets:
- OpenAssistant/oasst2
- nvidia/HelpSteer
base_model: karakuri-ai/karakuri-lm-70b-v0.1
pipeline_tag: conversational
model-index:
- name: karakuri-ai/karakuri-lm-70b-chat-v0.1
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MT-Bench
      type: unknown
    metrics:
    - type: unknown
      value: 6.609375
      name: score
    - type: unknown
      value: 6.43125
      name: score
    source:
      url: https://huggingface.co/spaces/lmsys/mt-bench
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 61.52
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=karakuri-ai/karakuri-lm-70b-chat-v0.1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 83.13
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=karakuri-ai/karakuri-lm-70b-chat-v0.1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 59.35
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=karakuri-ai/karakuri-lm-70b-chat-v0.1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 51.39
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=karakuri-ai/karakuri-lm-70b-chat-v0.1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 78.37
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=karakuri-ai/karakuri-lm-70b-chat-v0.1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 40.41
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=karakuri-ai/karakuri-lm-70b-chat-v0.1
      name: Open LLM Leaderboard
---
# KARAKURI LM
![KARAKURI LM](./thumbnail.png)
KARAKURI LM is a pretrained language model that builds upon Llama 2.
Our model enhances Llama 2's capabilities by incorporating additional Japanese vocabulary and further pretraining on a mixture of Japanese and multilingual corpora.
KARAKURI LM Chat is a fine-tuned version of KARAKURI LM, which was trained on a mixture of publicly available and closed datasets using the [SteerLM](https://aclanthology.org/2023.findings-emnlp.754/) technique.
During fine-tuning, we took a continual learning approach.
Rather than relying solely on structured conversational datasets, as is common practice, we also incorporated unstructured corpora similar to those used during pretraining.
Although only 2.5% of the tokens in the conversational datasets are Japanese, the model performs strongly in Japanese.
At the time of release, it achieves the highest score among Japanese open models on [MT-Bench-jp](https://api.wandb.ai/links/wandb-japan/6ff86bp3).
Furthermore, it achieves performance comparable to Llama 2 70B Chat on the original English [MT-Bench](https://huggingface.co/spaces/lmsys/mt-bench).
You can find more details in our blog post ([en](https://medium.com/karakuri/introducing-karakuri-lm-34c79a3bf341), [ja](https://medium.com/karakuri/karakuri-lm%E3%81%AE%E8%A7%A3%E8%AA%AC-4b6cf9c3d40f)).
If you are curious about our model, give our [demo](https://lm.karakuri.cc/) a try.
## Model Details
- **Developed by**: [KARAKURI Inc.](https://about.karakuri.ai/)
- **Model type**: Causal decoder-only transformer language model
- **Languages**: English and Japanese
- **Finetuned from**: [karakuri-ai/karakuri-lm-70b-v0.1](https://huggingface.co/karakuri-ai/karakuri-lm-70b-v0.1)
- **Contact**: For questions and comments about the model, please email `karakuri-rd@karakuri.ai`
## Performance
At the time of release, KARAKURI LM 70B Chat v0.1 achieves the highest performance among Japanese open models on the [MT-Bench-jp](https://api.wandb.ai/links/wandb-japan/6ff86bp3):
| Model | Size | Alignment | MT-Bench-jp |
| :---------------------------------- | :-----: | :---------: | ----------: |
| GPT-4 | - | RLHF | 8.78 |
| GPT-3.5-Turbo | - | RLHF | 8.24 |
| Claude 2.1 | - | RLHF | 8.18 |
| Gemini Pro | - | RLHF | 7.17 |
| **KARAKURI LM 70B Chat v0.1** | **70B** | **SteerLM** | **6.43** |
| Qarasu-14B-Chat-Plus-Unleashed | 14B | SFT | 6.26 |
| Llama 2 70B Chat | 70B | RLHF | 5.23 |
| ELYZA-Japanese-Llama-2-13B | 13B | SFT | 5.05 |
| Japanese-StableLM-Instruct-Beta-70B | 70B | SFT | 5.03 |
| Swallow-70B-Instruct | 70B | SFT | 4.39 |
It also achieves performance comparable to Llama 2 70B Chat on the original English [MT-Bench](https://huggingface.co/spaces/lmsys/mt-bench):
| Model | Average | MT-Bench | MT-Bench-jp |
| :---------------------------- | -------: | -------: | ----------: |
| **KARAKURI LM 70B Chat v0.1** | **6.52** | **6.61** | **6.43** |
| Llama 2 70B Chat | 6.04 | 6.86 | 5.23 |
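
The "Average" column can be checked with a short calculation. The sketch below assumes the average is the plain mean of the English and Japanese scores, and that the two unrounded MT-Bench values in this card's metadata are listed in the order English, Japanese; the variable names are illustrative.

```python
# Unrounded scores from the model card metadata
# (assumed order: MT-Bench in English, then MT-Bench-jp).
mt_bench_en = 6.609375
mt_bench_jp = 6.43125

# Assumption: "Average" is the unweighted mean of the two benchmarks.
average = (mt_bench_en + mt_bench_jp) / 2

print(f"MT-Bench: {mt_bench_en:.2f}")     # 6.61
print(f"MT-Bench-jp: {mt_bench_jp:.2f}")  # 6.43
print(f"Average: {average:.2f}")          # 6.52
```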
## Use in 🤗 Transformers
You can run the model using the `pipeline()` function from 🤗 Transformers:
```python
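from transformers import pipeline, Conversation

# The "conversational" pipeline and the `Conversation` class were removed in later
# Transformers releases, so this snippet needs a version that still provides them.
chatbot = pipeline("conversational", model="karakuri-ai/karakuri-lm-70b-chat-v0.1", device_map="auto", torch_dtype="auto")

# Prompt (Japanese): "I'm planning a day trip to Tokyo this weekend. Since it's a day trip,
# please suggest a sightseeing plan that can be covered in a short time."
conversation = Conversation("週末に日帰りで東京に遊びに行こうと思っています。日帰りなので、短時間で回れるおすすめの観光プランを教えてください。")
conversation = chatbot(conversation, max_new_tokens=512)
print(conversation.messages[-1]["content"])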
```
The model uses the following multi-turn prompt template in the Llama format. Each user turn ends with an encoded string of attribute values:
```python
messages = [
{"role": "system", "content": "System prompt"},
{"role": "user", "content": "User prompt"},
{"role": "assistant", "content": "Model response"},
{"role": "user", "content": "User prompt"},
]
chatbot.tokenizer.apply_chat_template(messages, tokenize=False)
# <s>[INST] <<SYS>>
# System prompt
# <</SYS>>
#
# User prompt [ATTR] helpfulness: 4 correctness: 4 coherence: 4 complexity: 4 verbosity: 4 quality: 4 toxicity: 0 humor: 0 creativity: 0 [/ATTR] [/INST] Model response </s><s>[INST] User prompt [ATTR] helpfulness: 4 correctness: 4 coherence: 4 complexity: 4 verbosity: 4 quality: 4 toxicity: 0 humor: 0 creativity: 0 [/ATTR] [/INST]
```
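To make the layout of this format easier to follow, here is a minimal plain-Python sketch that rebuilds the same string for the example messages. `encode_attributes` and `build_prompt` are hypothetical helpers written for illustration; they are not part of 🤗 Transformers or of the model's tokenizer, and the real chat template may treat whitespace or edge cases differently.

```python
# Illustrative sketch only: these helpers are hypothetical and merely mirror
# the prompt string shown in the example output.
DEFAULT_ATTRIBUTES = {
    "helpfulness": 4, "correctness": 4, "coherence": 4,
    "complexity": 4, "verbosity": 4, "quality": 4,
    "toxicity": 0, "humor": 0, "creativity": 0,
}


def encode_attributes(attributes):
    """Encode attribute values as '[ATTR] name: value ... [/ATTR]'."""
    body = " ".join(f"{name}: {value}" for name, value in attributes.items())
    return f"[ATTR] {body} [/ATTR]"


def build_prompt(messages, attributes=None):
    """Build a Llama-style multi-turn prompt with attributes on every user turn."""
    # Overrides keep the default attribute order because the keys already exist.
    attr = encode_attributes({**DEFAULT_ATTRIBUTES, **(attributes or {})})
    messages = list(messages)
    system = ""
    if messages and messages[0]["role"] == "system":
        # The system prompt is folded into the first user turn.
        system = f"<<SYS>>\n{messages[0]['content']}\n<</SYS>>\n\n"
        messages = messages[1:]
    prompt = ""
    for index, message in enumerate(messages):
        if message["role"] == "user":
            prefix = system if index == 0 else ""
            prompt += f"<s>[INST] {prefix}{message['content']} {attr} [/INST]"
        elif message["role"] == "assistant":
            prompt += f" {message['content']} </s>"
        else:
            raise ValueError(f"unexpected role: {message['role']}")
    return prompt


example_messages = [
    {"role": "system", "content": "System prompt"},
    {"role": "user", "content": "User prompt"},
    {"role": "assistant", "content": "Model response"},
    {"role": "user", "content": "User prompt"},
]
print(build_prompt(example_messages))
```

Passing `attributes={"verbosity": 1}` to this sketch changes only that value in every `[ATTR]` block; for real inference, the tokenizer's own chat template remains the source of truth.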
The prompt template contains nine attributes.
The first five are derived from HelpSteer, while the remaining four are derived from OASST2.
The values are represented by integers ranging from 0 to 4, with 0 being the lowest and 4 being the highest.
- helpfulness (default: 4)
- correctness (default: 4)
- coherence (default: 4)
- complexity (default: 4)
- verbosity (default: 4)
- quality (default: 4)
- toxicity (default: 0)
- humor (default: 0)
- creativity (default: 0)
You can change the attribute values by replacing the default values specified in the chat template:
```python
chatbot.tokenizer.chat_template = chatbot.tokenizer.chat_template.replace("complexity: 4", "complexity: 0")
```
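When several attributes need to change at once, a small wrapper can validate the names and the 0 to 4 range before editing the template. The sketch below is one possible approach, not part of the model's API: `set_attributes` is a hypothetical helper, and it assumes the chat template contains each attribute as literal text such as `complexity: 4`, as the replacement above implies. It is demonstrated on a toy template string rather than the real one.

```python
import re

# Attribute names listed in this model card.
ATTRIBUTE_NAMES = (
    "helpfulness", "correctness", "coherence", "complexity", "verbosity",
    "quality", "toxicity", "humor", "creativity",
)


def set_attributes(chat_template, **overrides):
    """Return a copy of `chat_template` with the given attribute values replaced.

    Hypothetical helper: assumes each attribute appears literally as
    '<name>: <digit>' somewhere in the template text.
    """
    for name, value in overrides.items():
        if name not in ATTRIBUTE_NAMES:
            raise ValueError(f"unknown attribute: {name}")
        if not isinstance(value, int) or not 0 <= value <= 4:
            raise ValueError(f"{name} must be an integer from 0 to 4, got {value!r}")
        pattern = rf"\b{name}: [0-4]\b"
        chat_template, count = re.subn(pattern, f"{name}: {value}", chat_template)
        if count == 0:
            raise ValueError(f"attribute {name} not found in the template")
    return chat_template


# Toy stand-in for the real template, used only to demonstrate the helper.
toy_template = (
    "{{ content }} [ATTR] helpfulness: 4 correctness: 4 coherence: 4 "
    "complexity: 4 verbosity: 4 quality: 4 toxicity: 0 humor: 0 creativity: 0 [/ATTR]"
)
updated = set_attributes(toy_template, complexity=0, humor=3)
print(updated)
```

With a loaded pipeline, the same helper could be applied as `chatbot.tokenizer.chat_template = set_attributes(chatbot.tokenizer.chat_template, complexity=0)`.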
## Training
### Training Datasets
- [OASST2](https://huggingface.co/datasets/OpenAssistant/oasst2)
- Our internal conversational datasets
### Training Infrastructure
- **Hardware**: KARAKURI LM 70B was trained on 32 nodes of Amazon EC2 trn1.32xlarge instances.
- **Software**: We use code based on [neuronx-nemo-megatron](https://github.com/aws-neuron/neuronx-nemo-megatron).
## Acknowledgements
We gratefully acknowledge the support from AWS Japan through the [AWS LLM Development Support Program](https://aws.amazon.com/jp/local/llm-development-support-program/).
## License
Llama 2 is licensed under the LLAMA 2 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.
Subject to the license above, and except for commercial purposes, you are free to share and adapt KARAKURI LM, provided that you must, in a recognizable and appropriate manner, (i) state that you are using KARAKURI LM developed by KARAKURI Inc., when you publish or make available to third parties KARAKURI LM, its derivative works or modification, or any output or results of KARAKURI LM or its derivative works or modification, and (ii) indicate your contributions, if you modified any material of KARAKURI LM.
If you plan to use KARAKURI LM for commercial purposes, please contact us beforehand. You are not authorized to use KARAKURI LM for commercial purposes unless we expressly grant you such rights.
If you have any questions regarding the interpretation of the above terms, please also feel free to contact us.
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_karakuri-ai__karakuri-lm-70b-chat-v0.1).
| Metric |Value|
|---------------------------------|----:|
|Avg. |62.36|
|AI2 Reasoning Challenge (25-Shot)|61.52|
|HellaSwag (10-Shot) |83.13|
|MMLU (5-Shot) |59.35|
|TruthfulQA (0-shot) |51.39|
|Winogrande (5-shot) |78.37|
|GSM8k (5-shot) |40.41|
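
The "Avg." row is consistent with an unweighted mean of the six benchmark scores, which can be verified as follows. This is a small sketch under that assumption; the dictionary name is illustrative.

```python
# Benchmark scores copied from the table above.
leaderboard_scores = {
    "AI2 Reasoning Challenge (25-Shot)": 61.52,
    "HellaSwag (10-Shot)": 83.13,
    "MMLU (5-Shot)": 59.35,
    "TruthfulQA (0-shot)": 51.39,
    "Winogrande (5-shot)": 78.37,
    "GSM8k (5-shot)": 40.41,
}

# Assumption: "Avg." is the plain mean over the six benchmarks.
leaderboard_average = sum(leaderboard_scores.values()) / len(leaderboard_scores)
print(f"Avg.: {leaderboard_average:.2f}")  # 62.36
```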