---
base_model:
- tokyotech-llm/Llama-3.1-Swallow-70B-v0.1
- meta-llama/Llama-3.1-70B
- meta-llama/Llama-3.3-70B-Instruct
library_name: transformers
tags:
- mergekit
- merge
- chat
language:
- ja
- en
pipeline_tag: text-generation
license: llama3.3
---
# Llama-3.3-FakeSwallow-70B-Instruct-v0.1

🚨 **For research purposes only. This model may have repetition issues.**

This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

- 2024.12.11: The model weights were updated.

## Test environment

🚨 **HACK: Try [oobabooga/text-generation-webui#5885](https://github.com/oobabooga/text-generation-webui/issues/5885) if multiple EOS tokens don't work.**

This model was tested using [text-generation-webui](https://github.com/oobabooga/text-generation-webui/tree/main). I used the `min_p` preset with temperature=1 for generation.

## Usage

The template used to construct a prompt for the instruct model is specified as follows. This format must be adhered to strictly, as deviations may result in less optimal outputs from the model.

```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{SYSTEM_PROMPT}<|eot_id|><|start_header_id|>user<|end_header_id|>

{USER_MESSAGE}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

```

For the `{SYSTEM_PROMPT}` part, we recommend using "あなたは誠実で優秀な日本人のアシスタントです。" (English: "You are a sincere and excellent Japanese assistant.") or "You are a helpful assistant."

For the `{USER_MESSAGE}` part, we recommend using `{instruction}\n{input}`.

In other words, we recommend the following:

```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

あなたは誠実で優秀な日本人のアシスタントです。<|eot_id|><|start_header_id|>user<|end_header_id|>

{instruction}
{input}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

```

### Use the instruct model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "nitky/Llama-3.3-FakeSwallow-70B-Instruct-v0.1"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```

## Merge Details

### Merge Method

This model was merged using the [task arithmetic](https://arxiv.org/abs/2212.04089) merge method with [meta-llama/Llama-3.1-70B](https://huggingface.co/meta-llama/Llama-3.1-70B) as the base.

### Models Merged

The following models were included in the merge:
* [tokyotech-llm/Llama-3.1-Swallow-70B-v0.1](https://huggingface.co/tokyotech-llm/Llama-3.1-Swallow-70B-v0.1)
* [meta-llama/Llama-3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct)

### Configuration

The following YAML configuration was used to produce this model:

```yaml
merge_method: task_arithmetic
base_model: meta-llama/Llama-3.1-70B
models:
  - model: tokyotech-llm/Llama-3.1-Swallow-70B-v0.1
    parameters:
      weight: 1.0
  - model: meta-llama/Llama-3.3-70B-Instruct
    parameters:
      weight: 0.998
dtype: bfloat16
name: Llama-3.3-FakeSwallow-70B-Instruct-v0.1
```
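
To reproduce the merge, the configuration above can be saved to a YAML file and passed to mergekit's `mergekit-yaml` CLI (e.g. `mergekit-yaml config.yaml ./output-directory`). The sketch below does the same through mergekit's Python API; the `config.yaml` filename, output path, and the specific `MergeOptions` fields shown are illustrative assumptions and may need adjusting to your mergekit version and hardware.

```python
# Minimal sketch of reproducing the merge with mergekit's Python API.
# Assumes the YAML configuration above has been saved as config.yaml;
# the output path and MergeOptions fields are illustrative assumptions.
import torch
import yaml

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

# Load the task-arithmetic merge configuration shown above.
with open("config.yaml", "r", encoding="utf-8") as fp:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(fp))

run_merge(
    merge_config,
    out_path="./Llama-3.3-FakeSwallow-70B-Instruct-v0.1",  # directory for the merged weights
    options=MergeOptions(
        cuda=torch.cuda.is_available(),  # run the merge on GPU if one is available
        copy_tokenizer=True,             # copy a tokenizer into the output directory
    ),
)
```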