	---
	base_model:
	- tokyotech-llm/Llama-3.1-Swallow-70B-v0.1
	- meta-llama/Llama-3.1-70B
	- meta-llama/Llama-3.3-70B-Instruct
	library_name: transformers
	tags:
	- mergekit
	- merge
	- chat
	language:
	- ja
	- en
	pipeline_tag: text-generation
	license: llama3.3
	---
	# Llama-3.3-FakeSwallow-70B-Instruct-v0.1

	This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

	## Test environment

	This model was tested with [text-generation-webui](https://github.com/oobabooga/text-generation-webui/tree/main), using the `min_p` preset with temperature=1 for generation.
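
To make the sampling setting concrete, here is a minimal sketch of min-p filtering in plain Python. It is not text-generation-webui's implementation: the function names and the `min_p=0.05` value are assumptions chosen for illustration, and real samplers may apply temperature at a different step (at temperature=1 the order makes no difference).

```python
import math
import random


def min_p_filter(logits, min_p=0.05, temperature=1.0):
    """Return renormalized probabilities after min-p filtering.

    min_p=0.05 is an illustrative value, not taken from the model card.
    """
    # Temperature-scaled softmax (subtract the max for numerical stability).
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep only tokens whose probability is at least min_p times the top probability.
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    kept_total = sum(kept)
    return [p / kept_total for p in kept]


def sample(probs, rng=random):
    """Draw one token index from a list of probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Toy logits for a three-token vocabulary; the last token is very unlikely.
filtered = min_p_filter([2.0, 1.0, -5.0], min_p=0.05)
print(filtered)
```

Because the cutoff scales with the top token's probability, the filter removes more candidates when the model is confident and fewer when the distribution is flat.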

	## Usage

	Follow the prompt format below strictly; deviating from it may degrade the model's output.

	The prompt for the instruct model is built from the following template:

	```
	<|begin_of_text|><|start_header_id|>system<|end_header_id|>

	{SYSTEM_PROMPT}<|eot_id|><|start_header_id|>user<|end_header_id|>

	{USER_MESSAGE}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

	```

	For `{SYSTEM_PROMPT}`, we recommend "あなたは誠実で優秀な日本人のアシスタントです。" (in English: "You are a sincere and excellent Japanese assistant.") or "You are a helpful assistant."

	For `{USER_MESSAGE}`, we recommend `{instruction}\n{input}`.

	In other words, we recommend the following:

	```
	<|begin_of_text|><|start_header_id|>system<|end_header_id|>

	あなたは誠実で優秀な日本人のアシスタントです。<|eot_id|><|start_header_id|>user<|end_header_id|>

	{instruction}
	{input}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

	```
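
As a rough illustration of the template, the sketch below assembles a single-turn prompt string by hand. The helper name `build_prompt` and its parameters are hypothetical, and it assumes that each blank line after a header in the template corresponds to two newline characters. In practice, `tokenizer.apply_chat_template` is the usual way to produce this string.

```python
# Recommended Japanese system prompt, copied from the model card.
DEFAULT_SYSTEM_PROMPT = "あなたは誠実で優秀な日本人のアシスタントです。"


def build_prompt(instruction, input_text="", system_prompt=DEFAULT_SYSTEM_PROMPT):
    """Assemble a single-turn prompt in the layout of the template above (hypothetical helper)."""
    # The user message is the instruction, followed by the optional input on a new line.
    user_message = f"{instruction}\n{input_text}" if input_text else instruction
    return (
        "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
        f"{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_message}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
    )


prompt = build_prompt("Translate the following sentence into Japanese.", "Good morning.")
print(prompt)
```

The string ends with the assistant header, so generation continues from the start of the assistant's reply.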

	### Use the instruct model

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer

	model_name = "nitky/Llama-3.3-FakeSwallow-70B-Instruct-v0.1"

	# Load the weights in their stored dtype and spread them across available devices.
	model = AutoModelForCausalLM.from_pretrained(
	    model_name,
	    torch_dtype="auto",
	    device_map="auto"
	)
	tokenizer = AutoTokenizer.from_pretrained(model_name)

	prompt = "Give me a short introduction to large language models."
	messages = [
	    {"role": "system", "content": "You are a helpful assistant."},
	    {"role": "user", "content": prompt}
	]
	# Render the chat template and append the assistant header so the model starts its reply.
	text = tokenizer.apply_chat_template(
	    messages,
	    tokenize=False,
	    add_generation_prompt=True
	)
	model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

	generated_ids = model.generate(
	    **model_inputs,
	    max_new_tokens=512
	)
	# Strip the prompt tokens so only the newly generated tokens are decoded.
	generated_ids = [
	    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
	]

	response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
	print(response)

	```

	## Merge Details
	### Merge Method

	This model was merged using the [task arithmetic](https://arxiv.org/abs/2212.04089) merge method, with [meta-llama/Llama-3.1-70B](https://huggingface.co/meta-llama/Llama-3.1-70B) as the base.

	### Models Merged

	The following models were included in the merge:
	* [tokyotech-llm/Llama-3.1-Swallow-70B-v0.1](https://huggingface.co/tokyotech-llm/Llama-3.1-Swallow-70B-v0.1)
	* [meta-llama/Llama-3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct)

	### Configuration

	The following YAML configuration was used to produce this model:

	```yaml
	merge_method: task_arithmetic
	base_model: meta-llama/Llama-3.1-70B
	models:
	  - model: tokyotech-llm/Llama-3.1-Swallow-70B-v0.1
	    parameters:
	      weight: 1.0
	  - model: meta-llama/Llama-3.3-70B-Instruct
	    parameters:
	      weight: 0.8
	dtype: bfloat16
	name: Llama-3.3-FakeSwallow-70B-Instruct-v0.1
```
	```
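
To show what this configuration computes, here is a toy sketch of task arithmetic on three-element parameter lists. It is not mergekit's code: the function name and the numeric values are made up for illustration, and it assumes the weighted task vectors are summed without normalization. Only the weights 1.0 and 0.8 come from the configuration above.

```python
def task_arithmetic(base, models, weights):
    """Merge flat parameter lists: base + sum_i weight_i * (model_i - base)."""
    merged = list(base)
    for params, weight in zip(models, weights):
        for i, (p, b) in enumerate(zip(params, base)):
            # Each model contributes its weighted "task vector" (its difference from the base).
            merged[i] += weight * (p - b)
    return merged


# Toy three-parameter "models"; the numbers are invented for illustration.
base = [1.0, 2.0, 3.0]      # stands in for meta-llama/Llama-3.1-70B
swallow = [1.5, 2.0, 2.0]   # stands in for tokyotech-llm/Llama-3.1-Swallow-70B-v0.1
instruct = [1.0, 3.0, 3.5]  # stands in for meta-llama/Llama-3.3-70B-Instruct

merged = task_arithmetic(base, [swallow, instruct], weights=[1.0, 0.8])
print(merged)
```

Each merged parameter is the base value plus the full Swallow difference and 80% of the Instruct difference, so a parameter that only one model changed moves only by that model's weighted contribution.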