	---
	license: apache-2.0
	---

	# BiLLa: A Bilingual LLaMA with Enhanced Reasoning Ability

	BiLLa is an open-source, reasoning-enhanced bilingual LLaMA model. Its main features are:
	- Greatly improved Chinese language modeling, with minimal damage to LLaMA's original English ability;
	- More task data added during training, accompanied by ChatGPT-generated analysis;
	- Full-parameter optimization for better performance.

	Github: https://github.com/Neutralzz/BiLLa

	<b>Note</b>: Due to LLaMA's license, the model weights in this hub cannot be used directly.
	The released `word embedding` weight is the sum of the trained model's weight and the original LLaMA's weight,
	so that developers with access to the original LLaMA model can convert the model released in this hub into a usable one.
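
The arithmetic behind this note can be illustrated with a toy sketch. It assumes, as stated above, that the released embedding equals the trained embedding plus the original LLaMA embedding, so subtracting the original recovers the trained one. The matrices and the helper names `add` and `subtract` are hypothetical and exist only for illustration; this is not the conversion script itself, which works on the real checkpoint files and may handle details (such as tensor names or vocabulary size) that are not shown here.

```python
# Toy illustration only: tiny 2x3 "embedding matrices" stored as nested lists.
# All values are made up for the example.

def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

def subtract(a, b):
    """Element-wise difference of two equally shaped matrices."""
    return [[x - y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

# Stand-in for the original LLaMA embedding (available only to LLaMA licensees).
original_llama = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.75]]
# Stand-in for the trained BiLLa embedding (what the user ultimately needs).
trained_billa = [[0.25, 0.5, -1.5], [2.0, -0.5, 1.0]]

# What the hub distributes, under the stated assumption: the sum of the two.
released = add(trained_billa, original_llama)

# Anyone holding the original LLaMA weights can undo the sum.
recovered = subtract(released, original_llama)

print(recovered == trained_billa)
```

The toy values were chosen so that the round trip is exact; with real floating-point weights the recovered values may match only up to rounding error.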

	First, restore the model weights with [this script](https://github.com/Neutralzz/BiLLa/blob/main/embedding_convert.py):
	```shell
	python3 embedding_convert.py \
	--model_dir /path_to_BiLLa/BiLLa-7B-LLM \
	--meta_llama_pth_file /path_to_LLaMA/llama-7b/consolidated.00.pth
	```

	Then, you can run this model as follows:
	```python
	import torch
	from transformers import AutoTokenizer, AutoModelForCausalLM

	model_path = "/path_to_BiLLa/BiLLa-7B-LLM"
	# use_fast=False selects the slow tokenizer implementation.
	tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
	# Load the converted weights in half precision and move the model to the GPU.
	model = AutoModelForCausalLM.from_pretrained(model_path, low_cpu_mem_usage=True, torch_dtype=torch.float16).cuda()

	prompt = "[Your prompt]"
	input_ids = tokenizer([prompt]).input_ids
	output_ids = model.generate(
	    torch.as_tensor(input_ids).cuda(),
	    do_sample=True,
	    temperature=0.7,
	    max_new_tokens=1024,
	)
	# generate() returns the prompt tokens followed by the new tokens; keep only the new ones.
	output_ids = output_ids[0][len(input_ids[0]):]

	outputs = tokenizer.decode(output_ids, skip_special_tokens=True).strip()
	print(outputs)
	```

	Unlike [BiLLa-7B-SFT](https://huggingface.co/Neutralzz/BiLLa-7B-SFT), `BiLLa-7B-LLM` places no restriction on the input format.