YokaiKoibito / llama2_70b_chat_uncensored-fp16

	---
	license: llama2
	datasets:
	- ehartford/wizard_vicuna_70k_unfiltered
	tags:
	- uncensored
	- wizard
	- vicuna
	- llama
	---
	This is an fp16 copy of [jarradh/llama2_70b_chat_uncensored](https://huggingface.co/jarradh/llama2_70b_chat_uncensored), which downloads faster and uses less disk space than the fp32 original. I simply loaded the model on CPU with `torch_dtype=torch.float16` and then saved it again. All credit for the model goes to [jarradh](https://huggingface.co/jarradh).
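
For anyone who wants to reproduce the conversion, here is a minimal sketch of the load-and-save step described above, assuming the Hugging Face `transformers` and `torch` packages. The function name `convert_to_fp16` and the output directory are hypothetical, and this is not necessarily the exact script used for this repository.

```python
def convert_to_fp16(src_repo: str, dst_dir: str) -> None:
    """Load a causal LM on CPU in fp16 and write it back out to dst_dir."""
    # Heavy dependencies are imported inside the function so that merely
    # defining it does not require torch/transformers to be installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # torch_dtype=torch.float16 casts the weights to half precision on load;
    # with no device placement requested, the model stays on CPU.
    model = AutoModelForCausalLM.from_pretrained(
        src_repo,
        torch_dtype=torch.float16,
        low_cpu_mem_usage=True,  # avoid holding a second full copy in RAM
    )
    model.save_pretrained(dst_dir)

    # Copy the tokenizer files too, so dst_dir is usable on its own.
    AutoTokenizer.from_pretrained(src_repo).save_pretrained(dst_dir)


# Example call (hypothetical output path; needs enough RAM for the fp16 weights):
# convert_to_fp16("jarradh/llama2_70b_chat_uncensored", "./llama2_70b_chat_uncensored-fp16")
```

At 2 bytes per parameter, a model of roughly 70 billion parameters needs on the order of 140 GB of RAM to hold in fp16, so this only works on a machine with plenty of memory.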

	Arguably a better name for this model would be something like Llama-2-70B_Wizard-Vicuna-Uncensored-fp16, but to avoid confusion I'm sticking with jarradh's naming scheme.

	<!-- repositories-available start -->
	## Repositories available

	* [GPTQ models for GPU inference, with multiple quantisation parameter options.](https://huggingface.co/TheBloke/llama2_70b_chat_uncensored-GPTQ)
	* [2, 3, 4, 5, 6 and 8-bit GGML models for CPU+GPU inference](https://huggingface.co/TheBloke/llama2_70b_chat_uncensored-GGML)
	* [2, 3, 4, 5, 6 and 8-bit GGUF models for CPU+GPU inference, plus fp16 GGUF for requantizing](https://huggingface.co/TheBloke/llama2_70b_chat_uncensored-GGUF)
	* [Jarrad Hope's unquantised model in fp16 pytorch format, for GPU inference and further conversions](https://huggingface.co/YokaiKoibito/llama2_70b_chat_uncensored-fp16)
	* [Jarrad Hope's original unquantised fp32 model in pytorch format, for further conversions](https://huggingface.co/jarradh/llama2_70b_chat_uncensored)

	<!-- repositories-available end -->

	## Prompt template: Human-Response

	```
	### HUMAN:
	{prompt}

	### RESPONSE:
	```
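
The template above can be filled in programmatically. Here is a minimal sketch: the helper name `build_prompt` is hypothetical, and ending the string with a newline after `### RESPONSE:` is an assumption based on how the template is laid out.

```python
# Template copied from the model card; the trailing newline is an assumption.
PROMPT_TEMPLATE = "### HUMAN:\n{prompt}\n\n### RESPONSE:\n"


def build_prompt(user_message: str) -> str:
    """Wrap a single user message in the Human-Response template."""
    return PROMPT_TEMPLATE.format(prompt=user_message.strip())


# The resulting string is what gets passed to the tokenizer or generation API.
text = build_prompt("What is fp16?")
```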