ctranslate2-4you/Mistral-Nemo-Instruct-2407-ct2-int8


---
base_model:
- mistralai/Mistral-Nemo-Instruct-2407
---

CTranslate2 conversion of the model located at [mistralai/Mistral-Nemo-Instruct-2407](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407), quantized to int8.

A conversion script with a graphical user interface can be downloaded [here](https://github.com/BBC-Esq/Ctranslate2-Converter).
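For readers who prefer to script the conversion, the following is a minimal sketch using CTranslate2's `TransformersConverter`. It is not the linked GUI tool and is not necessarily how this repository was produced. The helper names `output_dir_name` and `convert`, the output folder naming, and the list of tokenizer files to copy are assumptions made for illustration.

```python
from pathlib import Path


def output_dir_name(model_id, quantization):
    # Hypothetical naming scheme: "org/name" becomes "org--name-ct2-<quantization>".
    return f"{model_id.replace('/', '--')}-ct2-{quantization}"


def convert(model_id, output_root, quantization="int8"):
    # Imported lazily so the naming helper works without CTranslate2 installed.
    from ctranslate2.converters import TransformersConverter

    output_dir = Path(output_root) / output_dir_name(model_id, quantization)
    converter = TransformersConverter(
        model_id,
        # Assumption: these tokenizer files exist in the source repository and
        # are the ones needed to load the tokenizer from the output folder.
        copy_files=["tokenizer.json", "tokenizer_config.json"],
    )
    converter.convert(str(output_dir), quantization=quantization)
    return output_dir
```

Calling `convert("mistralai/Mistral-Nemo-Instruct-2407", "models")` would download the original weights and write the converted model under `models/`, so it needs enough disk space and memory for the full model.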

## Tested with CTranslate2 4.4.0 and Torch 2.2.2
- NOTE: CTranslate2 will soon release version 4.5.0, which will require a Torch version newer than 2.2.2.
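The version constraint in the note can be checked before loading the model. The sketch below encodes only what the note states (CTranslate2 4.5.0 or later needs a Torch release newer than 2.2.2); the function names are hypothetical, and the rule should be confirmed against the CTranslate2 release notes.

```python
import re
from importlib import metadata


def parse_version(text):
    # Keep only the leading numeric release, so "2.2.2+cu121" becomes (2, 2, 2).
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    parts = tuple(int(part) for part in match.group().split("."))
    # Pad short versions such as "4.5" to three components.
    return parts + (0,) * (3 - len(parts))


def torch_is_new_enough(ct2_version, torch_version):
    # Assumption taken from the note: only CTranslate2 >= 4.5.0 has this requirement.
    if parse_version(ct2_version) < (4, 5, 0):
        return True
    return parse_version(torch_version) > (2, 2, 2)


def check_installed():
    # Returns None when either package is not installed.
    try:
        ct2_version = metadata.version("ctranslate2")
        torch_version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        return None
    return torch_is_new_enough(ct2_version, torch_version)
```

`check_installed()` reads the installed package metadata without importing either library, so it is cheap to run at the top of a script.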

## Example Usage:

```python
import gc
import os

import ctranslate2
import torch
from transformers import AutoTokenizer

system_message = "You are a helpful person who answers questions."
user_message = "Hello, how are you today? I'd like you to write me a funny poem that is a parody of Milton's Paradise Lost if you are familiar with that famous epic poem?"

# Local folder that holds the converted model; change this to match your machine.
model_dir = r"D:\Scripts\bench_chat\models\mistralai--Mistral-Nemo-Instruct-2407-ct2-int8"


def build_prompt_mistral_nemo():
    # The system message goes inside the [INST] block, ahead of the user message.
    prompt = (
        "<s>\n"
        f"[INST]{system_message}\n"
        "\n"
        f"{user_message}[/INST]"
    )
    return prompt


def main():
    model_name = os.path.basename(model_dir)

    print(f"\033[32mLoading the model: {model_name}...\033[0m")

    # Leave a few cores free, but never use fewer than 4 threads.
    # os.cpu_count() can return None, so fall back to 4 in that case.
    intra_threads = max((os.cpu_count() or 4) - 4, 4)

    generator = ctranslate2.Generator(
        model_dir,
        device="cuda",
        compute_type="int8",
        intra_threads=intra_threads,
    )

    tokenizer = AutoTokenizer.from_pretrained(model_dir, add_prefix_space=None)

    prompt = build_prompt_mistral_nemo()

    # CTranslate2 expects token strings rather than token IDs.
    tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt))

    results_batch = generator.generate_batch(
        [tokens],
        include_prompt_in_result=False,
        max_batch_size=4096,
        batch_type="tokens",
        beam_size=1,
        num_hypotheses=1,
        max_length=512,
        sampling_temperature=0.0,
    )

    # One prompt and one hypothesis, so take the first sequence of the first result.
    output = tokenizer.decode(results_batch[0].sequences_ids[0])

    print("\nGenerated response:")
    print(output)

    # Release the model and tokenizer, then free cached GPU memory.
    del generator
    del tokenizer
    torch.cuda.empty_cache()
    gc.collect()


if __name__ == "__main__":
    main()
```