	---
	license: other
	language:
	- en
	pipeline_tag: text-generation
	inference: false
	tags:
	- transformers
	- gguf
	- imatrix
	- Llama-3.2-1B-Instruct
	---
	Quantizations of https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct


	### Inference Clients/UIs
	* [llama.cpp](https://github.com/ggerganov/llama.cpp)
	* [KoboldCPP](https://github.com/LostRuins/koboldcpp)
	* [text-generation-webui](https://github.com/oobabooga/text-generation-webui)
	* [ollama](https://github.com/ollama/ollama)


	---

	# From original readme

	The Meta Llama 3.2 collection of multilingual large language models (LLMs) comprises pretrained and instruction-tuned generative models in 1B and 3B sizes (text in/text out). The Llama 3.2 instruction-tuned text-only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks. They outperform many of the available open source and closed chat models on common industry benchmarks.

	Model Developer: Meta

	Model Architecture: Llama 3.2 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.

	## How to use

	The original repository contains two versions of Llama-3.2-1B-Instruct: one for use with `transformers` and one for use with the original `llama` codebase.

	### Use with transformers

	With `transformers >= 4.43.0`, you can run conversational inference using the Transformers `pipeline` abstraction or by using the Auto classes with the `generate()` function.

	Make sure to update your transformers installation via `pip install --upgrade transformers`.

	```python
	import torch
	from transformers import pipeline

	model_id = "meta-llama/Llama-3.2-1B-Instruct"
	pipe = pipeline(
	    "text-generation",
	    model=model_id,
	    torch_dtype=torch.bfloat16,
	    device_map="auto",
	)
	messages = [
	    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
	    {"role": "user", "content": "Who are you?"},
	]
	outputs = pipe(
	    messages,
	    max_new_tokens=256,
	)
	print(outputs[0]["generated_text"][-1])
	```
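
The Auto classes route mentioned above is not shown in the original snippet. A minimal sketch of it might look like the following. It assumes `torch` and a `transformers` version that supports `return_dict=True` in `apply_chat_template`, and the helper names `build_messages` and `generate_reply` are made up for this example.

```python
MODEL_ID = "meta-llama/Llama-3.2-1B-Instruct"


def build_messages(system_prompt, user_prompt):
    """Build a chat in the role/content format used by the pipeline example."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


def generate_reply(messages, model_id=MODEL_ID, max_new_tokens=256):
    """Generate one assistant reply with the Auto classes and generate()."""
    # Heavy imports are kept local so build_messages works without them.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )

    # Apply the model's chat template and append the assistant header.
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)

    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Drop the prompt tokens and decode only the newly generated ones.
    prompt_length = inputs["input_ids"].shape[-1]
    return tokenizer.decode(output_ids[0, prompt_length:], skip_special_tokens=True)
```

Calling `generate_reply(build_messages("You are a pirate chatbot who always responds in pirate speak!", "Who are you?"))` would download the model weights on first use, so it needs network access and permission to access the model repository.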

	Note: You can also find detailed recipes on how to use the model locally, with `torch.compile()`, assisted generation, quantization, and more at [`huggingface-llama-recipes`](https://github.com/huggingface/huggingface-llama-recipes).

	### Use with `llama`

	Please follow the instructions in the [repository](https://github.com/meta-llama/llama).

	To download the original checkpoints, use `huggingface-cli` as in the example command below:

	```
	huggingface-cli download meta-llama/Llama-3.2-1B-Instruct --include "original/*" --local-dir Llama-3.2-1B-Instruct
	```
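
The same download can probably be scripted from Python with the `huggingface_hub` library. The sketch below mirrors the CLI command above using `snapshot_download`. The wrapper function `download_original_checkpoints` and the constant names are made up for this example, and it assumes `huggingface_hub` is installed.

```python
REPO_ID = "meta-llama/Llama-3.2-1B-Instruct"
ALLOW_PATTERNS = ["original/*"]       # same filter as --include "original/*"
LOCAL_DIR = "Llama-3.2-1B-Instruct"   # same target as --local-dir


def download_original_checkpoints(repo_id=REPO_ID, local_dir=LOCAL_DIR):
    """Download only the files under original/ and return the local folder path."""
    # Imported here so the constants above can be reused without the dependency.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id,
        allow_patterns=ALLOW_PATTERNS,
        local_dir=local_dir,
    )
```

Calling `download_original_checkpoints()` needs network access and, if the repository is gated for your account, a prior `huggingface-cli login`.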