Instructions to use meta-llama/Meta-Llama-3-8B-Instruct with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use meta-llama/Meta-Llama-3-8B-Instruct with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="meta-llama/Meta-Llama-3-8B-Instruct")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
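
Because this repository is gated, the loading calls above only work once the access request has been approved. The following is a minimal sketch, not an official Hugging Face utility, for checking that state from a script using only the standard library. The helper names `describe_status` and `check_access` are hypothetical, and both the use of `config.json` as a probe file and the mapping from HTTP status codes to access states are assumptions about how the Hub responds for gated repositories.

```python
import os
import urllib.error
import urllib.request
from typing import Optional

REPO_ID = "meta-llama/Meta-Llama-3-8B-Instruct"


def describe_status(status_code: int) -> str:
    """Map an HTTP status code to an access state (assumed mapping, not an official one)."""
    if status_code == 200:
        return "granted"
    if status_code == 401:
        return "not authenticated"
    if status_code == 403:
        return "pending or denied"
    if status_code == 404:
        return "not found"
    return f"unexpected status {status_code}"


def check_access(repo_id: str = REPO_ID, token: Optional[str] = None) -> str:
    """Request one small file from the repo and report the resulting access state."""
    # Fall back to the HF_TOKEN environment variable when no token is passed.
    token = token or os.environ.get("HF_TOKEN")
    # Assumption: config.json exists on the main branch and is served under /resolve/.
    url = f"https://huggingface.co/{repo_id}/resolve/main/config.json"
    request = urllib.request.Request(url)
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return describe_status(response.status)
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx responses; the status code is on the exception.
        return describe_status(err.code)
```

Calling `check_access()` with `HF_TOKEN` set in the environment returns one of the strings above; anything other than `"granted"` suggests the `from_pretrained` calls would fail as well.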

Local Apps

vLLM

How to use meta-llama/Meta-Llama-3-8B-Instruct with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "meta-llama/Meta-Llama-3-8B-Instruct"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "meta-llama/Meta-Llama-3-8B-Instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
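
The same request can be sent from Python instead of curl. This is a minimal sketch using only the standard library; the helper names `build_chat_request` and `ask` are hypothetical, and it assumes the server is already running and returns the usual OpenAI-style chat completion body with the reply under `choices[0].message.content`.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(base_url: str, model: str, prompt: str) -> str:
    """Send the prompt to a running server and return the assistant's reply text."""
    request = build_chat_request(base_url, model, prompt)
    with urllib.request.urlopen(request, timeout=120) as response:
        body = json.load(response)
    # Assumption: the response follows the OpenAI chat completion schema.
    return body["choices"][0]["message"]["content"]
```

For example, `ask("http://localhost:8000", "meta-llama/Meta-Llama-3-8B-Instruct", "What is the capital of France?")` targets the vLLM server started above; pointing `base_url` at `http://localhost:30000` should work the same way for an SGLang server on that port.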

Use Docker

docker model run hf.co/meta-llama/Meta-Llama-3-8B-Instruct

SGLang

How to use meta-llama/Meta-Llama-3-8B-Instruct with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "meta-llama/Meta-Llama-3-8B-Instruct" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "meta-llama/Meta-Llama-3-8B-Instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "meta-llama/Meta-Llama-3-8B-Instruct" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "meta-llama/Meta-Llama-3-8B-Instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use meta-llama/Meta-Llama-3-8B-Instruct with Docker Model Runner:
```
docker model run hf.co/meta-llama/Meta-Llama-3-8B-Instruct
```

request to access is still pending a review

#50

by Hoo1196 - opened Apr 22, 2024

Discussion

Hoo1196

Apr 22, 2024

The access request has still not been approved after three days.

CocoSun

Apr 22, 2024

Same issue

Hoo1196 changed discussion title from "request to access is stiil pending a review" to "request to access is still pending a review" Apr 22, 2024

hylll

Apr 22, 2024

I also have the same issue.

Sedrick99

Apr 22, 2024

same issue for me

JingweiNi

Apr 22, 2024

Same issue here

Alarmist

Apr 22, 2024

same issue

yushuihao

Apr 22, 2024

Same issue here

Alviner

Apr 22, 2024

I also have the same issue.

bartk

Apr 22, 2024

omarkhaled

Apr 22, 2024

Same issue here

Wo-o

Apr 22, 2024

same issue here

testcosmicpre

Apr 22, 2024

averoo

Apr 22, 2024

alexrods

Apr 22, 2024

If you use a personal email, they don't give you access; try a business or institutional email instead. The same goes for HF requests.

averoo

Apr 23, 2024 • edited Apr 23, 2024

Access was granted in three days. I used a personal email.

In addition, you can download the weights and tokenizer using the download.sh script from the official llama3 GitHub repository.

Apr 24, 2024

Got my access! Thanks

xyzw-io

Apr 24, 2024

Submitted my request last Thursday. Still waiting for approval.

albertmu

Apr 25, 2024

so am I

Franklin81

Apr 25, 2024

Hannibal046

Apr 25, 2024

same here

micrem73

Oct 25, 2024

same here

zisisbatzos

Nov 1, 2024

not accepted for 2 weeks, what's going on?

micrem73

Nov 4, 2024

I also submitted the request two weeks ago. There is no acceptance yet. That is disappointing.

ppstdg

Nov 6, 2024

I am also not getting approval for my request. Does anyone have a suggestion, please?

micrem73

Nov 7, 2024

Solved.
In my HF account settings, I changed my primary email address. Before, it was a gmail.com address; now it is an academic email address. I have no idea whether this is related, but a few minutes later I was notified that my request was approved. I hope this helps others with the same issue.
