Instructions to use PygmalionAI/pygmalion-6b with libraries, inference providers, notebooks, and local apps.

Libraries

How to use PygmalionAI/pygmalion-6b with Transformers:

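```
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="PygmalionAI/pygmalion-6b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Passing a list of chat messages relies on the tokenizer providing a chat template
print(pipe(messages, max_new_tokens=64))
```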

```
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("PygmalionAI/pygmalion-6b")
model = AutoModelForCausalLM.from_pretrained("PygmalionAI/pygmalion-6b")
```

Local Apps

vLLM

How to use PygmalionAI/pygmalion-6b with vLLM:

Install from pip and serve model

```
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "PygmalionAI/pygmalion-6b"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "PygmalionAI/pygmalion-6b",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
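The same request can also be sent from Python using only the standard library. This is a minimal sketch. It assumes the server started above is listening on the default port 8000 and returns the usual OpenAI-style `choices[0].message.content` field. `build_chat_request` and `send_chat_request` are hypothetical helper names and are not part of vLLM. For the SGLang server described below, pass `base_url="http://localhost:30000"`.

```python
import json
import urllib.request


def build_chat_request(content, base_url="http://localhost:8000",
                       model="PygmalionAI/pygmalion-6b"):
    """Build (but do not send) a POST request equivalent to the curl call above."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": content}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=60.0):
    """Send the request and return the first choice's message text.

    Assumes an OpenAI-compatible response body; requires a running server.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

Building the request does not touch the network. Only `send_chat_request(build_chat_request("What is the capital of France?"))` needs a live server.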

Use Docker

```
docker model run hf.co/PygmalionAI/pygmalion-6b
```

SGLang

How to use PygmalionAI/pygmalion-6b with SGLang:

Install from pip and serve model

```
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "PygmalionAI/pygmalion-6b" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "PygmalionAI/pygmalion-6b",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker images

```
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "PygmalionAI/pygmalion-6b" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "PygmalionAI/pygmalion-6b",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Docker Model Runner
How to use PygmalionAI/pygmalion-6b with Docker Model Runner:
```
docker model run hf.co/PygmalionAI/pygmalion-6b
```

The bot can't differentiate between commenting on the situation and talking to the bot

by Szarka - opened Jan 19, 2023

Discussion

Szarka

Jan 19, 2023

Example:
Me: I eat popcorn while we watch the movie
Bot: Sure, you can eat popcorn while we watch the movie

I mean, it's not a big problem; I'm just providing feedback. The model and the Colab work very well! I hope you keep improving it. Thank you for your work.

Delcos

Jan 19, 2023

edited Jan 19, 2023

Add * to the start and end of the action, without spaces. This has worked every time I've used it in testing, and it again beat out OPT at recognizing *YOUR TEXT HERE PLS* as an action.

*I eat popcorn* this tastes great.
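A minimal sketch of the asterisk convention described above, for scripts that assemble messages programmatically. `format_turn` is a hypothetical helper name. The convention itself is only what this thread reports working in testing, not a documented feature of the model.

```python
def format_turn(action: str = "", speech: str = "") -> str:
    """Combine a narrated action and spoken text into one message.

    The action is wrapped in asterisks with no spaces just inside them
    (e.g. "*I eat popcorn*") so it reads as an action rather than dialogue.
    """
    parts = []
    action = action.strip()
    if action:
        parts.append(f"*{action}*")
    speech = speech.strip()
    if speech:
        parts.append(speech)
    return " ".join(parts)


print(format_turn("I eat popcorn", "this tastes great."))
```

Stripping the action before wrapping it is what guarantees the "no spaces" part of the convention. Otherwise the result could be `* I eat popcorn *`.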

Szarka

Jan 20, 2023

Tried it. It works sometimes, yeah, thanks. I don't understand the latter part of your sentence. Is it a configuration option in the Gradio app?

Delcos

Jan 21, 2023

edited Jan 21, 2023

Oh, sorry. I haven't used anything from them other than the models themselves. In the software I wrote, that format works well, and I'm not sure why it would behave differently, since it's still the same model.
Here's what I meant. Hugging Face renders the * as italics, so here's a screenshot.

Szarka

Jan 21, 2023

edited Jan 21, 2023

Thank you! One more question, please. Is it possible to do anything about the model's very short-term memory? It only remembers 2-3 lines for me. Is that a hardware limitation, or does the model need more training?

Delcos

Jan 28, 2023

edited Jan 28, 2023

I'm not sure what you mean by that; it depends more on the software you're using to run inference on the model. Look up KoboldAI. It's another program that lets you run this and other models, with much better chat features and access to a lot of settings. The defaults work best with this model, but you can still play around with them. It also lets you give the model up to 2048 tokens of context (it can go higher, but don't, because it WILL collapse), which is roughly 1,500 words of English. That lets it hold context for a few paragraphs. It also supports persistent memory, so it can keep core details in permanent memory.
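The idea behind the memory features described above can be sketched as follows. A fixed "permanent memory" block is always kept, and the remaining token budget is filled with the most recent chat lines, so the oldest lines drop out first. This is a minimal sketch, not KoboldAI's actual implementation, and `build_context` is a hypothetical helper. The default `max_tokens=2048` assumes the GPT-J-sized context window. The default whitespace word count is only a crude stand-in for real token counts.

```python
from typing import Callable, List


def _word_count(text: str) -> int:
    # Crude proxy for token count; real tokenizers usually produce more tokens than words.
    return len(text.split())


def build_context(memory: str, history: List[str], max_tokens: int = 2048,
                  count_tokens: Callable[[str], int] = _word_count) -> str:
    """Keep the permanent memory, then as many of the most recent lines as fit.

    Newline separators are not counted, so leave some headroom in max_tokens.
    """
    budget = max_tokens - count_tokens(memory)
    if budget < 0:
        raise ValueError("memory alone exceeds the token budget")
    kept: List[str] = []
    # Walk backwards from the newest line until the budget runs out.
    for line in reversed(history):
        cost = count_tokens(line)
        if cost > budget:
            break
        kept.append(line)
        budget -= cost
    kept.reverse()  # restore chronological order
    return "\n".join([memory, *kept]) if memory else "\n".join(kept)
```

To count real tokens, pass something like `count_tokens=lambda s: len(tokenizer.encode(s))` with the Transformers tokenizer loaded earlier.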

Szarka

Jan 28, 2023

Yeah, the GUI I was using was the problem... With KoboldAI and TavernAI, the model can remember more.
