# Using vLLM

## 1. Start vLLM with a model of your choice

```
python -m vllm.entrypoints.openai.api_server --model /home/hmp/llm/cache/Qwen1___5-32B-Chat --tensor-parallel-size 2 --dtype=half
```

This example serves a local model stored at `/home/hmp/llm/cache/Qwen1___5-32B-Chat`; change the path to match your own setup.

## 2. Test vLLM

```
curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "/home/hmp/llm/cache/Qwen1___5-32B-Chat",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "How would you implement a decentralized controller?"}
        ]
    }'
```

(A Python version of this smoke test is sketched at the end of this page.)

## 3. Configure this project

```
API_KEY = "sk-123456789xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx123456789"
LLM_MODEL = "vllm-/home/hmp/llm/cache/Qwen1___5-32B-Chat(max_token=4096)"
API_URL_REDIRECT = {"https://api.openai.com/v1/chat/completions": "http://localhost:8000/v1/chat/completions"}
```

The `LLM_MODEL` string has three parts (a parsing sketch follows at the end of this page):

```
"vllm-/home/hmp/llm/cache/Qwen1___5-32B-Chat(max_token=4096)"
where
"vllm-" is the prefix (required)
"/home/hmp/llm/cache/Qwen1___5-32B-Chat" is the model name (required)
"(max_token=4096)" is the configuration (optional)
```

## 4. Launch!

```
python main.py
```
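As a supplement to step 2, the same smoke test can be written with the `openai` Python client (v1.x), since vLLM exposes an OpenAI-compatible endpoint. This is a minimal sketch assuming the client package is installed; the key below is a placeholder and is only validated if the server was started with an API key configured.

```
# Minimal sketch of the step 2 smoke test using the `openai` Python client.
# Assumes `pip install openai` (v1.x) and the vLLM server from step 1.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # local vLLM server from step 1
    api_key="sk-123456789xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx123456789",  # placeholder
)

response = client.chat.completions.create(
    model="/home/hmp/llm/cache/Qwen1___5-32B-Chat",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "How would you implement a decentralized controller?"},
    ],
)
print(response.choices[0].message.content)
```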
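To make the `LLM_MODEL` format from step 3 concrete, here is a hypothetical parser that splits the string into its model name and optional configuration. It only illustrates the format; the project has its own parsing logic, and `parse_vllm_model_string` is not part of it.

```
import re

def parse_vllm_model_string(s: str):
    """Hypothetical illustration of the LLM_MODEL format:
    'vllm-' prefix (required) + model name (required) + '(key=value,...)' (optional)."""
    m = re.fullmatch(r"vllm-(?P<model>[^(]+?)(?:\((?P<cfg>[^)]*)\))?", s)
    if m is None:
        raise ValueError(f"not a vllm model string: {s!r}")
    cfg = {}
    if m.group("cfg"):
        for pair in m.group("cfg").split(","):
            key, value = pair.split("=", 1)
            cfg[key.strip()] = value.strip()
    return m.group("model"), cfg

model, cfg = parse_vllm_model_string(
    "vllm-/home/hmp/llm/cache/Qwen1___5-32B-Chat(max_token=4096)"
)
print(model)  # /home/hmp/llm/cache/Qwen1___5-32B-Chat
print(cfg)    # {'max_token': '4096'}
```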