Spaces:
Running
toaster61
committed
Commit e3396ba · 1 parent: 559ea97
it works!
Changed files:
- Dockerfile +18 -17
- __pycache__/app.cpython-311.pyc +0 -0
- app.py +15 -6
- run-docker.sh +5 -0
- system.prompt +1 -0
Dockerfile
CHANGED
@@ -1,27 +1,28 @@
+# Loading base. I'm using Debian, u can use whatever u want.
+FROM python:3.11.5-slim-bookworm
 
+# Just to be sure everything will be fine.
 USER root
 
-RUN
-RUN
+# Installing gcc compiler and main library.
+RUN apt update && apt install gcc cmake build-essential -y
+RUN CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS" pip install llama-cpp-python==0.1.78
 
+# Installing wget and downloading model.
+RUN apt install wget -y
+RUN wget -O model.bin https://huggingface.co/OpenBuddy/openbuddy-ggml/resolve/main/openbuddy-openllama-3b-v10-q5_0.bin
+# You can use other models! Visit https://huggingface.co/OpenBuddy/openbuddy-ggml and choose the model u like!
+# Or u can comment out these two RUNs and ship your own model named "model.bin" in the Space/repo/Docker image.
 
-WORKDIR
+# Copying files into a folder and making it the working dir.
+RUN mkdir app
+COPY . /app
+RUN chmod -R 777 /app
+WORKDIR /app
 
+# Updating pip and installing everything from requirements.
 RUN python3 -m pip install -U --no-cache-dir pip setuptools wheel
 RUN pip install --no-cache-dir --upgrade -r /app/requirements.txt
 
-RUN chown -R root:root /.cache/huggingface/hub
-RUN chmod -R 777 /.cache/huggingface/hub
+# Now it's time to run the Quart app using uvicorn! (It's faster, trust me.)
 CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860"]
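Note that the Dockerfile installs from /app/requirements.txt, which is not part of this commit. Judging from app.py, a minimal requirements.txt would need at least quart and uvicorn (llama-cpp-python is already pinned by the earlier RUN); that is an assumption, so check the Space's actual requirements.txt.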
__pycache__/app.cpython-311.pyc
ADDED
Binary file (2.8 kB)
app.py
CHANGED
@@ -2,6 +2,7 @@ from quart import Quart, request
 from llama_cpp import Llama
 
 app = Quart(__name__)
+llm = Llama(model_path="./model.bin")
 
 with open('system.prompt', 'r', encoding='utf-8') as f:
     prompt = f.read()
@@ -10,18 +11,26 @@ with open('system.prompt', 'r', encoding='utf-8') as f:
 async def echo():
     try:
         data = await request.get_json()
+        maxTokens = data.get("max_tokens", 64)
         userPrompt = prompt + "\n\nUser: " + data['request'] + "\nAssistant: "
     except: return {"error": "Not enough data"}, 400
+    try:
+        output = llm(userPrompt, max_tokens=maxTokens, stop=["User:", "\n"], echo=False)
+        return {"output": output["choices"][0]["text"]}
+    except Exception as e:
+        print(e)
+        return {"error": "Server error"}, 500
 
 @app.get("/")
 async def get():
-    return '''<
+    return '''<style>a:visited{color:black;}</style>
+    <h1>Hello, world!</h1>
     This is a showcase of how to make your own server with OpenBuddy's model.<br>
     I'm using a 3b model here just as an example, and CPU power only.<br>
     But you can use GPU power as well!<br>
-    <br>
     <h1>How to GPU?</h1>
+    Change <code>CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS"</code> in the Dockerfile to <code>CMAKE_ARGS="-DLLAMA_CUBLAS=on"</code>. You can also try <code>-DLLAMA_CLBLAST=on</code> or <code>-DLLAMA_METAL=on</code>.<br>
+    Powered by <a href="https://github.com/abetlen/llama-cpp-python">llama-cpp-python</a>, <a href="https://quart.palletsprojects.com/">Quart</a> and <a href="https://www.uvicorn.org/">Uvicorn</a>.<br>
+    <h1>How to test it on your own machine?</h1>
+    You can install Docker, build the image and run it. I made <code>run-docker.sh</code> for ya. To stop the container, run <code>docker ps</code>, find the container's name and run <code>docker stop _dockerContainerName_</code>.<br>
+    Or you can follow the steps in the Dockerfile once and try it on your machine directly, not in Docker.<br>'''
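For reference, here is a minimal client sketch for the echo() handler above. The route path is an assumption: the @app.post decorator sits just above this hunk and is not shown, so "/echo" is hypothetical; adjust it to the route the full app.py declares.

# Hypothetical client for the echo() handler shown above.
# Assumption: the @app.post decorator (not visible in this hunk) maps it to "/echo".
import json
import urllib.request

payload = {"request": "What is OpenBuddy?", "max_tokens": 64}
req = urllib.request.Request(
    "http://localhost:7860/echo",  # adjust the path to the app's real route
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["output"])  # the server replies {"output": "..."} on success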
run-docker.sh
ADDED
@@ -0,0 +1,5 @@
+# This is an SH file for running the Dockerfile.
+# Use it for tests. AND INSTALL DOCKER BEFORE U RUN IT!!!
+
+docker build -t llama-server .
+docker run -dp 0.0.0.0:7860:7860 llama-server
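A note on the run line: -dp combines -d (run detached, in the background) with -p 0.0.0.0:7860:7860, which publishes container port 7860, where uvicorn listens per the Dockerfile's CMD, on the same host port. The image tag llama-server is just the name this script picks.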
system.prompt
CHANGED
@@ -0,0 +1 @@
+Prompt: Answer as briefly and to the point as possible.
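To make the request flow concrete, this sketch shows how echo() in app.py combines the system prompt (translated above from Russian) with an incoming request; the sample question is purely illustrative.

# How app.py assembles the string sent to the model.
prompt = "Prompt: Answer as briefly and to the point as possible."  # contents of system.prompt
userPrompt = prompt + "\n\nUser: " + "What is Docker?" + "\nAssistant: "
# The model completes the text after "Assistant: " and stops at "User:" or a newline.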