---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
tags:
- chat
---

# InternLM2.5-7B-Chat GGUF Model

## Introduction

The `internlm2_5-7b-chat` model in GGUF format can be utilized by [llama.cpp](https://github.com/ggerganov/llama.cpp), a highly popular open-source framework for Large Language Model (LLM) inference, across a variety of hardware platforms, both locally and in the cloud.
This repository offers `internlm2_5-7b-chat` models in GGUF format in both half precision and various low-bit quantized versions, including `q5_0`, `q5_k_m`, `q6_k`, and `q8_0`.

In the subsequent sections, we will first present the installation procedure, followed by an explanation of the model download process.
Finally, we will illustrate the methods for model inference and service deployment through specific examples.

## Installation

We recommend building `llama.cpp` from source. The following code snippet provides an example for the Linux CUDA platform. For instructions on other platforms, please refer to the [official guide](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#build).

- Step 1: create a conda environment and install cmake

```shell
conda create --name internlm2 python=3.10 -y
conda activate internlm2
pip install cmake
```

- Step 2: clone the source code and build the project

```shell
git clone --depth=1 https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j
```
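
If your machine has no CUDA-capable GPU, a CPU-only build should also work. This is a sketch based on llama.cpp's default CMake configuration rather than part of the original instructions:

```shell
# CPU-only build: simply omit -DGGML_CUDA=ON and use the default backend
cmake -B build
cmake --build build --config Release -j
```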

All the built targets can be found in the subdirectory `build/bin`.

In the following sections, we assume that the working directory is the root directory of `llama.cpp`.
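
Before moving on, you can run a quick sanity check. This step is optional and not part of the original instructions, and `--version` is assumed to be available in recent llama.cpp builds:

```shell
# Confirm the binaries were built and are usable
ls build/bin
build/bin/llama-cli --version
```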

## Download models

In the [introduction section](#introduction), we mentioned that this repository includes several models with varying levels of computational precision. You can download the appropriate model based on your requirements.
For instance, `internlm2_5-7b-chat-fp16.gguf` can be downloaded as follows:

```shell
pip install huggingface-hub
huggingface-cli download internlm/internlm2_5-7b-chat-gguf internlm2_5-7b-chat-fp16.gguf --local-dir . --local-dir-use-symlinks False
```
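
The quantized variants mentioned in the introduction can be downloaded the same way by changing the filename. The filename below is assumed to follow the naming pattern of the `fp16` file and should be confirmed against the repository's file list:

```shell
# Assumed filename for the q5_k_m quantized variant; verify it in the repository's file list
huggingface-cli download internlm/internlm2_5-7b-chat-gguf internlm2_5-7b-chat-q5_k_m.gguf --local-dir . --local-dir-use-symlinks False
```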

## Inference

You can use `llama-cli` for conducting inference. For a detailed explanation of `llama-cli`, please refer to [this guide](https://github.com/ggerganov/llama.cpp/blob/master/examples/main/README.md).

```shell
build/bin/llama-cli -m internlm2_5-7b-chat-fp16.gguf
```
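
For chat-style usage you will typically want a few extra options, such as GPU offloading and interactive conversation mode. The flags below are assumed to be available in recent llama.cpp builds and are not part of the original instructions:

```shell
# Offload all layers to the GPU (-ngl 99) and chat interactively (-cnv)
build/bin/llama-cli -m internlm2_5-7b-chat-fp16.gguf -ngl 99 -cnv
```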

## Serving