limcheekin committed on
Commit
5e8aed0
1 Parent(s): 0aa1213

Upload folder using huggingface_hub

README.md ADDED
@@ -0,0 +1,57 @@
+ ---
+ license: apache-2.0
+ language:
+ - en
+ tags:
+ - ctranslate2
+ - RedPajama-INCITE-7B-Chat
+ - redpajama-chat-7b
+ - quantization
+ - int8
+ ---
+
+ # RedPajama-INCITE-7B-Chat Q8
+
+ This model is an int8-quantized version of [togethercomputer/RedPajama-INCITE-7B-Chat](https://huggingface.co/togethercomputer/RedPajama-INCITE-7B-Chat).
+
+ ## Model Details
+
+ ### Model Description
+
+ The model was quantized using [CTranslate2](https://opennmt.net/CTranslate2/) with the following command:
+
+ ```
+ ct2-transformers-converter --model togethercomputer/redpajama-chat-7b --output_dir togethercomputer/redpajama-chat-7b-ct2 --copy_files tokenizer.json tokenizer_config.json special_tokens_map.json generation_config.json --quantization int8 --force --low_cpu_mem_usage
+ ```
+
+ If you want to perform the quantization yourself, install the following dependencies first:
+
+ ```
+ pip install -qU ctranslate2 transformers[torch] accelerate
+ ```
+
+ - **Shared by:** Lim Chee Kin
+ - **License:** Apache 2.0
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ ```python
+ import ctranslate2
+ import transformers
+
+ generator = ctranslate2.Generator("limcheekin/redpajama-chat-7b-ct2")
+ tokenizer = transformers.AutoTokenizer.from_pretrained("limcheekin/redpajama-chat-7b-ct2")
+
+ prompt = "<human>: Who is Alan Turing?\n<bot>:"
+ tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt))
+
+ results = generator.generate_batch([tokens], max_length=256, sampling_topk=10)
+
+ text = tokenizer.decode(results[0].sequences_ids[0])
+ ```
+
+ The code above is adapted from the [CTranslate2 Transformers guide](https://opennmt.net/CTranslate2/guides/transformers.html#mpt).
+
+ The key method in the code above is `generate_batch`; you can find [its supported parameters here](https://opennmt.net/CTranslate2/python/ctranslate2.Generator.html#ctranslate2.Generator.generate_batch).
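As an aside, the single-turn prompt template used above (`<human>: ...\n<bot>:`) and the step of separating the model's reply from the echoed prompt can be wrapped in small helpers. This is an illustrative sketch, not part of the model card or the CTranslate2 API; the names `build_prompt` and `extract_reply` are hypothetical:

```python
# Illustrative helpers for the RedPajama-INCITE single-turn chat template.
# Hypothetical names; assumes the "<human>: ...\n<bot>:" format shown above.

def build_prompt(question: str) -> str:
    """Wrap a user question in the single-turn RedPajama chat template."""
    return f"<human>: {question}\n<bot>:"

def extract_reply(decoded: str) -> str:
    """Return only the text after the final "<bot>:" marker.

    generate_batch includes the prompt tokens in the result by default
    (include_prompt_in_result=True), so the decoded string begins with
    the prompt itself.
    """
    return decoded.rsplit("<bot>:", 1)[-1].strip()

prompt = build_prompt("Who is Alan Turing?")
# prompt == "<human>: Who is Alan Turing?\n<bot>:"
```

With these helpers, the `text` variable from the example above could be post-processed as `extract_reply(text)` to obtain only the model's answer.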
config.json ADDED
@@ -0,0 +1,6 @@
+ {
+ "bos_token": "<|endoftext|>",
+ "eos_token": "<|endoftext|>",
+ "layer_norm_epsilon": null,
+ "unk_token": "<|endoftext|>"
+ }
generation_config.json ADDED
@@ -0,0 +1,6 @@
+ {
+ "_from_model_config": true,
+ "bos_token_id": 0,
+ "eos_token_id": 0,
+ "transformers_version": "4.29.1"
+ }
model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3893f998a30bbb1055f9642a95c5da5e4068f603ae29e2141ad01f4256733fdf
+ size 6867595176
special_tokens_map.json ADDED
@@ -0,0 +1,5 @@
+ {
+ "bos_token": "<|endoftext|>",
+ "eos_token": "<|endoftext|>",
+ "unk_token": "<|endoftext|>"
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,9 @@
+ {
+ "add_prefix_space": false,
+ "bos_token": "<|endoftext|>",
+ "clean_up_tokenization_spaces": true,
+ "eos_token": "<|endoftext|>",
+ "model_max_length": 2048,
+ "tokenizer_class": "GPTNeoXTokenizer",
+ "unk_token": "<|endoftext|>"
+ }
vocabulary.txt ADDED
The diff for this file is too large to render. See raw diff