---
language:
- en
datasets:
- natural_instructions
- the_pile
- cot
- Muennighoff/P3
tags:
- ctranslate2
- int8
- float16
- gpt
pipeline_tag: text-generation
inference:
  parameters:
    temperature: 0.1
widget:
- text: "Is this review positive or negative? Review: Best cast iron skillet you will ever buy. Answer:"
  example_title: "Sentiment analysis"
- text: "Where is Zurich? Ans:"
  example_title: "Question Answering"
---

# Fast-Inference with CTranslate2

Speed up inference while reducing memory use by 2x-4x using int8 inference in C++ on CPU or GPU.

Quantized version of [togethercomputer/GPT-JT-6B-v0](https://huggingface.co/togethercomputer/GPT-JT-6B-v0)

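
As a rough back-of-the-envelope check on the 2x-4x figure, the sketch below estimates weight storage at different precisions. It assumes about 6 billion parameters (a round number taken from the model name, not a measured count), counts weights only, and ignores activations, the attention cache, and quantization scales, so real memory use will differ.

```python
# Rough weight-memory estimate.
# N_PARAMS is an assumed round number, not a measured parameter count.
N_PARAMS = 6_000_000_000

# Bytes needed to store one weight at each precision.
BYTES_PER_WEIGHT = {"float32": 4, "float16": 2, "int8": 1}


def weight_gigabytes(n_params: int, dtype: str) -> float:
    """Return approximate weight storage in GB (10^9 bytes) for a dtype."""
    return n_params * BYTES_PER_WEIGHT[dtype] / 1e9


baseline = weight_gigabytes(N_PARAMS, "float32")
for dtype in BYTES_PER_WEIGHT:
    gb = weight_gigabytes(N_PARAMS, dtype)
    print(f"{dtype:8s} ~{gb:5.1f} GB  (float32 / {dtype} = {baseline / gb:.0f}x)")
```

Under these assumptions, float16 weights take about half the space of float32 and int8 weights about a quarter, which is one plausible reading of the 2x-4x range.
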
```bash
# Quote the requirement so the shell does not treat ">=" as a redirect
pip install "hf-hub-ctranslate2>=2.0.6"
```

Converted on 2023-05-19 using

```
ct2-transformers-converter --model togethercomputer/GPT-JT-6B-v0 --output_dir /home/michael/tmp-ct2fast-GPT-JT-6B-v0 --force --copy_files merges.txt tokenizer.json README.md tokenizer_config.json vocab.json special_tokens_map.json added_tokens.json .gitattributes --quantization float16
```

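
To repeat the conversion with a different output directory or quantization, the same command can be assembled programmatically. The following is a minimal sketch: `build_convert_command` is a hypothetical helper, the output path is a placeholder, and only flags that appear in the command above are used.

```python
import shlex


def build_convert_command(model, output_dir, quantization="float16", copy_files=()):
    """Assemble the ct2-transformers-converter call as an argument list.

    Hypothetical helper; it only uses flags shown in the command above.
    """
    cmd = [
        "ct2-transformers-converter",
        "--model", model,
        "--output_dir", output_dir,
        "--force",
    ]
    if copy_files:
        cmd += ["--copy_files", *copy_files]
    cmd += ["--quantization", quantization]
    return cmd


cmd = build_convert_command(
    model="togethercomputer/GPT-JT-6B-v0",
    output_dir="./ct2fast-GPT-JT-6B-v0",  # placeholder path, adjust as needed
    copy_files=["tokenizer.json", "tokenizer_config.json"],
)
print(shlex.join(cmd))

# To actually run the conversion (this downloads the full model):
# import subprocess
# subprocess.run(cmd, check=True)
```

Building the command as a list avoids shell quoting problems when it is later passed to `subprocess.run`.
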
Checkpoint compatible with [ctranslate2>=3.13.0](https://github.com/OpenNMT/CTranslate2) and [hf-hub-ctranslate2>=2.0.6](https://github.com/michaelfeil/hf-hub-ctranslate2)
- `compute_type=int8_float16` for `device="cuda"`
- `compute_type=int8` for `device="cpu"`

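
The device-to-compute-type pairing in the list above can be captured in a small lookup. This is a minimal sketch; `pick_compute_type` is a hypothetical helper and not part of either library.

```python
# Pairing taken from the list above.
COMPUTE_TYPE_BY_DEVICE = {
    "cuda": "int8_float16",
    "cpu": "int8",
}


def pick_compute_type(device: str) -> str:
    """Return the suggested compute_type for a device (hypothetical helper)."""
    try:
        return COMPUTE_TYPE_BY_DEVICE[device]
    except KeyError:
        raise ValueError(f"unsupported device: {device!r}") from None


for device in ("cuda", "cpu"):
    print(device, "->", pick_compute_type(device))
```

The returned string can be passed as `compute_type` together with the matching `device` when constructing the model.
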
```python
from hf_hub_ctranslate2 import TranslatorCT2fromHfHub, GeneratorCT2fromHfHub
from transformers import AutoTokenizer

model_name = "michaelfeil/ct2fast-GPT-JT-6B-v0"
# use either TranslatorCT2fromHfHub or GeneratorCT2fromHfHub here, depending on model.
model = GeneratorCT2fromHfHub(
    # load in int8 on CUDA
    model_name_or_path=model_name,
    device="cuda",
    compute_type="int8_float16",
    tokenizer=AutoTokenizer.from_pretrained("togethercomputer/GPT-JT-6B-v0")
)
outputs = model.generate(
    text=["How do you call a fast Flan-ingo?", "User: How are you doing? Bot:"],
)
print(outputs)
```

# Licence and other remarks:

This is just a quantized version. Licence conditions are intended to be identical to those of the original Hugging Face repo.

# Original description

# Quick Start

```python
from transformers import pipeline

pipe = pipeline(model='togethercomputer/GPT-JT-6B-v0')

pipe("Where is Zurich? Ans:")
```
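
The pipeline call above returns its result rather than printing it. Assuming the usual text-generation output shape (a list of dicts with a `generated_text` key whose value includes the prompt by default), a small helper can strip the prompt. `strip_prompt` is hypothetical and not part of transformers, and the sample output is made up for illustration.

```python
def strip_prompt(prompt: str, outputs: list) -> list:
    """Return completions with the leading prompt removed (hypothetical helper)."""
    completions = []
    for item in outputs:
        text = item["generated_text"]
        # Only strip when the output actually begins with the prompt.
        if text.startswith(prompt):
            text = text[len(prompt):]
        completions.append(text.strip())
    return completions


prompt = "Where is Zurich? Ans:"
# Illustrative stand-in for the return value of pipe(prompt); not real model output.
fake_outputs = [{"generated_text": "Where is Zurich? Ans: Switzerland"}]
print(strip_prompt(prompt, fake_outputs))
```

With a real pipeline, `strip_prompt(prompt, pipe(prompt))` would be the corresponding call.
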