SebastianSchramm
/

UniNER-7B-type-sup-GPTQ-4bit-128g-actorder_True

Text Generation

text-generation-inference

4-bit precision

Model card Files Files and versions Community

SebastianSchramm commited on Aug 27, 2023

Commit

812b38b

•

1 Parent(s): 645aa5a

Create README.md

Files changed (1) hide show

README.md +50 -0

README.md ADDED Viewed

	@@ -0,0 +1,50 @@

+---
+language:
+- en
+pipeline_tag: text-generation
+library_name: transformers
+tags:
+- LLM
+- Universal-NER
+- NER
+- 4bit
+inference: false
+---
+![image](qunatized_lama_color_letters_4bit_512px.png)
+# Quantized version of Universal-NER/UniNER-7B-type-sub
+[Universal-NER/UniNER-7B-type-sub](https://huggingface.co/Universal-NER/UniNER-7B-type-sup) quantized to 4bit with GPTQ and stored with 1GB shard size.
+## Model Description
+The model [Universal-NER/UniNER-7B-type-sub](https://huggingface.co/Universal-NER/UniNER-7B-type-sup) was quantized to 4bit, group_size 128, and act-order=True with auto-gptq integration in transformers (https://huggingface.co/blog/gptq-integration).
+## Evaluation
+TODO
+## Prompt template
+Prompt template is the same as for the full precision model:
+```python
+prompt_template = """A virtual assistant answers questions from a user based on the provided text.
+USER: Text: {input_text}
+ASSISTANT: I’ve read this text.
+USER: What describes {entity_name} in the text?
+ASSISTANT:
+"""
+```
+## Usage
+It is recommended to format input according to the prompt template mentioned above during inference for best results.
+```python
+prompt = prompt_template.format_map({"input_text": "Cologne is a great city in Germany - maybe even the greatest ;)", "entity_name": "city"})
+```
+The model is small enough to be loaded in free-tier Colab with a T4 GPU:
+## License
+The original full precision model and its associated data are released under the CC BY-NC 4.0 license. Hence, the same license applies for the 4bit version.