metadata
base_model: Universal-NER/UniNER-7B-all
tags:
  - named entity recognition
  - ner
model-index:
  - name: daisd-ai/UniNER-W4A16
    results: []
license: cc-by-nc-4.0
inference: false

Introduction

This model is a quantized version of Universal-NER/UniNER-7B-all.

Quantization

Quantization was performed with LLM Compressor, using 512 random examples from the Universal-NER/Pile-NER-definition dataset as calibration data.

The recipe for quantization:

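from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.modifiers.smoothquant import SmoothQuantModifier

recipe = [
    SmoothQuantModifier(smoothing_strength=0.8),
    GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"]),
]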
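
To give a feel for what the W4A16 scheme in the recipe means (4-bit integer weights, 16-bit activations), here is a minimal pure-Python sketch of symmetric 4-bit quantization for a single group of weights. It is only an illustration: the function names and the sample weights are made up, it uses plain round-to-nearest, and it is not what LLM Compressor runs internally. GPTQ picks the quantized values more carefully, using the calibration data to compensate for rounding error.

```python
def quantize_group(weights, bits=4):
    """Symmetric round-to-nearest quantization of one group of weights.

    Illustrative only: GPTQ selects quantized values more carefully than this.
    """
    qmax = 2 ** (bits - 1) - 1  # 7 for signed 4-bit integers
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer and clamp to the signed range [-8, 7]
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize_group(q, scale):
    """Map the stored integers back to floating-point weights."""
    return [v * scale for v in q]


# Hypothetical weights, chosen only for the demonstration
weights = [0.12, -0.53, 0.91, -0.07, 0.33, -0.88, 0.02, 0.45]
q, scale = quantize_group(weights)
restored = dequantize_group(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each weight is stored as a 4-bit integer plus one shared scale per group, and the round-trip error per weight stays within half a quantization step. Activations are left in 16-bit precision, which is the "A16" part of the scheme name.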

Inference

We added a chat template to the tokenizer, so the model can be used directly with vLLM without the extra preprocessing that the original model requires.

Example:

import json

from vllm import LLM, SamplingParams

# Loading model
llm = LLM(model="daisd-ai/UniNER-W4A16")
sampling_params = SamplingParams(temperature=0, max_tokens=256)

# Define the text and the entity types to extract
text = "Some long text with multiple entities"
entities_types = ["entity type 1", "entity type 2"]

# Build one prompt per entity type with the tokenizer's chat template
tokenizer = llm.get_tokenizer()
prompts = []
for entity_type in entities_types:
    messages = [
        {"role": "user", "content": f"Text: {text}"},
        {"role": "assistant", "content": "I've read this text."},
        {"role": "user", "content": f"What describes {entity_type} in the text?"},
    ]
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    prompts.append(prompt)

# Run inference and keep the generated text of each completion
outputs = llm.generate(prompts, sampling_params)
outputs = [output.outputs[0].text for output in outputs]

# Results are returned in JSON format; parse each one into a Python list
results = []
for lst in outputs:
    try:
        entities = list(set(json.loads(lst)))
    except Exception:
        entities = []

    results.append(entities)
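
The parsing step above can also be wrapped in a small helper that keeps the entities in the order the model produced them and pairs each result with its entity type. This is a sketch rather than part of the model's API: `parse_entities`, the entity types, and the raw outputs are hypothetical names and data used only for illustration, and the helper assumes the model returns a JSON list of strings.

```python
import json


def parse_entities(raw_output):
    """Parse one model output into a de-duplicated list of entity strings."""
    try:
        parsed = json.loads(raw_output)
    except (json.JSONDecodeError, TypeError):
        # Malformed or missing output: treat as "no entities found"
        return []
    if not isinstance(parsed, list):
        return []
    # dict.fromkeys drops duplicates while keeping first-seen order
    return list(dict.fromkeys(item for item in parsed if isinstance(item, str)))


# Hypothetical raw outputs, one per entity type, in the same order as the prompts
entity_types = ["person", "organization"]
raw_outputs = ['["Ada Lovelace", "Ada Lovelace", "Charles Babbage"]', "not valid json"]

entities_by_type = {
    entity_type: parse_entities(raw)
    for entity_type, raw in zip(entity_types, raw_outputs)
}
```

Unlike `list(set(...))`, this keeps the output order stable between runs, which makes results easier to compare.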