---
language:
- en
- zh
- ms
- ta
datasets:
- billcai/ospc-dataset-v2
tags:
- multilingual
- mistral
- sft
- chat
- instruction
- gptq
widget:
- text: "Hello World"
  example_title: "Sample prompt"
license: apache-2.0
base_model: openchat/openchat-3.5-0106
---

This model achieved 0.7600 AUROC in the [Online Safety Prize Challenge](https://ospc.aisingapore.org/) organized by AI Singapore. It is fine-tuned from the [OpenChat model](https://huggingface.co/openchat/openchat-3.5-0106). Note that this model is already quantized to 4-bit GPTQ.

## Training details

This model was trained on a synthetic dataset generated by Apache 2.0-licensed LLMs (Mistral and Mixtral).

## How to use

You can use the model to classify either plain text, or text combined with an extracted visual description (for example, a meme's image description plus its caption).

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from scipy.special import softmax

device = "cuda"

model = AutoModelForCausalLM.from_pretrained(
    'billcai/openchat-model-ospc',
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained('billcai/openchat-model-ospc')

# Content to classify. For memes, use the "Image: , Caption: " format shown below.
promptText = ""
# promptText = "Image: , Caption: "

prompt = """Check if there is unsafe content in the following message. Unsafe content is defined in the following categories.

Contains hate speech and harmful content. Harmful content centres on divisive and contentious subject matters. Typical areas include:
- Racial disparities
- Religious beliefs and practices
- Sexual orientation
- Nationalistic sentiments and xenophobia against migrants
- Socio-economic divides and class struggles
- Age-related biases
- Gender discrimination
- Discrimination based on illnesses and disabilities

These content may be based on Singapore context, including the use of non-English languages like Mandarin, Malay and Tamil.

This is the message you are checking if it contains unsafe content.

{}

Provide your safety assessment for the message above. Your first line MUST read 'safe' or 'unsafe'.
""".format(
    promptText
)

messages = [
    {"role": "system", "content": "You are a helpful assistant deciding whether a piece of content is safe or unsafe based on specific criteria. The content may contain irrelevant words as it has been extracted from an image."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

generation_output = model.generate(
    model_inputs.input_ids,
    max_new_tokens=10,
    temperature=0.1,
    output_logits=True,
    return_dict_in_generate=True
)
generated_sequences = generation_output['sequences']
generated_logits = generation_output['logits']

# Token ids for 'unsafe' and 'safe'; index [1] skips the BOS token added by the tokenizer.
unsafeTokenId = tokenizer.encode('unsafe')[1]
safeTokenId = tokenizer.encode('safe')[1]

# Logits of the first generated token, converted into a probability over 'unsafe' vs 'safe'.
firstLogit = generated_logits[0].cpu().numpy()
prob = softmax([
    firstLogit[0, unsafeTokenId],
    firstLogit[0, safeTokenId],
])
print(prob) # first entry is the score for the 'unsafe' token.
```

# License

Apache 2.0
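
## Example: preparing `promptText` for a meme

The commented-out `promptText` line in the usage snippet above hints at the expected format when classifying a meme: an image description followed by the caption text. Below is a minimal sketch of filling it in; the two strings are placeholders, and producing them (image captioning, OCR) is outside the scope of this model.

```python
# Hypothetical example of building promptText for a meme, following the
# "Image: , Caption: " format from the usage snippet above.
# Both strings are placeholders; in practice they would come from an
# image-captioning model and an OCR step respectively.
image_description = "A cat sitting in front of an office laptop"  # placeholder captioning output
ocr_caption = "me pretending to work on a Monday"                 # placeholder OCR output

promptText = "Image: {}, Caption: {}".format(image_description, ocr_caption)
print(promptText)
```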