t-bank-ai/response-quality-classifier-tiny

Text Classification

egoriya committed on May 31, 2022

Commit e81ff35 (1 parent: 566945e)

Update README.md

Files changed (1)

README.md +6 -4


@@ -18,8 +18,11 @@ The labels explanation:
 The preferable length of the dialogue is 4 where the last message is needed to be estimated
 It is pretrained on corpus of dialog data and finetuned on [tinkoff-ai/context_similarity](https://huggingface.co/tinkoff-ai/context_similarity).
-The performance of the model on validation split [tinkoff-ai/context_similarity](https://huggingface.co/tinkoff-ai/context_similarity) (with the best thresholds for validation samples):
 |             |   threshold |   f0.5 |   ROC AUC |
@@ -37,10 +40,9 @@ import torch
 tokenizer = AutoTokenizer.from_pretrained("tinkoff-ai/response-quality-classifier-tiny")
 model = AutoModelForSequenceClassification.from_pretrained("tinkoff-ai/response-quality-classifier-tiny")
 # model.cuda()
-inputs = tokenizer('привет[SEP]привет![SEP]как дела?[RESPONSE_TOKEN]норм, у тя как?',
-                   padding=True, max_length=128, truncation=True, add_special_tokens=False, return_tensors='pt')
 with torch.inference_mode():
     logits = model(**inputs).logits
     probas = torch.sigmoid(logits)[0].cpu().detach().numpy()
-print(probas)
 ```

 The preferable length of the dialogue is 4 where the last message is needed to be estimated
+It is pretrained on a large corpus of dialog data in unsupervised manner: the model is trained to predict whether last response was in a real dialog, or it was pulled from some other dialog at random.
+Then it was finetuned on manually labelled examples (dataset will be posted soon).
 It is pretrained on corpus of dialog data and finetuned on [tinkoff-ai/context_similarity](https://huggingface.co/tinkoff-ai/context_similarity).
+The performance of the model on validation split (dataset will be posted soon)[tinkoff-ai/context_similarity](https://huggingface.co/tinkoff-ai/context_similarity) (with the best thresholds for validation samples):
 |             |   threshold |   f0.5 |   ROC AUC |
 tokenizer = AutoTokenizer.from_pretrained("tinkoff-ai/response-quality-classifier-tiny")
 model = AutoModelForSequenceClassification.from_pretrained("tinkoff-ai/response-quality-classifier-tiny")
 # model.cuda()
+inputs = tokenizer('[CLS]привет[SEP]привет![SEP]как дела?[RESPONSE_TOKEN]норм, у тя как?', max_length=128, add_special_tokens=False, return_tensors='pt')
 with torch.inference_mode():
     logits = model(**inputs).logits
     probas = torch.sigmoid(logits)[0].cpu().detach().numpy()
+relevance, specificity = probas
 ```
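
The updated snippet passes `add_special_tokens=False`, so the `[CLS]` prefix and the `[SEP]` / `[RESPONSE_TOKEN]` separators have to be written into the string by hand. A minimal sketch of assembling that string from a list of dialogue turns is shown below. The helper name `build_model_input` and the constant names are hypothetical and not part of the repository; only the token layout is taken from the example in this commit.

```python
from typing import Sequence

# Special tokens exactly as they appear in the README example.
CLS = "[CLS]"
SEP = "[SEP]"
RESPONSE = "[RESPONSE_TOKEN]"


def build_model_input(context: Sequence[str], response: str) -> str:
    """Join context turns and the candidate response into one input string.

    Hypothetical helper: it mirrors the layout
    [CLS]turn[SEP]turn[SEP]turn[RESPONSE_TOKEN]response
    used in the README snippet.
    """
    if not context:
        raise ValueError("at least one context turn is required")
    return CLS + SEP.join(context) + RESPONSE + response


# Same dialogue as in the README ("hi" / "hi!" / "how are you?" -> "fine, and you?").
text = build_model_input(["привет", "привет!", "как дела?"], "норм, у тя как?")
print(text)
```

The resulting `text` can then be passed to the tokenizer call from the snippet above in place of the hand-written string literal, with the same `add_special_tokens=False` argument.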