t-bank-ai/response-quality-classifier-tiny

@@ -17,10 +17,9 @@ The labels explanation:
 - `specificity`: whether the last message in the dialogue is interesting and promotes the continuation of the dialogue.
 It is pretrained on a large corpus of dialog data in an unsupervised manner: the model is trained to predict whether the last response came from the real dialog or was pulled at random from some other dialog.
 Then it was finetuned on manually labelled examples (dataset will be posted soon).
-The model was trained with the dialogue length 4 where the last message is needed to be estimated. Each message in the dialogue was tokenized separately with ```  max_length = max_seq_length // 4 ```.
 The performance of the model on validation split (dataset will be posted soon) (with the best thresholds for validation samples):
@@ -34,12 +33,12 @@ The performance of the model on validation split (dataset will be posted soon) (
 How to use:
 ```python
-# pip install transformers
 from transformers import AutoTokenizer, AutoModelForSequenceClassification
 import torch
-tokenizer = AutoTokenizer.from_pretrained("tinkoff-ai/response-quality-classifier-tiny")
-model = AutoModelForSequenceClassification.from_pretrained("tinkoff-ai/response-quality-classifier-tiny")
-# model.cuda()
 inputs = tokenizer('[CLS]привет[SEP]привет![SEP]как дела?[RESPONSE_TOKEN]норм, у тя как?', max_length=128, add_special_tokens=False, return_tensors='pt')
 with torch.inference_mode():
     logits = model(**inputs).logits
@@ -47,6 +46,6 @@ with torch.inference_mode():
 relevance, specificity = probas
 ```
-The [app](https://huggingface.co/spaces/tinkoff-ai/response-quality-classifiers) where you can easily evaluate this model.
-The work was done during internship at Tinkoff by [egoriyaa](https://github.com/egoriyaa).

 - `specificity`: whether the last message in the dialogue is interesting and promotes the continuation of the dialogue.
 It is pretrained on a large corpus of dialog data in an unsupervised manner: the model is trained to predict whether the last response came from the real dialog or was pulled at random from some other dialog.
 Then it was finetuned on manually labelled examples (dataset will be posted soon).
+The model was trained with three messages in the context and one response. Each message was tokenized separately with ```  max_length = 32 ```.
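The input layout described above (three context messages followed by one response) can be sketched as a small string builder. This is a minimal sketch, not the authors' preprocessing code: `build_input` and `MAX_CONTEXT` are hypothetical names, the special tokens are copied from the usage example in this model card, and the per-message truncation to 32 tokens is left to the tokenizer and is not reproduced here.

```python
MAX_CONTEXT = 3  # the model card states that training used three context messages


def build_input(context, response):
    """Join the most recent context messages and the response into one input string.

    Hypothetical helper: it only mirrors the string layout shown in the usage example.
    """
    recent = context[-MAX_CONTEXT:]  # keep at most the last three context messages
    return "[CLS]" + "[SEP]".join(recent) + "[RESPONSE_TOKEN]" + response


text = build_input(["привет", "привет!", "как дела?"], "норм, у тя как?")
print(text)
```

Keeping only the last three context messages is an assumption about how longer dialogues might be fed to the model; the model card itself only states the format used in training.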
 The performance of the model on validation split (dataset will be posted soon) (with the best thresholds for validation samples):
 How to use:
 ```python
+# pip install transformers
 from transformers import AutoTokenizer, AutoModelForSequenceClassification
 import torch
+tokenizer = AutoTokenizer.from_pretrained('tinkoff-ai/response-quality-classifier-tiny')
+model = AutoModelForSequenceClassification.from_pretrained('tinkoff-ai/response-quality-classifier-tiny')
+model.cuda()  # requires a CUDA-capable GPU; omit this line to run on CPU
 inputs = tokenizer('[CLS]привет[SEP]привет![SEP]как дела?[RESPONSE_TOKEN]норм, у тя как?', max_length=128, add_special_tokens=False, return_tensors='pt').to(model.device)
 with torch.inference_mode():
     logits = model(**inputs).logits
 relevance, specificity = probas
 ```
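The step that turns `logits` into `probas` falls outside the lines shown here. One common choice for a classifier with independent labels such as `relevance` and `specificity` is an element-wise sigmoid followed by per-label thresholds. The sketch below assumes that approach and uses plain Python in place of `torch.sigmoid`. The logit values and the 0.5 thresholds are illustrative placeholders, not numbers from the model or its validation results.

```python
import math


def sigmoid(x):
    """Map a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


LABELS = ("relevance", "specificity")  # label order taken from the usage example
example_logits = [1.2, -0.4]  # placeholder values, not real model output
thresholds = {"relevance": 0.5, "specificity": 0.5}  # hypothetical thresholds

# Assumption: each label is scored independently, so a sigmoid is applied per logit.
probas = [sigmoid(value) for value in example_logits]
relevance, specificity = probas

# Compare each probability with its own threshold to get a yes/no decision per label.
decisions = {name: p >= thresholds[name] for name, p in zip(LABELS, probas)}
print(decisions)
```

In practice the thresholds would be tuned on held-out data, as the note about the best validation thresholds suggests.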
+You can easily interact with this model in the [app](https://huggingface.co/spaces/tinkoff-ai/response-quality-classifiers).
+The work was done during an internship at Tinkoff by [egoriyaa](https://github.com/egoriyaa), mentored by [solemn-leader](https://huggingface.co/solemn-leader).