Update README.md
README.md
CHANGED
@@ -18,8 +18,11 @@ The labels explanation:
 
 The preferable length of the dialogue is 4 where the last message is needed to be estimated
 
+It is pretrained on a large corpus of dialog data in unsupervised manner: the model is trained to predict whether last response was in a real dialog, or it was pulled from some other dialog at random.
+
+Then it was finetuned on manually labelled examples (dataset will be posted soon).
 It is pretrained on corpus of dialog data and finetuned on [tinkoff-ai/context_similarity](https://huggingface.co/tinkoff-ai/context_similarity).
-The performance of the model on validation split [tinkoff-ai/context_similarity](https://huggingface.co/tinkoff-ai/context_similarity) (with the best thresholds for validation samples):
+The performance of the model on validation split (dataset will be posted soon)[tinkoff-ai/context_similarity](https://huggingface.co/tinkoff-ai/context_similarity) (with the best thresholds for validation samples):
 
 
 | | threshold | f0.5 | ROC AUC |
@@ -37,10 +40,9 @@ import torch
 tokenizer = AutoTokenizer.from_pretrained("tinkoff-ai/response-quality-classifier-tiny")
 model = AutoModelForSequenceClassification.from_pretrained("tinkoff-ai/response-quality-classifier-tiny")
 # model.cuda()
-inputs = tokenizer('привет[SEP]привет![SEP]как дела?[RESPONSE_TOKEN]норм, у тя как?',
-                   padding=True, max_length=128, truncation=True, add_special_tokens=False, return_tensors='pt')
+inputs = tokenizer('[CLS]привет[SEP]привет![SEP]как дела?[RESPONSE_TOKEN]норм, у тя как?', max_length=128, add_special_tokens=False, return_tensors='pt')
 with torch.inference_mode():
     logits = model(**inputs).logits
     probas = torch.sigmoid(logits)[0].cpu().detach().numpy()
-
+relevance, specificity = probas
 ```
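The pretraining objective described in the added README lines (real last response vs. a response pulled from another dialog at random) can be illustrated with a small, model-free sketch. It is only an assumption about how such pairs might be built: the helper names `format_dialogue` and `make_pretraining_pairs` are hypothetical, and the README does not say whether the random response is the last message of the other dialog or any message from it. The input layout (`[CLS]`, `[SEP]`, `[RESPONSE_TOKEN]`) follows the updated usage snippet in the diff.

```python
import random

# Special tokens as they appear in the updated README snippet.
CLS = "[CLS]"
SEP = "[SEP]"
RESPONSE_TOKEN = "[RESPONSE_TOKEN]"


def format_dialogue(context, response):
    """Join the context turns and a candidate response into one input string."""
    return f"{CLS}{SEP.join(context)}{RESPONSE_TOKEN}{response}"


def make_pretraining_pairs(dialogues, rng):
    """Build (text, label) pairs: 1 for the real last response,
    0 for a response borrowed from a different, randomly chosen dialogue.

    Assumption: the negative is the *last* message of the other dialogue.
    """
    pairs = []
    for i, turns in enumerate(dialogues):
        context, response = turns[:-1], turns[-1]
        pairs.append((format_dialogue(context, response), 1))
        # Choose any dialogue other than the current one as the negative source.
        j = rng.choice([k for k in range(len(dialogues)) if k != i])
        pairs.append((format_dialogue(context, dialogues[j][-1]), 0))
    return pairs


dialogues = [
    # "hi" / "hi!" / "how are you?" / "fine, how about you?"
    ["привет", "привет!", "как дела?", "норм, у тя как?"],
    ["hello", "hi", "what are you up to?", "reading a book"],
]
pairs = make_pretraining_pairs(dialogues, random.Random(0))
```

Each dialogue here has four messages, matching the preferred dialogue length stated in the README, with the last message being the one to estimate.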