egoriya committed
Commit e81ff35
1 Parent(s): 566945e

Update README.md

Files changed (1)
  1. README.md +6 -4
README.md CHANGED
@@ -18,8 +18,11 @@ The labels explanation:

  The preferable length of the dialogue is 4 where the last message is needed to be estimated

  It is pretrained on corpus of dialog data and finetuned on [tinkoff-ai/context_similarity](https://huggingface.co/tinkoff-ai/context_similarity).
- The performance of the model on validation split [tinkoff-ai/context_similarity](https://huggingface.co/tinkoff-ai/context_similarity) (with the best thresholds for validation samples):


  | | threshold | f0.5 | ROC AUC |
@@ -37,10 +40,9 @@ import torch
  tokenizer = AutoTokenizer.from_pretrained("tinkoff-ai/response-quality-classifier-tiny")
  model = AutoModelForSequenceClassification.from_pretrained("tinkoff-ai/response-quality-classifier-tiny")
  # model.cuda()
- inputs = tokenizer('привет[SEP]привет![SEP]как дела?[RESPONSE_TOKEN]норм, у тя как?',
-                    padding=True, max_length=128, truncation=True, add_special_tokens=False, return_tensors='pt')
  with torch.inference_mode():
      logits = model(**inputs).logits
      probas = torch.sigmoid(logits)[0].cpu().detach().numpy()
- print(probas)
  ```
 

  The preferable length of the dialogue is 4 where the last message is needed to be estimated

+ It is pretrained on a large corpus of dialog data in unsupervised manner: the model is trained to predict whether last response was in a real dialog, or it was pulled from some other dialog at random.
+
+ Then it was finetuned on manually labelled examples (dataset will be posted soon).
  It is pretrained on corpus of dialog data and finetuned on [tinkoff-ai/context_similarity](https://huggingface.co/tinkoff-ai/context_similarity).
+ The performance of the model on validation split (dataset will be posted soon)[tinkoff-ai/context_similarity](https://huggingface.co/tinkoff-ai/context_similarity) (with the best thresholds for validation samples):


  | | threshold | f0.5 | ROC AUC |
 
  tokenizer = AutoTokenizer.from_pretrained("tinkoff-ai/response-quality-classifier-tiny")
  model = AutoModelForSequenceClassification.from_pretrained("tinkoff-ai/response-quality-classifier-tiny")
  # model.cuda()
+ inputs = tokenizer('[CLS]привет[SEP]привет![SEP]как дела?[RESPONSE_TOKEN]норм, у тя как?', max_length=128, add_special_tokens=False, return_tensors='pt')
  with torch.inference_mode():
      logits = model(**inputs).logits
      probas = torch.sigmoid(logits)[0].cpu().detach().numpy()
+ relevance, specificity = probas
  ```
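
The updated example passes a hand-assembled string with `[CLS]`, `[SEP]` and `[RESPONSE_TOKEN]` markers and unpacks two sigmoid outputs. A minimal sketch of that input layout and of the logit-to-probability step, using only the standard library so it runs without downloading the model, might look like the following. The helper names `build_input` and `scores_from_logits` are hypothetical and not part of the README or of any library, and the output order (relevance first, specificity second) is assumed from the line `relevance, specificity = probas`.

```python
import math

# Marker strings copied from the README example in this commit.
CLS = "[CLS]"
SEP = "[SEP]"
RESPONSE_TOKEN = "[RESPONSE_TOKEN]"


def build_input(context, response):
    """Hypothetical helper: join context turns and the response to be scored
    into the single-string layout shown in the README example."""
    if not context:
        raise ValueError("context must contain at least one message")
    return CLS + SEP.join(context) + RESPONSE_TOKEN + response


def sigmoid(x):
    """Numerically stable logistic function for a single float."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def scores_from_logits(logits):
    """Hypothetical helper: map two raw logits to named probabilities.
    Assumes the order (relevance, specificity) used in the README."""
    relevance, specificity = (sigmoid(v) for v in logits)
    return {"relevance": relevance, "specificity": specificity}


# Same dialogue as the README example
# (roughly: "hi" / "hi!" / "how are you?" -> "fine, and you?").
text = build_input(["привет", "привет!", "как дела?"], "норм, у тя как?")
print(text)
```

The resulting `text` can be passed to the tokenizer with `add_special_tokens=False`, as in the example above, since the markers are already part of the string.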