dardem committed on
Commit 9670a72
1 Parent(s): 423d9ea

Update README.md

Files changed (1)
  1. README.md +40 -2
README.md CHANGED
@@ -1,8 +1,46 @@
  **How to use**
  ```python
  from transformers import BartForConditionalGeneration, AutoTokenizer
  base_model_name = 'facebook/bart-base'
- model_name = 'SkolkovoInstitute/bart-base-detox-10000-7'
  tokenizer = AutoTokenizer.from_pretrained(base_model_name)
  model = BartForConditionalGeneration.from_pretrained(model_name)
- ```
+ **Model Overview**
+
+ This is the model presented in the paper ["ParaDetox: Detoxification with Parallel Data"](https://aclanthology.org/2022.acl-long.469/).
+
+ The model itself is a [BART (base)](https://huggingface.co/facebook/bart-base) model trained on the parallel detoxification dataset ParaDetox, achieving state-of-the-art (SOTA) results on the detoxification task. More details, code, and data can be found in the [ParaDetox repository](https://github.com/skoltech-nlp/paradetox).
+
  **How to use**
  ```python
  from transformers import BartForConditionalGeneration, AutoTokenizer
  base_model_name = 'facebook/bart-base'
+ model_name = 'SkolkovoInstitute/bart-detox'
  tokenizer = AutoTokenizer.from_pretrained(base_model_name)
  model = BartForConditionalGeneration.from_pretrained(model_name)
+ ```
+
+ **Citation**
+ ```
+ @inproceedings{logacheva-etal-2022-paradetox,
+ title = "{P}ara{D}etox: Detoxification with Parallel Data",
+ author = "Logacheva, Varvara and
+ Dementieva, Daryna and
+ Ustyantsev, Sergey and
+ Moskovskiy, Daniil and
+ Dale, David and
+ Krotova, Irina and
+ Semenov, Nikita and
+ Panchenko, Alexander",
+ booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
+ month = may,
+ year = "2022",
+ address = "Dublin, Ireland",
+ publisher = "Association for Computational Linguistics",
+ url = "https://aclanthology.org/2022.acl-long.469",
+ pages = "6804--6818",
+ abstract = "We present a novel pipeline for the collection of parallel data for the detoxification task. We collect non-toxic paraphrases for over 10,000 English toxic sentences. We also show that this pipeline can be used to distill a large existing corpus of paraphrases to get toxic-neutral sentence pairs. We release two parallel corpora which can be used for the training of detoxification models. To the best of our knowledge, these are the first parallel datasets for this task. We describe our pipeline in detail to make it fast to set up for a new language or domain, thus contributing to faster and easier development of new parallel resources. We train several detoxification models on the collected data and compare them with several baselines and state-of-the-art unsupervised approaches. We conduct both automatic and manual evaluations. All models trained on parallel data outperform the state-of-the-art unsupervised models by a large margin. This suggests that our novel datasets can boost the performance of detoxification systems.",
+ }
+ ```
+
+ ## Licensing Information
+
+ This model is released under the [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License][cc-by-nc-sa].
+
+ [![CC BY-NC-SA 4.0][cc-by-nc-sa-image]][cc-by-nc-sa]
+
+ [cc-by-nc-sa]: http://creativecommons.org/licenses/by-nc-sa/4.0/
+ [cc-by-nc-sa-image]: https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png
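The "How to use" snippet in the updated README loads the tokenizer and model but stops before inference. The sketch below shows one possible way to run detoxification with those two objects. It relies only on standard `transformers` seq2seq calls: calling the tokenizer on a batch, `model.generate(...)`, and `tokenizer.batch_decode(...)`. The helper name `detoxify` is hypothetical, and the decoding settings (`num_beams=5`, `max_new_tokens=64`) are illustrative assumptions. They are not values taken from the ParaDetox paper or this model card.

```python
from typing import List, Sequence


def detoxify(texts: Sequence[str], model, tokenizer,
             num_beams: int = 5, max_new_tokens: int = 64) -> List[str]:
    """Rewrite a batch of sentences with a seq2seq detoxification model.

    Hypothetical helper: `model` and `tokenizer` are expected to be the
    BartForConditionalGeneration and tokenizer loaded in "How to use".
    """
    # Tokenize the whole batch; padding lets sentences of different lengths
    # share one tensor, and truncation caps over-long inputs.
    inputs = tokenizer(list(texts), return_tensors="pt",
                       padding=True, truncation=True)
    # Put the encoded batch on the same device as the model (CPU or GPU).
    inputs = inputs.to(model.device)
    # Beam-search decoding. These settings are illustrative assumptions,
    # not the configuration used in the ParaDetox paper.
    outputs = model.generate(**inputs, num_beams=num_beams,
                             max_new_tokens=max_new_tokens)
    # Strip special tokens such as <s>, </s> and <pad>; return plain strings.
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

With `tokenizer` and `model` from the snippet above, `detoxify(["your sentence here"], model, tokenizer)` returns a list containing one rewritten string per input. The model and tokenizer are passed in as arguments rather than loaded inside the helper. This keeps the helper usable with whichever checkpoint was loaded and avoids reloading the weights on every call.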