s-nlp/mdistilbert-base-formality-ranker (README.md at revision af5500de10ddb0666a355c998894634c4bba9b25)

metadata

language:
  - en
  - fr
  - it
  - pt
tags:
  - formality
licenses:
  - cc-by-nc-sa

Model Overview

This is the model presented in the paper "Detecting Text Formality: A Study of Text Classification Approaches".

The starting checkpoint is mDistilBERT (base), which was then fine-tuned for formality classification on X-FORMAL, a multilingual corpus covering 4 languages: English (from GYAFC), French, Italian, and Brazilian Portuguese. In our experiments, this model achieved the best results among Transformer-based models on the cross-lingual formality classification (knowledge transfer) task. More details, code, and data can be found here.

Evaluation Results

Here, we report the scores of the best model from each category in the comparison to give a sense of the range of values. We report accuracy (%) for two setups: a multilingual model fine-tuned on each language separately (the per-language columns) and a multilingual model fine-tuned on all languages together (the All column). For cross-lingual experiment results, please refer to the paper.

Model	En	It	Pt	Fr	All
bag-of-words	79.1	71.3	70.6	72.5	---
CharBiLSTM	87.0	79.1	75.9	81.3	82.7
mDistilBERT-cased	86.6	76.8	75.9	79.1	79.4
mDeBERTa-base	87.3	76.6	75.8	78.9	79.9
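The scores above are accuracies in percent. As a hedged illustration (not the authors' evaluation code), per-language accuracy can be computed from model predictions as sketched below. The (language, gold label, predicted label) record layout and the function name `accuracy_by_language` are assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def accuracy_by_language(records: Iterable[Tuple[str, str, str]]) -> Dict[str, float]:
    """Return accuracy in percent per language.

    Each record is a hypothetical (language, gold_label, predicted_label) triple.
    """
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for lang, gold, pred in records:
        total[lang] += 1
        # Count a hit only when the predicted label matches the gold label exactly.
        correct[lang] += int(gold == pred)
    return {lang: 100.0 * correct[lang] / total[lang] for lang in total}
```

For example, `accuracy_by_language([("en", "formal", "formal"), ("en", "informal", "formal")])` returns `{"en": 50.0}`. Grouping by language mirrors the per-language columns of the table. It does not reproduce the All column, which comes from a separately fine-tuned model.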

How to use

from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Full Hugging Face Hub id of this model (organization/name)
model_name = 's-nlp/mdistilbert-base-formality-ranker'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
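The snippet above only loads the tokenizer and model. Below is a minimal sketch of scoring texts with it. It assumes PyTorch and `transformers` are installed. Rather than hard-coding which class index means "formal", it reads label names from the checkpoint's `model.config.id2label`. The helper names `softmax`, `label_probs`, and `classify` are hypothetical and not part of the released code.

```python
import math
from typing import Dict, List, Sequence

MODEL_NAME = "s-nlp/mdistilbert-base-formality-ranker"


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one row of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_probs(logits: Sequence[float], id2label: Dict[int, str]) -> Dict[str, float]:
    """Map each class index to its label name and probability."""
    return {id2label[i]: p for i, p in enumerate(softmax(logits))}


def classify(texts: List[str], model_name: str = MODEL_NAME) -> List[Dict[str, float]]:
    """Score each text with the classifier (downloads the weights on first use)."""
    # Imported lazily so the pure-Python helpers above work without torch/transformers.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    model.eval()
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**batch).logits
    # Label names come from the checkpoint config, not from an assumed ordering.
    return [label_probs(row, model.config.id2label) for row in logits.tolist()]
```

For example, `classify(["Could you please send me the report?", "gimme the report"])` returns one dict per input that maps each label name to its probability. To rank texts by formality, one option is to sort them by the probability of the formal class after checking its exact name in `id2label`.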

Citation

TBD

Licensing Information

Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.