---
tags:
- Transformers
- text-classification
- intent-classification
- multi-class-classification
- natural-language-understanding
languages:
- af-ZA
- am-ET
- ar-SA
- az-AZ
- bn-BD
- cy-GB
- da-DK
- de-DE
- el-GR
- en-US
- es-ES
- fa-IR
- fi-FI
- fr-FR
- he-IL
- hi-IN
- hu-HU
- hy-AM
- id-ID
- is-IS
- it-IT
- ja-JP
- jv-ID
- ka-GE
- km-KH
- kn-IN
- ko-KR
- lv-LV
- ml-IN
- mn-MN
- ms-MY
- my-MM
- nb-NO
- nl-NL
- pl-PL
- pt-PT
- ro-RO
- ru-RU
- sl-SL
- sq-AL
- sv-SE
- sw-KE
- ta-IN
- te-IN
- th-TH
- tl-PH
- tr-TR
- ur-PK
- vi-VN
- zh-CN
- zh-TW
multilinguality:
- af-ZA
- am-ET
- ar-SA
- az-AZ
- bn-BD
- cy-GB
- da-DK
- de-DE
- el-GR
- en-US
- es-ES
- fa-IR
- fi-FI
- fr-FR
- he-IL
- hi-IN
- hu-HU
- hy-AM
- id-ID
- is-IS
- it-IT
- ja-JP
- jv-ID
- ka-GE
- km-KH
- kn-IN
- ko-KR
- lv-LV
- ml-IN
- mn-MN
- ms-MY
- my-MM
- nb-NO
- nl-NL
- pl-PL
- pt-PT
- ro-RO
- ru-RU
- sl-SL
- sq-AL
- sv-SE
- sw-KE
- ta-IN
- te-IN
- th-TH
- tl-PH
- tr-TR
- ur-PK
- vi-VN
- zh-CN
- zh-TW
datasets:
- qanastek/MASSIVE
widget:
- text: "wake me up at five am this week"
- text: "je veux écouter la chanson de jacques brel encore une fois"
- text: "quiero escuchar la canción de arijit singh una vez más"
- text: "olly onde é que á um parque por perto onde eu possa correr"
- text: "פרק הבא בפודקאסט בבקשה"
- text: "亚马逊股价"
- text: "найди билет на поезд в санкт-петербург"
license: cc-by-4.0
---

**People Involved**

* [LABRAK Yanis](https://www.linkedin.com/in/yanis-labrak-8a7412145/) (1)

**Affiliations**

1. [LIA, NLP team](https://lia.univ-avignon.fr/), Avignon University, Avignon, France.

## Demo: How to use in HuggingFace Transformers Pipeline

Requires [transformers](https://pypi.org/project/transformers/): `pip install transformers`

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, TextClassificationPipeline

# Load the tokenizer and the fine-tuned classification model from the Hub
model_name = 'qanastek/XLMRoberta-Alexa-Intents-Classification'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Wrap both in a text-classification pipeline
classifier = TextClassificationPipeline(model=model, tokenizer=tokenizer)

# Classify a French utterance ("wake me up at nine in the morning on Friday")
res = classifier("réveille-moi à neuf heures du matin le vendredi")
print(res)
```

Outputs:

```python
[{'label': 'alarm_set', 'score': 0.9998375177383423}]
```
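
The pipeline returns a list of dictionaries, each holding a `label` and a `score`. Below is a minimal sketch of acting on that output. The `pick_intent` helper, the 0.5 threshold, and the `unknown` fallback label are all hypothetical and do not come from the model. The hard-coded `res` mirrors the output shown above so that the snippet runs without downloading the model.

```python
def pick_intent(predictions, threshold=0.5, fallback="unknown"):
    """Return the highest-scoring label, or `fallback` if it is not confident enough.

    `threshold` and `fallback` are illustrative assumptions, not model settings.
    """
    if not predictions:
        return fallback
    best = max(predictions, key=lambda p: p["score"])
    return best["label"] if best["score"] >= threshold else fallback


# Same structure as the pipeline output shown above
res = [{"label": "alarm_set", "score": 0.9998375177383423}]

print(pick_intent(res))                     # score is above the default threshold
print(pick_intent(res, threshold=0.99999))  # score is below this stricter threshold
```

A suitable threshold would likely need to be tuned per language and per intent, since the per-class scores reported in the evaluation vary widely.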

## Training data

[MASSIVE](https://huggingface.co/datasets/qanastek/MASSIVE) is a parallel dataset of more than 1M utterances across 51 languages, annotated for the Natural Language Understanding tasks of intent prediction and slot annotation. Utterances span 60 intents and include 55 slot types. MASSIVE was created by localizing the SLURP dataset, which is composed of general single-shot interactions with an Intelligent Voice Assistant.

## Intents

* audio_volume_other
* play_music
* iot_hue_lighton
* general_greet
* calendar_set
* audio_volume_down
* social_query
* audio_volume_mute
* iot_wemo_on
* iot_hue_lightup
* audio_volume_up
* iot_coffee
* takeaway_query
* qa_maths
* play_game
* cooking_query
* iot_hue_lightdim
* iot_wemo_off
* music_settings
* weather_query
* news_query
* alarm_remove
* social_post
* recommendation_events
* transport_taxi
* takeaway_order
* music_query
* calendar_query
* lists_query
* qa_currency
* recommendation_movies
* general_joke
* recommendation_locations
* email_querycontact
* lists_remove
* play_audiobook
* email_addcontact
* lists_createoradd
* play_radio
* qa_stock
* alarm_query
* email_sendemail
* general_quirky
* music_likeness
* cooking_recipe
* email_query
* datetime_query
* transport_traffic
* play_podcasts
* iot_hue_lightchange
* calendar_remove
* transport_query
* transport_ticket
* qa_factoid
* iot_cleaning
* alarm_set
* datetime_convert
* iot_hue_lightoff
* qa_definition
* music_dislikeness
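
The labels above appear to follow a `<scenario>_<action>` naming pattern (for example, `alarm_set` reads as scenario `alarm` and action `set`). Assuming that pattern holds, the following sketch groups predicted labels by scenario, which can be handy for routing a request to a domain-specific handler. The `group_by_scenario` helper is hypothetical, and only a handful of the labels are used for illustration.

```python
from collections import defaultdict

# A small subset of the intent labels listed above
intents = [
    "alarm_set", "alarm_query", "alarm_remove",
    "iot_hue_lightoff", "iot_coffee",
    "qa_maths", "play_music",
]


def group_by_scenario(labels):
    """Split each label at its first underscore and group actions by scenario."""
    groups = defaultdict(list)
    for label in labels:
        scenario, _, action = label.partition("_")
        groups[scenario].append(action)
    return dict(groups)


groups = group_by_scenario(intents)
for scenario, actions in groups.items():
    print(scenario, actions)
```

Splitting only at the first underscore keeps multi-part actions such as `hue_lightoff` intact.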

## Evaluation results

```plain
                         precision    recall  f1-score   support

alarm_query                 0.9661    0.9037    0.9338      1734
alarm_remove                0.9484    0.9608    0.9545      1071
alarm_set                   0.8611    0.9254    0.8921      2091
audio_volume_down           0.8657    0.9537    0.9075       561
audio_volume_mute           0.8608    0.9130    0.8861      1632
audio_volume_other          0.8684    0.5392    0.6653       306
audio_volume_up             0.7198    0.8446    0.7772       663
calendar_query              0.7555    0.8229    0.7878      6426
calendar_remove             0.8688    0.9441    0.9049      3417
calendar_set                0.9092    0.9014    0.9053     10659
cooking_query               0.0000    0.0000    0.0000         0
cooking_recipe              0.9282    0.8592    0.8924      3672
datetime_convert            0.8144    0.7686    0.7909       765
datetime_query              0.9152    0.9305    0.9228      4488
email_addcontact            0.6482    0.8431    0.7330       612
email_query                 0.9629    0.9319    0.9472      6069
email_querycontact          0.6853    0.8032    0.7396      1326
email_sendemail             0.9530    0.9381    0.9455      5814
general_greet               0.1026    0.3922    0.1626        51
general_joke                0.9305    0.9123    0.9213       969
general_quirky              0.6984    0.5417    0.6102      8619
iot_cleaning                0.9590    0.9359    0.9473      1326
iot_coffee                  0.9304    0.9749    0.9521      1836
iot_hue_lightchange         0.8794    0.9374    0.9075      1836
iot_hue_lightdim            0.8695    0.8711    0.8703      1071
iot_hue_lightoff            0.9440    0.9229    0.9334      2193
iot_hue_lighton             0.4545    0.5882    0.5128       153
iot_hue_lightup             0.9271    0.8315    0.8767      1377
iot_wemo_off                0.9615    0.8715    0.9143       918
iot_wemo_on                 0.8455    0.7941    0.8190       510
lists_createoradd           0.8437    0.8356    0.8396      1989
lists_query                 0.8918    0.8335    0.8617      2601
lists_remove                0.9536    0.8601    0.9044      2652
music_dislikeness           0.7725    0.7157    0.7430       204
music_likeness              0.8570    0.8159    0.8359      1836
music_query                 0.8667    0.8050    0.8347      1785
music_settings              0.4024    0.3301    0.3627       306
news_query                  0.8343    0.8657    0.8498      6324
play_audiobook              0.8172    0.8125    0.8149      2091
play_game                   0.8666    0.8403    0.8532      1785
play_music                  0.8683    0.8845    0.8763      8976
play_podcasts               0.8925    0.9125    0.9024      3213
play_radio                  0.8260    0.8935    0.8585      3672
qa_currency                 0.9459    0.9578    0.9518      1989
qa_definition               0.8638    0.8552    0.8595      2907
qa_factoid                  0.7959    0.8178    0.8067      7191
qa_maths                    0.8937    0.9302    0.9116      1275
qa_stock                    0.7995    0.9412    0.8646      1326
recommendation_events       0.7646    0.7702    0.7674      2193
recommendation_locations    0.7489    0.8830    0.8104      1581
recommendation_movies       0.6907    0.7706    0.7285      1020
social_post                 0.9623    0.9080    0.9344      4131
social_query                0.8104    0.7914    0.8008      1275
takeaway_order              0.7697    0.8458    0.8059      1122
takeaway_query              0.9059    0.8571    0.8808      1785
transport_query             0.8141    0.7559    0.7839      2601
transport_taxi              0.9222    0.9403    0.9312      1173
transport_ticket            0.9259    0.9384    0.9321      1785
transport_traffic           0.6919    0.9660    0.8063       765
weather_query               0.9387    0.9492    0.9439      7956

accuracy                                        0.8617    151674
macro avg                   0.8162    0.8273    0.8178    151674
weighted avg                0.8639    0.8617    0.8613    151674
```
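
To help read the report: each `f1-score` is the harmonic mean of that row's precision and recall, `macro avg` is the unweighted mean over classes, and `weighted avg` weights each class by its `support`. The sketch below recomputes these quantities for three rows copied from the table. Because it uses only a subset of the classes, its averages will not match the full report, and the variable names are illustrative.

```python
# (precision, recall, support) for three rows copied from the report above
rows = {
    "alarm_query": (0.9661, 0.9037, 1734),
    "general_greet": (0.1026, 0.3922, 51),
    "weather_query": (0.9387, 0.9492, 7956),
}


def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


f1_scores = {name: f1(p, r) for name, (p, r, _) in rows.items()}

# Macro average: every class counts equally
macro_f1 = sum(f1_scores.values()) / len(f1_scores)

# Weighted average: each class counts in proportion to its support
total_support = sum(s for _, _, s in rows.values())
weighted_f1 = sum(f1_scores[name] * s for name, (_, _, s) in rows.items()) / total_support

for name, score in f1_scores.items():
    print(f"{name}: {score:.4f}")
print(f"macro F1 over these rows:    {macro_f1:.4f}")
print(f"weighted F1 over these rows: {weighted_f1:.4f}")
```

Small, poorly classified classes such as `general_greet` pull the macro average down much more than the weighted average, which is consistent with the gap between the two averages in the full report.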