metadata
tags:
- Transformers
- text-classification
- intent-classification
- multi-class-classification
- natural-language-understanding
languages:
- af-ZA
- am-ET
- ar-SA
- az-AZ
- bn-BD
- cy-GB
- da-DK
- de-DE
- el-GR
- en-US
- es-ES
- fa-IR
- fi-FI
- fr-FR
- he-IL
- hi-IN
- hu-HU
- hy-AM
- id-ID
- is-IS
- it-IT
- ja-JP
- jv-ID
- ka-GE
- km-KH
- kn-IN
- ko-KR
- lv-LV
- ml-IN
- mn-MN
- ms-MY
- my-MM
- nb-NO
- nl-NL
- pl-PL
- pt-PT
- ro-RO
- ru-RU
- sl-SL
- sq-AL
- sv-SE
- sw-KE
- ta-IN
- te-IN
- th-TH
- tl-PH
- tr-TR
- ur-PK
- vi-VN
- zh-CN
- zh-TW
multilinguality:
- af-ZA
- am-ET
- ar-SA
- az-AZ
- bn-BD
- cy-GB
- da-DK
- de-DE
- el-GR
- en-US
- es-ES
- fa-IR
- fi-FI
- fr-FR
- he-IL
- hi-IN
- hu-HU
- hy-AM
- id-ID
- is-IS
- it-IT
- ja-JP
- jv-ID
- ka-GE
- km-KH
- kn-IN
- ko-KR
- lv-LV
- ml-IN
- mn-MN
- ms-MY
- my-MM
- nb-NO
- nl-NL
- pl-PL
- pt-PT
- ro-RO
- ru-RU
- sl-SL
- sq-AL
- sv-SE
- sw-KE
- ta-IN
- te-IN
- th-TH
- tl-PH
- tr-TR
- ur-PK
- vi-VN
- zh-CN
- zh-TW
datasets:
- qanastek/MASSIVE
widget:
- text: wake me up at five am this week
- text: je veux écouter la chanson de jacques brel encore une fois
- text: quiero escuchar la canción de arijit singh una vez más
- text: olly onde é que á um parque por perto onde eu possa correr
- text: פרק הבא בפודקאסט בבקשה
- text: 亚马逊股价
- text: найди билет на поезд в санкт-петербург
license: cc-by-4.0
People Involved
- LABRAK Yanis (1)
Affiliations
- LIA, NLP team, Avignon University, Avignon, France.
Demo: How to use in HuggingFace Transformers Pipeline
Requires transformers: pip install transformers
from transformers import AutoTokenizer, AutoModelForSequenceClassification, TextClassificationPipeline
model_name = 'qanastek/XLMRoberta-Alexa-Intents-Classification'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
classifier = TextClassificationPipeline(model=model, tokenizer=tokenizer)
res = classifier("réveille-moi à neuf heures du matin le vendredi")
print(res)
Outputs:
[{'label': 'alarm_set', 'score': 0.9998375177383423}]
Training data
MASSIVE is a parallel dataset of > 1M utterances across 51 languages with annotations for the Natural Language Understanding tasks of intent prediction and slot annotation. Utterances span 60 intents and include 55 slot types. MASSIVE was created by localizing the SLURP dataset, composed of general Intelligent Voice Assistant single-shot interactions.
Intents
- audio_volume_other
- play_music
- iot_hue_lighton
- general_greet
- calendar_set
- audio_volume_down
- social_query
- audio_volume_mute
- iot_wemo_on
- iot_hue_lightup
- audio_volume_up
- iot_coffee
- takeaway_query
- qa_maths
- play_game
- cooking_query
- iot_hue_lightdim
- iot_wemo_off
- music_settings
- weather_query
- news_query
- alarm_remove
- social_post
- recommendation_events
- transport_taxi
- takeaway_order
- music_query
- calendar_query
- lists_query
- qa_currency
- recommendation_movies
- general_joke
- recommendation_locations
- email_querycontact
- lists_remove
- play_audiobook
- email_addcontact
- lists_createoradd
- play_radio
- qa_stock
- alarm_query
- email_sendemail
- general_quirky
- music_likeness
- cooking_recipe
- email_query
- datetime_query
- transport_traffic
- play_podcasts
- iot_hue_lightchange
- calendar_remove
- transport_query
- transport_ticket
- qa_factoid
- iot_cleaning
- alarm_set
- datetime_convert
- iot_hue_lightoff
- qa_definition
- music_dislikeness
Evaluation results
precision recall f1-score support
alarm_query 0.9661 0.9037 0.9338 1734
alarm_remove 0.9484 0.9608 0.9545 1071
alarm_set 0.8611 0.9254 0.8921 2091
audio_volume_down 0.8657 0.9537 0.9075 561
audio_volume_mute 0.8608 0.9130 0.8861 1632
audio_volume_other 0.8684 0.5392 0.6653 306
audio_volume_up 0.7198 0.8446 0.7772 663
calendar_query 0.7555 0.8229 0.7878 6426
calendar_remove 0.8688 0.9441 0.9049 3417
calendar_set 0.9092 0.9014 0.9053 10659
cooking_query 0.0000 0.0000 0.0000 0
cooking_recipe 0.9282 0.8592 0.8924 3672
datetime_convert 0.8144 0.7686 0.7909 765
datetime_query 0.9152 0.9305 0.9228 4488
email_addcontact 0.6482 0.8431 0.7330 612
email_query 0.9629 0.9319 0.9472 6069
email_querycontact 0.6853 0.8032 0.7396 1326
email_sendemail 0.9530 0.9381 0.9455 5814
general_greet 0.1026 0.3922 0.1626 51
general_joke 0.9305 0.9123 0.9213 969
general_quirky 0.6984 0.5417 0.6102 8619
iot_cleaning 0.9590 0.9359 0.9473 1326
iot_coffee 0.9304 0.9749 0.9521 1836
iot_hue_lightchange 0.8794 0.9374 0.9075 1836
iot_hue_lightdim 0.8695 0.8711 0.8703 1071
iot_hue_lightoff 0.9440 0.9229 0.9334 2193
iot_hue_lighton 0.4545 0.5882 0.5128 153
iot_hue_lightup 0.9271 0.8315 0.8767 1377
iot_wemo_off 0.9615 0.8715 0.9143 918
iot_wemo_on 0.8455 0.7941 0.8190 510
lists_createoradd 0.8437 0.8356 0.8396 1989
lists_query 0.8918 0.8335 0.8617 2601
lists_remove 0.9536 0.8601 0.9044 2652
music_dislikeness 0.7725 0.7157 0.7430 204
music_likeness 0.8570 0.8159 0.8359 1836
music_query 0.8667 0.8050 0.8347 1785
music_settings 0.4024 0.3301 0.3627 306
news_query 0.8343 0.8657 0.8498 6324
play_audiobook 0.8172 0.8125 0.8149 2091
play_game 0.8666 0.8403 0.8532 1785
play_music 0.8683 0.8845 0.8763 8976
play_podcasts 0.8925 0.9125 0.9024 3213
play_radio 0.8260 0.8935 0.8585 3672
qa_currency 0.9459 0.9578 0.9518 1989
qa_definition 0.8638 0.8552 0.8595 2907
qa_factoid 0.7959 0.8178 0.8067 7191
qa_maths 0.8937 0.9302 0.9116 1275
qa_stock 0.7995 0.9412 0.8646 1326
recommendation_events 0.7646 0.7702 0.7674 2193
recommendation_locations 0.7489 0.8830 0.8104 1581
recommendation_movies 0.6907 0.7706 0.7285 1020
social_post 0.9623 0.9080 0.9344 4131
social_query 0.8104 0.7914 0.8008 1275
takeaway_order 0.7697 0.8458 0.8059 1122
takeaway_query 0.9059 0.8571 0.8808 1785
transport_query 0.8141 0.7559 0.7839 2601
transport_taxi 0.9222 0.9403 0.9312 1173
transport_ticket 0.9259 0.9384 0.9321 1785
transport_traffic 0.6919 0.9660 0.8063 765
weather_query 0.9387 0.9492 0.9439 7956
accuracy 0.8617 151674
macro avg 0.8162 0.8273 0.8178 151674
weighted avg 0.8639 0.8617 0.8613 151674