	---
	language: en
	license: apache-2.0
	library_name: transformers
	tags:
	- deberta-v3-base
	- deberta-v3
	- deberta
	- text-classification
	- nli
	- natural-language-inference
	- multitask
	- multi-task
	- pipeline
	- extreme-multi-task
	- extreme-mtl
	- tasksource
	- zero-shot
	- rlhf
	datasets:
	- glue
	- super_glue
	- anli
	- tasksource/babi_nli
	- sick
	- snli
	- scitail
	- OpenAssistant/oasst1
	- universal_dependencies
	- hans
	- qbao775/PARARULE-Plus
	- alisawuffles/WANLI
	- metaeval/recast
	- sileod/probability_words_nli
	- joey234/nan-nli
	- pietrolesci/nli_fever
	- pietrolesci/breaking_nli
	- pietrolesci/conj_nli
	- pietrolesci/fracas
	- pietrolesci/dialogue_nli
	- pietrolesci/mpe
	- pietrolesci/dnc
	- pietrolesci/gpt3_nli
	- pietrolesci/recast_white
	- pietrolesci/joci
	- martn-nguyen/contrast_nli
	- pietrolesci/robust_nli
	- pietrolesci/robust_nli_is_sd
	- pietrolesci/robust_nli_li_ts
	- pietrolesci/gen_debiased_nli
	- pietrolesci/add_one_rte
	- metaeval/imppres
	- pietrolesci/glue_diagnostics
	- hlgd
	- PolyAI/banking77
	- paws
	- quora
	- medical_questions_pairs
	- conll2003
	- nlpaueb/finer-139
	- Anthropic/hh-rlhf
	- Anthropic/model-written-evals
	- truthful_qa
	- nightingal3/fig-qa
	- tasksource/bigbench
	- blimp
	- cos_e
	- cosmos_qa
	- dream
	- openbookqa
	- qasc
	- quartz
	- quail
	- head_qa
	- sciq
	- social_i_qa
	- wiki_hop
	- wiqa
	- piqa
	- hellaswag
	- pkavumba/balanced-copa
	- 12ml/e-CARE
	- art
	- tasksource/mmlu
	- winogrande
	- codah
	- allenai/ai2_arc
	- definite_pronoun_resolution
	- swag
	- math_qa
	- metaeval/utilitarianism
	- mteb/amazon_counterfactual
	- SetFit/insincere-questions
	- SetFit/toxic_conversations
	- turingbench/TuringBench
	- trec
	- tals/vitaminc
	- hope_edi
	- strombergnlp/rumoureval_2019
	- ethos
	- tweet_eval
	- discovery
	- pragmeval
	- silicone
	- lex_glue
	- papluca/language-identification
	- imdb
	- rotten_tomatoes
	- ag_news
	- yelp_review_full
	- financial_phrasebank
	- poem_sentiment
	- dbpedia_14
	- amazon_polarity
	- app_reviews
	- hate_speech18
	- sms_spam
	- humicroedit
	- snips_built_in_intents
	- banking77
	- hate_speech_offensive
	- yahoo_answers_topics
	- pacovaldez/stackoverflow-questions
	- zapsdcn/hyperpartisan_news
	- zapsdcn/sciie
	- zapsdcn/citation_intent
	- go_emotions
	- allenai/scicite
	- liar
	- relbert/lexical_relation_classification
	- metaeval/linguisticprobing
	- tasksource/crowdflower
	- metaeval/ethics
	- emo
	- google_wellformed_query
	- tweets_hate_speech_detection
	- has_part
	- wnut_17
	- ncbi_disease
	- acronym_identification
	- jnlpba
	- species_800
	- SpeedOfMagic/ontonotes_english
	- blog_authorship_corpus
	- launch/open_question_type
	- health_fact
	- commonsense_qa
	- mc_taco
	- ade_corpus_v2
	- prajjwal1/discosense
	- circa
	- PiC/phrase_similarity
	- copenlu/scientific-exaggeration-detection
	- quarel
	- mwong/fever-evidence-related
	- numer_sense
	- dynabench/dynasent
	- raquiba/Sarcasm_News_Headline
	- sem_eval_2010_task_8
	- demo-org/auditor_review
	- medmcqa
	- aqua_rat
	- RuyuanWan/Dynasent_Disagreement
	- RuyuanWan/Politeness_Disagreement
	- RuyuanWan/SBIC_Disagreement
	- RuyuanWan/SChem_Disagreement
	- RuyuanWan/Dilemmas_Disagreement
	- lucasmccabe/logiqa
	- wiki_qa
	- metaeval/cycic_classification
	- metaeval/cycic_multiplechoice
	- metaeval/sts-companion
	- metaeval/commonsense_qa_2.0
	- metaeval/lingnli
	- metaeval/monotonicity-entailment
	- metaeval/arct
	- metaeval/scinli
	- metaeval/naturallogic
	- onestop_qa
	- demelin/moral_stories
	- corypaik/prost
	- aps/dynahate
	- metaeval/syntactic-augmentation-nli
	- metaeval/autotnli
	- lasha-nlp/CONDAQA
	- openai/webgpt_comparisons
	- Dahoas/synthetic-instruct-gptj-pairwise
	- metaeval/scruples
	- metaeval/wouldyourather
	- sileod/attempto-nli
	- metaeval/defeasible-nli
	- metaeval/help-nli
	- metaeval/nli-veridicality-transitivity
	- metaeval/natural-language-satisfiability
	- metaeval/lonli
	- tasksource/dadc-limit-nli
	- ColumbiaNLP/FLUTE
	- metaeval/strategy-qa
	- openai/summarize_from_feedback
	- tasksource/folio
	- metaeval/tomi-nli
	- metaeval/avicenna
	- stanfordnlp/SHP
	- GBaker/MedQA-USMLE-4-options-hf
	- GBaker/MedQA-USMLE-4-options
	- sileod/wikimedqa
	- declare-lab/cicero
	- amydeng2000/CREAK
	- metaeval/mutual
	- inverse-scaling/NeQA
	- inverse-scaling/quote-repetition
	- inverse-scaling/redefine-math
	- tasksource/puzzte
	- metaeval/implicatures
	- race
	- metaeval/spartqa-yn
	- metaeval/spartqa-mchoice
	- metaeval/temporal-nli
	- metaeval/ScienceQA_text_only
	- AndyChiang/cloth
	- metaeval/logiqa-2.0-nli
	- tasksource/oasst1_dense_flat
	- metaeval/boolq-natural-perturbations
	- metaeval/path-naturalness-prediction
	- riddle_sense
	- Jiangjie/ekar_english
	- metaeval/implicit-hate-stg1
	- metaeval/chaos-mnli-ambiguity
	- IlyaGusev/headline_cause
	- metaeval/race-c
	- metaeval/equate
	- metaeval/ambient
	- AndyChiang/dgen
	- metaeval/clcd-english
	- civil_comments
	- metaeval/acceptability-prediction
	- maximedb/twentyquestions
	- metaeval/counterfactually-augmented-snli
	- tasksource/I2D2
	- sileod/mindgames
	- metaeval/counterfactually-augmented-imdb
	- metaeval/cnli
	- metaeval/reclor
	- tasksource/oasst1_pairwise_rlhf_reward
	- tasksource/zero-shot-label-nli
	- webis/args_me
	- webis/Touche23-ValueEval
	- tasksource/starcon
	- tasksource/ruletaker
	- lighteval/lsat_qa
	- tasksource/ConTRoL-nli
	- tasksource/tracie
	- tasksource/sherliic
	- tasksource/sen-making
	- tasksource/winowhy
	- mediabiasgroup/mbib-base
	- tasksource/robustLR
	- CLUTRR/v1
	- tasksource/logical-fallacy
	- tasksource/parade
	- tasksource/cladder
	- tasksource/subjectivity
	- tasksource/MOH
	- tasksource/VUAC
	- tasksource/TroFi
	- sharc_modified
	- tasksource/conceptrules_v2
	- tasksource/disrpt
	- conll2000
	- DFKI-SLT/few-nerd
	- tasksource/com2sense
	- tasksource/scone
	- tasksource/winodict
	- tasksource/fool-me-twice
	- tasksource/monli
	- tasksource/corr2cause
	- tasksource/apt
	- zeroshot/twitter-financial-news-sentiment
	- tasksource/icl-symbol-tuning-instruct
	- tasksource/SpaceNLI
	- sihaochen/propsegment
	- HannahRoseKirk/HatemojiBuild
	- tasksource/regset
	- lmsys/chatbot_arena_conversations
	metrics:
	- accuracy
	pipeline_tag: zero-shot-classification
	model-index:
	- name: deberta-v3-base-tasksource-nli
	  results:
	  - task:
	      type: text-classification
	      name: Text Classification
	    dataset:
	      name: glue
	      type: glue
	      config: rte
	      split: validation
	    metrics:
	    - type: accuracy
	      value: 0.89
	  - task:
	      type: natural-language-inference
	      name: Natural Language Inference
	    dataset:
	      name: anli-r3
	      type: anli
	      config: plain_text
	      split: validation
	    metrics:
	    - type: accuracy
	      value: 0.52
	      name: Accuracy
	---

	# Model Card for DeBERTa-v3-base-tasksource-nli

	This is [DeBERTa-v3-base](https://hf.co/microsoft/deberta-v3-base) fine-tuned with multi-task learning on 600+ tasks of the [tasksource collection](https://github.com/sileod/tasksource/).
	This checkpoint has strong zero-shot validation performance on many tasks (e.g. 70% on WNLI), and can be used for:
	- Zero-shot entailment-based classification for arbitrary labels [ZS].
	- Natural language inference [NLI].
	- Hundreds of previous tasks with tasksource-adapters [TA].
	- Further fine-tuning on a new task or tasksource task (classification, token classification or multiple-choice) [FT].

	# [ZS] Zero-shot classification pipeline
	```python
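	from transformers import pipeline

	# Build an entailment-based zero-shot classifier on top of the NLI checkpoint.
	classifier = pipeline("zero-shot-classification", model="sileod/deberta-v3-base-tasksource-nli")

	text = "one day I will see the world"
	candidate_labels = ['travel', 'cooking', 'dancing']
	# Returns the candidate labels ranked by score for the input text.
	classifier(text, candidate_labels)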
	```
	The NLI training data of this model includes [label-nli](https://huggingface.co/datasets/tasksource/zero-shot-label-nli), an NLI dataset constructed specifically to improve this kind of zero-shot classification.
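
To make the entailment-based mechanism concrete, here is a minimal, model-free sketch of what a zero-shot pipeline does around the NLI model: it pairs the text with one hypothesis per label, then normalizes the entailment logits across labels. The helper names `build_nli_pairs` and `rank_labels` are hypothetical, the template is assumed to match the `transformers` default, and the logits are made-up numbers standing in for model outputs.

```python
import math

def build_nli_pairs(text, candidate_labels, template="This example is {}."):
    """Pair the input text (premise) with one hypothesis per candidate label."""
    return [(text, template.format(label)) for label in candidate_labels]

def rank_labels(candidate_labels, entailment_logits):
    """Softmax the per-label entailment logits and sort labels by score."""
    peak = max(entailment_logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in entailment_logits]
    total = sum(exps)
    scores = [e / total for e in exps]
    return sorted(zip(candidate_labels, scores), key=lambda p: p[1], reverse=True)

labels = ["travel", "cooking", "dancing"]
pairs = build_nli_pairs("one day I will see the world", labels)

# Hypothetical entailment logits, one per pair; a real run would get them from the model.
ranking = rank_labels(labels, [4.0, -1.0, -2.0])
print(pairs[0])
print(ranking[0][0])
```

Because every label needs its own premise/hypothesis pair, the cost of this approach grows with the number of candidate labels.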

	# [NLI] Natural language inference pipeline

	```python
	from transformers import pipeline

	pipe = pipeline("text-classification", model="sileod/deberta-v3-base-tasksource-nli")
	# Each input is a dict holding a premise (text) and a hypothesis (text_pair).
	pipe([dict(text='there is a cat',
	           text_pair='there is a black cat')])
	# [{'label': 'neutral', 'score': 0.9952911138534546}]
	```

	# [TA] Tasksource-adapters: one-line access to hundreds of tasks

	```python
	# !pip install tasknet
	import tasknet as tn
	pipe = tn.load_pipeline('sileod/deberta-v3-base-tasksource-nli', 'glue/sst2')  # works for 500+ tasksource tasks
	pipe(['That movie was great !', 'Awful movie.'])
	# [{'label': 'positive', 'score': 0.9956}, {'label': 'negative', 'score': 0.9967}]
	```
	The list of tasks is available in the model's config.json.
	This is more efficient than ZS because it requires only one forward pass per example, but it is less flexible.
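
The efficiency difference can be illustrated with a small count of encoder forward passes. This is a rough sketch that ignores batching and assumes the zero-shot pipeline scores one (text, label) pair per pass; `forward_passes` is a hypothetical helper, not part of tasknet or transformers.

```python
def forward_passes(num_examples, num_labels, mode):
    """Count encoder forward passes needed to classify a set of examples.

    'zero-shot': one (premise, hypothesis) pair per example and per label.
    'adapter':   one pass per example; the task head scores all labels at once.
    """
    if mode == "zero-shot":
        return num_examples * num_labels
    if mode == "adapter":
        return num_examples
    raise ValueError(f"unknown mode: {mode}")

# 1,000 reviews, two sentiment labels.
zs_cost = forward_passes(1000, 2, "zero-shot")
ta_cost = forward_passes(1000, 2, "adapter")
print(zs_cost, ta_cost)
```

The gap widens with the size of the label set, while the adapter route is limited to tasks whose heads were trained.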


	# [FT] Tasknet: three-line fine-tuning

	```python
	# !pip install tasknet
	import tasknet as tn
	hparams = dict(model_name='sileod/deberta-v3-base-tasksource-nli', learning_rate=2e-5)
	model, trainer = tn.Model_Trainer([tn.AutoTask("glue/rte")], hparams)
	trainer.train()
	```

	## Evaluation
	This model ranked 1st among all models with the microsoft/deberta-v3-base architecture according to the IBM model recycling evaluation.
	https://ibm.github.io/model-recycling/

	### Software and training details

	The model was trained on 600 tasks for 200k steps with a batch size of 384 and a peak learning rate of 2e-5. Training took 12 days on an NVIDIA A30 24GB GPU.
	This is the shared model with the MNLI classifier on top. Each task had a specific CLS embedding, which was dropped 10% of the time during training so that the model can be used without it. All multiple-choice models used the same classification layers. For classification tasks, models shared weights if their labels matched.
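
The label-based weight sharing described above can be pictured as a lookup keyed by each task's label list. The sketch below is only an illustration under assumptions: `assign_heads` is a hypothetical helper, the task names and label lists are examples, and treating label order as significant is a guess rather than something the training code is known to do.

```python
def assign_heads(task_labels):
    """Map each task to a head id; tasks with identical label lists share a head."""
    head_ids = {}     # label tuple -> head id
    assignment = {}   # task name -> head id
    for task, labels in task_labels.items():
        key = tuple(labels)  # assumption: order matters when matching labels
        if key not in head_ids:
            head_ids[key] = len(head_ids)
        assignment[task] = head_ids[key]
    return assignment

# Illustrative tasks and label lists (not read from the real config).
tasks = {
    "glue/mnli": ["entailment", "neutral", "contradiction"],
    "anli/a1": ["entailment", "neutral", "contradiction"],
    "glue/sst2": ["negative", "positive"],
}
heads = assign_heads(tasks)
print(heads)
```

Under this scheme the two NLI tasks train a single three-way classifier, while the sentiment task gets its own head.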


	https://github.com/sileod/tasksource/ \
	https://github.com/sileod/tasknet/ \
	Training code: https://colab.research.google.com/drive/1iB4Oxl9_B5W3ZDzXoWJN-olUbqLBxgQS?usp=sharing

	# Citation

	More details are available in this [article](https://arxiv.org/abs/2301.05948):
	```
	@article{sileo2023tasksource,
	title={tasksource: Structured Dataset Preprocessing Annotations for Frictionless Extreme Multi-Task Learning and Evaluation},
	author={Sileo, Damien},
	url= {https://arxiv.org/abs/2301.05948},
	journal={arXiv preprint arXiv:2301.05948},
	year={2023}
	}
	```


	# Model Card Contact

	damien.sileo@inria.fr

