mjwong/contriever-msmarco-mnli

	---
	datasets:
	- glue
	model-index:
	- name: contriever-msmarco-mnli
	  results: []
	pipeline_tag: zero-shot-classification
	language:
	- en
	license: mit
	---

	# contriever-msmarco-mnli

	This model is a fine-tuned version of [facebook/contriever-msmarco](https://huggingface.co/facebook/contriever-msmarco) on the MNLI task of the GLUE dataset.

	## Model description

	The base model was introduced in [Unsupervised Dense Information Retrieval with Contrastive Learning](https://arxiv.org/abs/2112.09118) by Gautier Izacard, Mathilde Caron, Lucas Hosseini, Sebastian Riedel, Piotr Bojanowski, Armand Joulin, and Edouard Grave (arXiv, 2021).

	## How to use the model

	The model can be loaded with the `zero-shot-classification` pipeline like so:

	```python
	from transformers import pipeline
	classifier = pipeline("zero-shot-classification",
	                      model="mjwong/contriever-msmarco-mnli")
	```

	You can then use this pipeline to classify sequences into any of the class names you specify.

	```python
	sequence_to_classify = "one day I will see the world"
	candidate_labels = ['travel', 'cooking', 'dancing']
	classifier(sequence_to_classify, candidate_labels)
	#{'sequence': 'one day I will see the world',
	# 'labels': ['travel', 'dancing', 'cooking'],
	# 'scores': [0.9954835772514343, 0.002568634692579508, 0.00194773287512362]}
	```

	If more than one candidate label can be correct, pass `multi_label=True` to score each label independently (older versions of Transformers called this argument `multi_class`, which is now deprecated):

	```python
	candidate_labels = ['travel', 'cooking', 'dancing', 'exploration']
	classifier(sequence_to_classify, candidate_labels, multi_label=True)
	#{'sequence': 'one day I will see the world',
	# 'labels': ['travel', 'exploration', 'cooking', 'dancing'],
	# 'scores': [0.9968098998069763,
	#            0.9796287417411804,
	#            0.027883002534508705,
	#            0.0008239754824899137]}
	```
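
The difference between the default mode and independent scoring comes down to how the per-label NLI logits are normalized. The sketch below mimics that post-processing step in plain Python. The logits are made-up illustrative values, not outputs of this model, and the helper names (`softmax`, `single_label_scores`, `multi_label_scores`) are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def single_label_scores(entailment_logits):
    """Default mode: softmax over the entailment logits of all labels.

    Labels compete with each other, so the scores sum to 1.
    """
    return softmax(entailment_logits)


def multi_label_scores(logit_pairs):
    """Independent mode: per label, softmax over (contradiction, entailment).

    Each score is the entailment probability for that label alone,
    so the scores do not have to sum to 1.
    """
    return [softmax([contra, entail])[1] for contra, entail in logit_pairs]


# Hypothetical (contradiction, entailment) logits for three candidate labels.
logit_pairs = [(-2.0, 3.0), (1.5, -1.0), (-1.0, 2.0)]

single = single_label_scores([entail for _, entail in logit_pairs])
multi = multi_label_scores(logit_pairs)

print("single-label:", [round(s, 3) for s in single])
print("multi-label: ", [round(s, 3) for s in multi])
```

In the default mode a second plausible label lowers the score of the first, whereas with independent scoring both can be close to 1, which matches the `travel` and `exploration` scores in the example above.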

	### Eval results
	The model was evaluated using the dev sets for MultiNLI and test sets for ANLI. The metric used is accuracy.

	|Model|mnli_dev_m|mnli_dev_mm|anli_test_r1|anli_test_r2|anli_test_r3|
	| :---: | :---: | :---: | :---: | :---: | :---: |
	|[contriever-mnli](https://huggingface.co/mjwong/contriever-mnli)|0.821|0.822|0.247|0.281|0.312|
	|[contriever-msmarco-mnli](https://huggingface.co/mjwong/contriever-msmarco-mnli)|0.820|0.819|0.244|0.296|0.306|
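
Accuracy here is the fraction of premise-hypothesis pairs whose predicted label matches the gold label. A minimal sketch with made-up labels (not taken from MultiNLI or ANLI); the `accuracy` helper is hypothetical and is not the evaluation script used for this card:

```python
def accuracy(predictions, references):
    """Return the fraction of predictions that equal their reference label."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


# Made-up NLI labels for four examples; three of the four predictions match.
predicted = ["entailment", "neutral", "contradiction", "neutral"]
gold = ["entailment", "neutral", "neutral", "neutral"]

score = accuracy(predicted, gold)
print(score)
```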

	### Training hyperparameters

	The following hyperparameters were used during training:

	- learning_rate: 2e-05
	- train_batch_size: 16
	- eval_batch_size: 16
	- seed: 42
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- lr_scheduler_warmup_ratio: 0.1
	- num_epochs: 5
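
To illustrate how `lr_scheduler_type: linear` and `lr_scheduler_warmup_ratio: 0.1` interact, the sketch below computes the learning rate at a given optimizer step: a linear ramp from 0 up to the peak rate over the first 10% of steps, followed by a linear decay to 0. The total step count is a made-up value (the card does not state it), and `linear_schedule_lr` is a hypothetical helper written for this example, not a Transformers function.

```python
import math


def linear_schedule_lr(step, total_steps, peak_lr=2e-5, warmup_ratio=0.1):
    """Learning rate at `step` for linear warmup followed by linear decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup phase: scale linearly from 0 to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Decay phase: scale linearly from peak_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


TOTAL_STEPS = 1000  # assumption for illustration only

lr_start = linear_schedule_lr(0, TOTAL_STEPS)    # first step of warmup
lr_peak = linear_schedule_lr(100, TOTAL_STEPS)   # warmup just finished
lr_mid = linear_schedule_lr(550, TOTAL_STEPS)    # halfway through decay
lr_end = linear_schedule_lr(1000, TOTAL_STEPS)   # end of training

print(lr_start, lr_peak, lr_mid, lr_end)
```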

	### Framework versions
	- Transformers 4.28.1
	- PyTorch 1.12.1+cu116
	- Datasets 2.11.0
	- Tokenizers 0.12.1