	---
	datasets:
	- glue
	- anli
	model-index:
	- name: e5-large-mnli-anli
	  results: []
	pipeline_tag: zero-shot-classification
	language:
	- en
	license: mit
	---

	# e5-large-mnli-anli

	This model is a fine-tuned version of [intfloat/e5-large](https://huggingface.co/intfloat/e5-large) on the GLUE (MNLI) and ANLI datasets.

	## Model description

	[Text Embeddings by Weakly-Supervised Contrastive Pre-training](https://arxiv.org/pdf/2212.03533.pdf).
	Liang Wang, Nan Yang, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang, Rangan Majumder, Furu Wei, arXiv 2022

	## How to use the model

	### With the zero-shot classification pipeline

	The model can be loaded with the `zero-shot-classification` pipeline like so:

	```python
	from transformers import pipeline

	classifier = pipeline("zero-shot-classification",
	                      model="mjwong/e5-large-mnli-anli")
	```

	You can then use this pipeline to classify sequences into any of the class names you specify.

	```python
	sequence_to_classify = "one day I will see the world"
	candidate_labels = ['travel', 'cooking', 'dancing']
	classifier(sequence_to_classify, candidate_labels)
	```
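
For context, here is a minimal sketch of what the pipeline does with the candidate labels before scoring, assuming the default `hypothesis_template` of `"This example is {}."` in Transformers. The helper `build_nli_pairs` is a hypothetical name used only for illustration; it is not part of the library.

```python
def build_nli_pairs(sequence, candidate_labels, hypothesis_template="This example is {}."):
    """Pair the input sequence (the premise) with one hypothesis per candidate label."""
    return [(sequence, hypothesis_template.format(label)) for label in candidate_labels]


pairs = build_nli_pairs("one day I will see the world", ["travel", "cooking", "dancing"])
for premise, hypothesis in pairs:
    print(f"{premise!r} -> {hypothesis!r}")
```

Each pair is then scored by the NLI model. The pipeline returns a dictionary with the keys `sequence`, `labels`, and `scores`, with the labels sorted by descending score.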

	If more than one candidate label can be correct, pass `multi_label=True` to score each class independently (older Transformers releases called this argument `multi_class`, which is now deprecated):

	```python
	candidate_labels = ['travel', 'cooking', 'dancing', 'exploration']
	classifier(sequence_to_classify, candidate_labels, multi_label=True)
	```
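
The difference between the two modes can be sketched without loading the model. The snippet below assumes the pipeline's usual scoring scheme: in single-label mode, a softmax over the entailment logits across all labels; in independent mode, a separate softmax over each label's (contradiction, entailment) logits. The function names and logit values are made up for illustration.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def single_label_scores(entail_logits):
    """Labels compete with each other: the scores sum to 1."""
    return softmax(entail_logits)


def multi_label_scores(entail_logits, contra_logits):
    """Each label is scored on its own: entailment vs. contradiction."""
    return [softmax([c, e])[1] for e, c in zip(entail_logits, contra_logits)]


# Hypothetical logits for four candidate labels.
entail = [2.0, -1.0, 0.5, 1.8]
contra = [-1.5, 2.0, 0.5, -1.0]

single = single_label_scores(entail)
multi = multi_label_scores(entail, contra)
print("single-label:", [round(s, 3) for s in single])
print("independent: ", [round(s, 3) for s in multi])
```

With independent scoring, several labels can receive a high score at once, which suits overlapping labels such as `travel` and `exploration`.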

	### With manual PyTorch

	The model can also be applied directly to NLI tasks:

	```python
	import torch
	from transformers import AutoTokenizer, AutoModelForSequenceClassification

	# device = "cuda:0" or "cpu"
	device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

	model_name = "mjwong/e5-large-mnli-anli"
	tokenizer = AutoTokenizer.from_pretrained(model_name)
	model = AutoModelForSequenceClassification.from_pretrained(model_name)

	premise = "But I thought you'd sworn off coffee."
	hypothesis = "I thought that you vowed to drink more coffee."

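	# Move the model to the same device as the inputs.
	model.to(device)

	# Pass every tokenizer output (not only input_ids) so the attention mask
	# and segment ids reach the model as well.
	inputs = tokenizer(premise, hypothesis, truncation=True, return_tensors="pt").to(device)
	with torch.no_grad():
	    output = model(**inputs)
	prediction = torch.softmax(output["logits"][0], -1).tolist()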
	label_names = ["entailment", "neutral", "contradiction"]
	prediction = {name: round(float(pred) * 100, 2) for pred, name in zip(prediction, label_names)}
	print(prediction)
	```

	### Eval results
	The model was evaluated using the dev sets for MultiNLI and test sets for ANLI. The metric used is accuracy.

	|Model|mnli_dev_m|mnli_dev_mm|anli_test_r1|anli_test_r2|anli_test_r3|
	| :---: | :---: | :---: | :---: | :---: | :---: |
	|[e5-base-v2-mnli-anli](https://huggingface.co/mjwong/e5-base-v2-mnli-anli)|0.812|0.809|0.557|0.460|0.448|
	|[e5-large-mnli](https://huggingface.co/mjwong/e5-large-mnli)|0.868|0.869|0.301|0.296|0.294|
	|[e5-large-mnli-anli](https://huggingface.co/mjwong/e5-large-mnli-anli)|0.843|0.848|0.646|0.484|0.458|
	|[e5-large-v2-mnli](https://huggingface.co/mjwong/e5-large-v2-mnli)|0.875|0.876|0.354|0.298|0.313|
	|[e5-large-v2-mnli-anli](https://huggingface.co/mjwong/e5-large-v2-mnli-anli)|0.846|0.848|0.638|0.474|0.479|
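
Accuracy here is the share of examples whose predicted label matches the gold label. A minimal sketch, using a hypothetical `accuracy` helper and toy labels rather than the actual evaluation script:

```python
def accuracy(predictions, references):
    """Fraction of predictions that equal the reference labels."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


# Toy example with the three NLI labels: 3 of 4 predictions are correct.
preds = ["entailment", "neutral", "contradiction", "neutral"]
golds = ["entailment", "contradiction", "contradiction", "neutral"]
print(accuracy(preds, golds))  # 0.75
```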

	### Training hyperparameters

	The following hyperparameters were used during training:

	- learning_rate: 2e-05
	- train_batch_size: 16
	- eval_batch_size: 16
	- seed: 42
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- lr_scheduler_warmup_ratio: 0.1
	- num_epochs: 2
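
To make the scheduler settings concrete, the sketch below computes the learning rate at a given optimizer step for a linear schedule with warmup. It assumes warmup lasts `ceil(0.1 * total_steps)` steps and that the rate then decays linearly to zero. `lr_at_step` and the `total_steps` value are illustrative; the real step count depends on the dataset size, which is not stated above.

```python
import math


def lr_at_step(step, total_steps, base_lr=2e-5, warmup_ratio=0.1):
    """Learning rate under linear warmup followed by linear decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Ramp up linearly from 0 to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr to 0 over the remaining steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


total_steps = 1000  # hypothetical number of optimizer steps
for step in (0, 50, 100, 550, 1000):
    print(step, lr_at_step(step, total_steps))
```

The rate peaks at `2e-05` when warmup ends and reaches zero at the final step.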

	### Framework versions
	- Transformers 4.28.1
	- PyTorch 1.12.1+cu116
	- Datasets 2.11.0
	- Tokenizers 0.12.1