lewtun HF staff

Add evaluation results on the autoevaluate--squad-sample config and test split of autoevaluate/squad-sample

716a2ae about 2 years ago


---
language: en
license: apache-2.0
datasets:
- squad
metrics:
- squad
model-index:
- name: autoevaluate/distilbert-base-cased-distilled-squad
  results:
  - task:
      type: question-answering
      name: Question Answering
    dataset:
      name: autoevaluate/squad-sample
      type: autoevaluate/squad-sample
      config: autoevaluate--squad-sample
      split: test
    metrics:
    - type: f1
      value: 87.8248
      name: F1
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNWRkMTE3Mzg1YjhhYjA3NmI0ZjRjMmQ4ZmU0ODUyMzFjYTdlMDdjM2U0NDYyNDRjOTJiMzA0Yjk4ODlmNWUwMCIsInZlcnNpb24iOjF9.rld1n6sOped3yqbgs4P6egT3g-Eq3pt-tOkCewF9DzQSkl7m0B2AnwKp3wuXtd9e-x8siemGqEVTwsMkTpFmCA
    - type: exact_match
      value: 84.0
      name: Exact Match
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNjJiOWU4MmZkZTJlZDliZWRhYjBhMjhmM2YzYjlmYTE3ZmQ5OGU4MzFiYjViNDU5ZDJlODI5YzQzODI2OWQ3NSIsInZlcnNpb24iOjF9.XtqAnYzCO0_oll5_txjeIYsiAhdTIBYq6VYLrtBcp7O0t4hFpBIdKlLAoYSvfhLz-Bf3tsl-r6CrCrzvTmWQCg
---

# DistilBERT base cased distilled SQuAD

> Note: This model is a clone of [`distilbert-base-cased-distilled-squad`](https://huggingface.co/distilbert-base-cased-distilled-squad) for internal testing.

This model is a fine-tuned checkpoint of [DistilBERT-base-cased](https://huggingface.co/distilbert-base-cased), trained using (a second step of) knowledge distillation on SQuAD v1.1.
It reaches an F1 score of 87.1 on the dev set (for comparison, the bert-base-cased version of BERT reaches an F1 score of 88.7).
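The distillation step mentioned above can be illustrated with the generic temperature-scaled soft-target loss, in which the student is trained to match the teacher's softened output distribution as measured by KL divergence. The sketch below is a minimal, dependency-free illustration assuming the teacher and student each produce a vector of logits (for extractive QA, for example, scores over candidate answer start positions). `softmax` and `distillation_loss` are hypothetical helpers; the temperature, the loss weighting, and how this term was combined with the supervised SQuAD loss for this checkpoint are not specified in this card.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    temperature: float = 2.0,
) -> float:
    """KL(teacher || student) between temperature-softened distributions, scaled by T**2.

    Hypothetical helper illustrating the generic soft-target loss; it is not
    taken from the training code used for this checkpoint.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        p * (math.log(p) - math.log(q))
        for p, q in zip(p_teacher, p_student)
        if p > 0  # terms with zero teacher probability contribute nothing
    )
    # Scaling by T**2 keeps the soft-target gradients on a comparable scale
    # when the temperature changes.
    return kl * temperature ** 2


teacher = [2.0, 0.5, -1.0]  # e.g. a teacher's start-position logits (made up)
student = [1.5, 0.7, -0.5]  # the student's logits for the same positions (made up)
print(distillation_loss(student, teacher))
```

The loss is positive here and drops to zero once the student reproduces the teacher's softened distribution. A higher temperature flattens both distributions, so the student also learns from the relative scores the teacher assigns to wrong positions.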

Evaluating with the question-answering `Evaluator` from the `evaluate` library gives:

```python
{'exact_match': 79.54588457899716,
 'f1': 86.81181300991533,
 'latency_in_seconds': 0.008683730778997168,
 'samples_per_second': 115.15787689073015,
 'total_time_in_seconds': 91.78703433400005}
```

which is roughly consistent with the official dev-set F1 score of 87.1 (86.8 here).
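The `exact_match` and `f1` values above are SQuAD-style scores. The sketch below re-implements, using only the standard library, the answer normalization, exact match, and token-overlap F1 from the official SQuAD v1.1 evaluation script, which the `squad` metric in `evaluate` is based on. The helper names are hypothetical, and the sketch shows how the scores are defined rather than reproducing the numbers above.

```python
import re
import string
from collections import Counter
from typing import Dict, List

_PUNCT = set(string.punctuation)


def normalize_answer(s: str) -> str:
    """Lowercase, drop punctuation, drop the articles a/an/the, collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in _PUNCT)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def exact_match(prediction: str, ground_truth: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(ground_truth))


def f1(prediction: str, ground_truth: str) -> float:
    """Harmonic mean of token-level precision and recall after normalization."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(ground_truth).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)  # multiset intersection
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def squad_scores(predictions: List[str], references: List[List[str]]) -> Dict[str, float]:
    """Average over examples (as percentages), taking the best score over each example's gold answers."""
    em_total = f1_total = 0.0
    for pred, golds in zip(predictions, references):
        em_total += max(exact_match(pred, g) for g in golds)
        f1_total += max(f1(pred, g) for g in golds)
    n = len(predictions)
    return {"exact_match": 100.0 * em_total / n, "f1": 100.0 * f1_total / n}


preds = ["the Eiffel Tower", "Paris, France"]
refs = [["Eiffel Tower"], ["Paris"]]
print(squad_scores(preds, refs))  # exact_match 50.0, f1 about 83.3
```

In the example, the first prediction matches exactly once the article is stripped. The second only partially overlaps its gold answer (precision 1/2, recall 1), so it earns no exact-match credit but an F1 of 2/3. This partial credit is why F1 is consistently higher than exact match.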