Update README.md
README.md
@@ -310,7 +310,7 @@ This checkpoint has strong zero-shot validation performance on many tasks (e.g.
 | anli/a2 | 47.2 |
 | anli/a3 | 49.4 |
 | nli_fever | 79.4 |
-
+| FOLIO | 61.8 |
 | ConTRoL-nli | 63.3 |
 | cladder | 71.1 |
 | zero-shot-label-nli | 74.4 |
@@ -318,6 +318,8 @@ This checkpoint has strong zero-shot validation performance on many tasks (e.g.
 | oasst2_pairwise_rlhf_reward | 73.9 |
 | doc-nli | 90.0 |
 
+Zero-shot GPT-4 scores 61% on FOLIO (logical reasoning), 62% on cladder (probabilistic reasoning) and 56.4% on ConTRoL (long context NLI).
+
 # [ZS] Zero-shot classification pipeline
 ```python
 from transformers import pipeline