simonhughes22
committed on
Commit 1a1b26d
1 Parent(s): d5ff4ed
Update README.md
README.md
CHANGED
@@ -9,10 +9,9 @@ The model was trained on the NLI data and a variety of datasets evaluating summa
 
 ## Performance
 
-TRUE Dataset (Minus Vitamin C, FEVER and PAWS) - 0.872 AUC Score
-SummaC Benchmark (Test) - 0.764 Balanced Accuracy
-
-[AnyScale Ranking Test](https://www.anyscale.com/blog/llama-2-is-about-as-factually-accurate-as-gpt-4-for-summaries-and-is-30x-cheaper) - 86.6 % Accuracy
+* [TRUE Dataset (Minus Vitamin C, FEVER and PAWS)](https://arxiv.org/pdf/2204.04991.pdf) - 0.872 AUC Score
+* [SummaC Benchmark (Test Split)](https://aclanthology.org/2022.tacl-1.10.pdf) - 0.764 Balanced Accuracy, 0.831 AUC Score
+* [AnyScale Ranking Test for Hallucinations](https://www.anyscale.com/blog/llama-2-is-about-as-factually-accurate-as-gpt-4-for-summaries-and-is-30x-cheaper) - 86.6 % Accuracy
 
 ## Usage
 
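For readers unfamiliar with the two metrics named in the Performance list, here is a minimal sketch of how balanced accuracy and AUC are commonly computed for a binary consistency classifier. It is not the evaluation code behind the reported numbers. The label convention (1 = factually consistent, 0 = hallucinated), the 0.5 decision threshold, the toy data, and the function names `balanced_accuracy` and `roc_auc` are all assumptions made for illustration.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of the recall on the positive class and the recall on the negative class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    return 0.5 * (tp / pos + tn / neg)


def roc_auc(y_true, scores):
    """Fraction of (positive, negative) pairs where the positive scores higher; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for this example (assumed: 1 = consistent, 0 = hallucinated).
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.6, 0.4, 0.5, 0.1]

# Assumed threshold of 0.5 to turn scores into hard predictions.
preds = [1 if s >= 0.5 else 0 for s in scores]

print(f"balanced accuracy: {balanced_accuracy(labels, preds):.3f}")
print(f"AUC: {roc_auc(labels, scores):.3f}")
```

Balanced accuracy depends on the chosen threshold, while AUC is computed from the raw scores alone, which is one reason a benchmark may report both.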