JordiBayarri committed · Commit 7eeb193 · Parent(s): e281914
Update README.md

README.md CHANGED
@@ -341,11 +341,13 @@ We used [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF) library. We aligned the
 
 #### Summary
 
-To compare Aloe with the most competitive open models (both general purpose and healthcare-specific) we use popular healthcare datasets (PubMedQA, MedMCQA, MedQA and MMLU for six medical tasks only), together with the new and highly reliable CareQA.
+To compare Aloe with the most competitive open models (both general-purpose and healthcare-specific), we use popular healthcare datasets (PubMedQA, MedMCQA, MedQA, and MMLU restricted to six medical tasks), together with the new and highly reliable CareQA. However, while MCQA benchmarks provide valuable insight into a model's ability to handle structured queries, they fall short of representing the full range of challenges faced in medical practice. Building on this idea, Aloe-Beta represents the next step in the evolution of the Aloe Family, designed to broaden the scope beyond the multiple-choice question-answering tasks that defined Aloe-Alpha.
 
-Benchmark results indicate the training conducted on Aloe has boosted its performance above Llama3-8B-Instruct. Llama3-Aloe-8B-Alpha outperforms larger models like Meditron 70B, and is close to larger base models, like Yi-34. For the former, this gain is consistent even when using SC-CoT, using their best-reported variant. All these results make Llama3-Aloe-8B-Alpha the best healthcare LLM of its size.
+Benchmark results indicate that the training conducted on Aloe has boosted its performance, achieving results comparable to SOTA models such as Llama3-OpenBioLLM, Llama3-Med42, MedPalm-2, and GPT-4. Llama3.1-Aloe-Beta-70B also outperforms the other existing medical models on the OpenLLM Leaderboard and in the evaluation of other medical tasks, such as medical factuality and medical treatment recommendations, among others. All these results make Llama3.1-Aloe-Beta-70B one of the best existing models for healthcare.
 
-
+With the help of prompting techniques, the performance of Llama3-Aloe-8B-Beta improves significantly. Medprompt in particular provides a 4% increase in reported accuracy, after which Llama3.1-Aloe-Beta-70B outperforms all existing models that do not use RAG evaluation.
+
 
 ## Environmental Impact
 