Corey Morris committed on
Commit 3f507e0
Parent(s): 6ed8672
Added new hugging face results
app.py
CHANGED
@@ -123,11 +123,11 @@ def find_top_differences_table(df, target_model, closest_models, num_differences
 data_provider = ResultDataProcessor()
 
 # st.title('Model Evaluation Results including MMLU by task')
-st.title('Exploring the Characteristics of Large Language Models: An Interactive Portal for Analyzing
-st.markdown("""***Last updated August
+st.title('Exploring the Characteristics of Large Language Models: An Interactive Portal for Analyzing 1000+ Open Source Models Across 57 Diverse Evaluation Tasks')
+st.markdown("""***Last updated August 26th***""")
 st.markdown("""**Models that are suspected to have training data contaminated with evaluation data have been removed.**""")
 st.markdown("""
-Hugging Face
+Hugging Face runs evaluations on open source models and provides results on a
 [publicly available leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) and [dataset](https://huggingface.co/datasets/open-llm-leaderboard/results).
 The Hugging Face leaderboard currently displays the overall result for Measuring Massive Multitask Language Understanding (MMLU), but not the results for individual tasks.
 This app provides a way to explore the results for individual tasks and compare models across tasks.
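The hunk header above names `find_top_differences_table(df, target_model, closest_models, num_differences`, but its body is not part of this diff. As a rough sketch of the kind of per-task comparison the app text describes, the following assumes a DataFrame indexed by model name with one numeric column per task. The helper name `top_task_differences`, the column names, and the scores are all hypothetical and are not taken from app.py.

```python
import pandas as pd


def top_task_differences(df, target_model, other_models, num_differences=3):
    """Return the tasks where target_model differs most from the mean of other_models.

    Hypothetical helper: assumes df is indexed by model name with one numeric
    column per task. This is not the implementation in app.py.
    """
    target_scores = df.loc[target_model]
    reference_scores = df.loc[other_models].mean(axis=0)
    differences = target_scores - reference_scores
    # Rank tasks by absolute gap, largest first.
    top_tasks = (
        differences.abs().sort_values(ascending=False).head(num_differences).index
    )
    return pd.DataFrame(
        {
            "task": top_tasks,
            "target_score": target_scores[top_tasks].to_numpy(),
            "reference_mean": reference_scores[top_tasks].to_numpy(),
            "difference": differences[top_tasks].to_numpy(),
        }
    )


# Made-up scores for illustration only.
scores = pd.DataFrame(
    {
        "MMLU_anatomy": [0.50, 0.40, 0.45],
        "MMLU_virology": [0.30, 0.35, 0.33],
        "MMLU_astronomy": [0.70, 0.40, 0.50],
    },
    index=["model_a", "model_b", "model_c"],
)
table = top_task_differences(scores, "model_a", ["model_b", "model_c"], num_differences=2)
```

In a Streamlit app, a table like this could be passed to `st.dataframe(table)` for display; whether app.py does so is not visible in this commit.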
results
CHANGED
@@ -1 +1 @@
-Subproject commit
+Subproject commit 4f0a4395819faaf7fb9215d26ddee21f5dcf3c95