CoreyMorris / MMLU-by-task-Leaderboard

4 contributors

History: 182 commits

Latest commit: Updated with new results 11-21 (3ebf7a7, about 1 year ago)

.github: added a test and removed the code to only test a specific file because that code did not work, about 1 year ago
.gitattributes (1.52 kB): initial commit, over 1 year ago
.gitignore (68 Bytes): updated gitignore, about 1 year ago
.gitmodules (106 Bytes): added hugging face evaluation harness results submodule, over 1 year ago
README.md (202 Bytes): updated readme and requirements, about 1 year ago
app.py (15.2 kB): Updated with new results 11-21, about 1 year ago
contaminated_models.csv (117 Bytes): Updated contaminated models, over 1 year ago
contaminated_models.txt (65 Bytes): Updated contaminated models, over 1 year ago
details_data_processor.py (4.04 kB): updated pipeline and init, over 1 year ago
dev_requirements.txt (252 Bytes): updated dev requirements, about 1 year ago
generate_csv.ipynb (1.07 kB): Added clickable links (#1), about 1 year ago
moral_app.py (11.1 kB): Extracted plotting functions from moral_app to plotting_utils to improve organization and testability, about 1 year ago
moral_scenarios_questions.csv (370 kB): Show a random question from the moral scenarios evaluation, about 1 year ago
plotting_utils.py (4.42 kB): Extracted plotting functions from moral_app to plotting_utils to improve organization and testability, about 1 year ago
processed_data_2023-10-05.csv (1.35 MB): update, about 1 year ago
processed_data_2023-10-06.csv (1.62 MB): Added clickable links (#1), about 1 year ago
processed_data_2023-10-08.csv (1.58 MB): added new result data, about 1 year ago
processed_data_2023-11-18.csv (1.18 MB): updated dashboard with new data, about 1 year ago
processed_data_2023-11-21.csv (1.25 MB): Updated with new results 11-21, about 1 year ago
requirements.txt (160 Bytes): updated readme and requirements, about 1 year ago
result_data.csv (1.35 MB): updated, about 1 year ago
result_data_processor.py (8.29 kB): Added clickable links (#1), about 1 year ago
save_for_regression.py (1.86 kB): changed to save and load in a directory, over 1 year ago
split_question.py (964 Bytes): added code to split moral scenario question from one question to two, about 1 year ago
test_details_data_processing.py (4.33 kB): added a test, over 1 year ago
test_integration.py (1.96 kB): fixed test_streamlit_app_runs, over 1 year ago
test_paths.py (780 Bytes): added a test and removed the code to only test a specific file because that code did not work, about 1 year ago
test_regression.py (1.26 kB): added todo for test, over 1 year ago
test_result_data_processing.py (1.66 kB): Added organization to dataframe, over 1 year ago