CoreyMorris/MMLU-by-task-Leaderboard

Size: 428 kB
Contributors: 4
History: 152 commits
Latest commit: 65d6581 by Corey Morris, about 2 years ago: "added code to split moral scenario question from one question to two"

| File | Size | Last commit message | Updated |
| --- | --- | --- | --- |
| .github | - | added a test and removed the code to only test a specific file because that code did not work | about 2 years ago |
| .gitattributes | 1.52 kB | initial commit | about 2 years ago |
| .gitignore | 68 Bytes | updated gitignore | about 2 years ago |
| .gitmodules | 106 Bytes | added hugging face evaluation harness results submodule | about 2 years ago |
| README.md | 248 Bytes | initial commit | about 2 years ago |
| app.py | 16 kB | updated date and model count | about 2 years ago |
| contaminated_models.csv | 117 Bytes | Updated contaminated models | about 2 years ago |
| contaminated_models.txt | 65 Bytes | Updated contaminated models | about 2 years ago |
| details_data_processor.py | 4.04 kB | updated pipeline and init | about 2 years ago |
| dev_requirements.txt | 252 Bytes | updated dev requirements | about 2 years ago |
| moral_app.py | 11.1 kB | Extracted plotting functions from moral_app to plotting_utils to improve organization and testability | about 2 years ago |
| moral_scenarios_questions.csv | 370 kB | Show a random question from the moral scenarios evaluation | about 2 years ago |
| plotting_utils.py | 4.42 kB | Extracted plotting functions from moral_app to plotting_utils to improve organization and testability | about 2 years ago |
| requirements.txt | 156 Bytes | Updated dependencies | about 2 years ago |
| result_data_processor.py | 6.19 kB | Returning just a single file per model directory. Manually removing gpt-j-6b for now because there is something that is causing problems with processing the data | about 2 years ago |
| save_for_regression.py | 1.86 kB | changed to save and load in a directory | about 2 years ago |
| split_question.py | 964 Bytes | added code to split moral scenario question from one question to two | about 2 years ago |
| test_details_data_processing.py | 4.33 kB | added a test | about 2 years ago |
| test_integration.py | 1.96 kB | fixed test_streamlit_app_runs | about 2 years ago |
| test_paths.py | 780 Bytes | added a test and removed the code to only test a specific file because that code did not work | about 2 years ago |
| test_regression.py | 1.26 kB | added todo for test | about 2 years ago |
| test_result_data_processing.py | 1.66 kB | Added organization to dataframe | about 2 years ago |