Corey Morris
committed on
Commit ea8703d
1 Parent(s): 1e6b767
Added citation for the site
app.py
CHANGED
@@ -244,17 +244,19 @@ st.plotly_chart(fig)
 st.markdown("***Thank you to hugging face for running the evaluations and supplying the data as well as the original authors of the evaluations.***")
 
 st.markdown("""
-#
+# Citation
 
-1.
+1. Corey Morris (2023). *MMLU-by-Task Evaluation Results for 700+ Open Source Models*. [link](https://huggingface.co/spaces/CoreyMorris/MMLU-by-task-Leaderboard)
+
+2. Edward Beeching, Clémentine Fourrier, Nathan Habib, Sheon Han, Nathan Lambert, Nazneen Rajani, Omar Sanseviero, Lewis Tunstall, Thomas Wolf. (2023). *Open LLM Leaderboard*. Hugging Face. [link](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
 
-
+3. Gao, Leo et al. (2021). *A framework for few-shot language model evaluation*. Zenodo. [link](https://doi.org/10.5281/zenodo.5371628)
 
-
+4. Peter Clark, Isaac Cowhey, Oren Etzioni, Tushar Khot, Ashish Sabharwal, Carissa Schoenick, Oyvind Tafjord. (2018). *Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge*. arXiv. [link](https://arxiv.org/abs/1803.05457)
 
-
+5. Rowan Zellers, Ari Holtzman, Yonatan Bisk, Ali Farhadi, Yejin Choi. (2019). *HellaSwag: Can a Machine Really Finish Your Sentence?*. arXiv. [link](https://arxiv.org/abs/1905.07830)
 
-
+6. Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, Jacob Steinhardt. (2021). *Measuring Massive Multitask Language Understanding*. arXiv. [link](https://arxiv.org/abs/2009.03300)
 
-
+7. Stephanie Lin, Jacob Hilton, Owain Evans. (2022). *TruthfulQA: Measuring How Models Mimic Human Falsehoods*. arXiv. [link](https://arxiv.org/abs/2109.07958)
 """)
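The citation list in this commit is written by hand inside one `st.markdown` string. As a minimal sketch, assuming the entries were instead kept as data, the same kind of numbered Markdown list could be generated as below. The `Citation` class and `format_citations` helper are hypothetical names that do not appear in `app.py`, and the punctuation is normalized slightly compared with the committed text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    """One reference entry (hypothetical structure, not taken from app.py)."""
    authors: str
    year: int
    title: str
    url: str
    venue: str = ""  # optional, e.g. "arXiv" or "Zenodo"


def format_citations(citations):
    """Return Markdown text: a '# Citation' heading plus a numbered list."""
    lines = ["# Citation", ""]
    for number, c in enumerate(citations, start=1):
        venue = f" {c.venue}." if c.venue else ""
        lines.append(
            f"{number}. {c.authors} ({c.year}). *{c.title}*.{venue} [link]({c.url})"
        )
        lines.append("")  # blank line between entries, as in the committed string
    return "\n".join(lines)


# Two entries copied from the commit; the rest would follow the same pattern.
CITATIONS = [
    Citation(
        "Corey Morris",
        2023,
        "MMLU-by-Task Evaluation Results for 700+ Open Source Models",
        "https://huggingface.co/spaces/CoreyMorris/MMLU-by-task-Leaderboard",
    ),
    Citation(
        "Stephanie Lin, Jacob Hilton, Owain Evans",
        2022,
        "TruthfulQA: Measuring How Models Mimic Human Falsehoods",
        "https://arxiv.org/abs/2109.07958",
        venue="arXiv",
    ),
]

markdown_text = format_citations(CITATIONS)
# In the app, this string would be passed to st.markdown(markdown_text).
```

Keeping the entries as data would let a new reference be added without renumbering the list by hand, at the cost of a small helper the current app does not need.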