陈俊杰 committed
Commit · 1d69722
Parent(s): 193e99f
dataset

app.py CHANGED
@@ -91,14 +91,6 @@ elif page == "Methodology":
     """,unsafe_allow_html=True)
 
 elif page == "Datasets":
-    st.header("Introduction to Task Datasets")
-    st.markdown("""
-    <p class='main-text'>A brief description of the specific dataset we used, along with the original download link, is provided below:</p>
-    <p class='main-text'>1. <strong>Summary Generation (SG): <a href="https://huggingface.co/datasets/EdinburghNLP/xsum">Xsum</a></strong>: A real-world single document news summary dataset collected from online articles by the British Broadcasting Corporation (BBC) and contains over 220 thousand news documents.</p>
-    <p class='main-text'>2. <strong>Non-Factoid QA (NFQA): <a href="https://github.com/Lurunchik/NF-CATS">NF_CATS</a></strong>: A dataset contains examples of 12k natural questions divided into eight categories.</p>
-    <p class='main-text'>3. <strong>Text Expansion (TE): <a href="https://huggingface.co/datasets/euclaise/writingprompts">WritingPrompts</a></strong>: A large dataset of 300K human-written stories paired with writing prompts from an online forum.</p>
-    <p class='main-text'>4. <strong>Dialogue Generation (DG): <a href="https://huggingface.co/datasets/daily_dialog">DailyDialog</a></strong>: A high-quality dataset of 13k multi-turn dialogues. The language is human-written and less noisy.</p>
-    """,unsafe_allow_html=True)
     st.header("Answer Generation and Human Annotation")
     st.markdown("""
     We randomly sampled **100 instances** from **each** dataset as the question set and selected **7 different LLMs** to generate answers, forming the answer set. As a result, each dataset produced 700 instances, totaling **2,800 instances across the four datasets**.
@@ -109,11 +101,19 @@ For each instance (question-answer pair), we employed human annotators to provid
     st.markdown("""
     We divided the 2,800 instances into three parts:
 
-    20%
-    Another 20%
-    The remaining 60%
+    - train set: 20% of the data (covering all four datasets) was designated as the training set (including human annotations) for participants to reference when designing their methods.
+    - test set: Another 20% of the data was set aside as the test set (excluding human annotations), used to evaluate the performance of participants' methods and to generate the **leaderboard**.
+    - reserved set: The remaining 60% of the data was reserved for **the final evaluation**.
     Both the training set and the test set can be downloaded from the provided link: [https://huggingface.co/datasets/THUIR/AEOLLM](https://huggingface.co/datasets/THUIR/AEOLLM)
     """)
+    st.header("Resources")
+    st.markdown("""
+    <p class='main-text'>A brief description of the specific dataset we used, along with the original download link, is provided below:</p>
+    <p class='main-text'>1. <strong>Summary Generation (SG): <a href="https://huggingface.co/datasets/EdinburghNLP/xsum">Xsum</a></strong>: A real-world single document news summary dataset collected from online articles by the British Broadcasting Corporation (BBC) and contains over 220 thousand news documents.</p>
+    <p class='main-text'>2. <strong>Non-Factoid QA (NFQA): <a href="https://github.com/Lurunchik/NF-CATS">NF_CATS</a></strong>: A dataset contains examples of 12k natural questions divided into eight categories.</p>
+    <p class='main-text'>3. <strong>Text Expansion (TE): <a href="https://huggingface.co/datasets/euclaise/writingprompts">WritingPrompts</a></strong>: A large dataset of 300K human-written stories paired with writing prompts from an online forum.</p>
+    <p class='main-text'>4. <strong>Dialogue Generation (DG): <a href="https://huggingface.co/datasets/daily_dialog">DailyDialog</a></strong>: A high-quality dataset of 13k multi-turn dialogues. The language is human-written and less noisy.</p>
+    """,unsafe_allow_html=True)
 
 elif page == "Important Dates":
     st.header("Important Dates")
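The sampling and split arithmetic described in the added lines (100 questions per dataset, 7 LLMs, 4 datasets, then a 20%/20%/60% train/test/reserved split) can be sanity-checked with a short sketch; the variable names here are illustrative, not part of the app:

```python
# Numbers taken from the diff text: 100 sampled questions per dataset,
# answered by 7 different LLMs, across 4 task datasets.
questions_per_dataset = 100
num_llms = 7
task_datasets = ["SG (Xsum)", "NFQA (NF_CATS)", "TE (WritingPrompts)", "DG (DailyDialog)"]

per_dataset = questions_per_dataset * num_llms   # instances per dataset
total = per_dataset * len(task_datasets)         # instances overall

# 20% train / 20% test / 60% reserved, as listed in the added bullets.
split_fractions = {"train": 0.2, "test": 0.2, "reserved": 0.6}
split_counts = {name: round(total * frac) for name, frac in split_fractions.items()}

print(per_dataset, total, split_counts)
# → 700 2800 {'train': 560, 'test': 560, 'reserved': 1680}
```

This matches the figures stated in the diff: 700 instances per dataset and 2,800 in total.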
|