陈俊杰 committed
Commit · 1d69722
Parent(s): 193e99f
dataset

app.py CHANGED
@@ -91,14 +91,6 @@ elif page == "Methodology":
     """,unsafe_allow_html=True)
 
 elif page == "Datasets":
-    st.header("Introduction to Task Datasets")
-    st.markdown("""
-    <p class='main-text'>A brief description of the specific dataset we used, along with the original download link, is provided below:</p>
-    <p class='main-text'>1. <strong>Summary Generation (SG): <a href="https://huggingface.co/datasets/EdinburghNLP/xsum">Xsum</a></strong>: A real-world single document news summary dataset collected from online articles by the British Broadcasting Corporation (BBC) and contains over 220 thousand news documents.</p>
-    <p class='main-text'>2. <strong>Non-Factoid QA (NFQA): <a href="https://github.com/Lurunchik/NF-CATS">NF_CATS</a></strong>: A dataset contains examples of 12k natural questions divided into eight categories.</p>
-    <p class='main-text'>3. <strong>Text Expansion (TE): <a href="https://huggingface.co/datasets/euclaise/writingprompts">WritingPrompts</a></strong>: A large dataset of 300K human-written stories paired with writing prompts from an online forum.</p>
-    <p class='main-text'>4. <strong>Dialogue Generation (DG): <a href="https://huggingface.co/datasets/daily_dialog">DailyDialog</a></strong>: A high-quality dataset of 13k multi-turn dialogues. The language is human-written and less noisy.</p>
-    """,unsafe_allow_html=True)
     st.header("Answer Generation and Human Annotation")
     st.markdown("""
     We randomly sampled **100 instances** from **each** dataset as the question set and selected **7 different LLMs** to generate answers, forming the answer set. As a result, each dataset produced 700 instances, totaling **2,800 instances across the four datasets**.
@@ -109,11 +101,19 @@ For each instance (question-answer pair), we employed human annotators to provid
     st.markdown("""
     We divided the 2,800 instances into three parts:
 
-    20%
-    Another 20%
-    The remaining 60%
+    - train set: 20% of the data (covering all four datasets) was designated as the training set (including human annotations) for participants to reference when designing their methods.
+    - test set: Another 20% of the data was set aside as the test set (excluding human annotations), used to evaluate the performance of participants' methods and to generate the **leaderboard**.
+    - reserved set: The remaining 60% of the data was reserved for **the final evaluation**.
     Both the training set and the test set can be downloaded from the provided link: [https://huggingface.co/datasets/THUIR/AEOLLM](https://huggingface.co/datasets/THUIR/AEOLLM)
     """)
+    st.header("Resources")
+    st.markdown("""
+    <p class='main-text'>A brief description of the specific dataset we used, along with the original download link, is provided below:</p>
+    <p class='main-text'>1. <strong>Summary Generation (SG): <a href="https://huggingface.co/datasets/EdinburghNLP/xsum">Xsum</a></strong>: A real-world single document news summary dataset collected from online articles by the British Broadcasting Corporation (BBC) and contains over 220 thousand news documents.</p>
+    <p class='main-text'>2. <strong>Non-Factoid QA (NFQA): <a href="https://github.com/Lurunchik/NF-CATS">NF_CATS</a></strong>: A dataset contains examples of 12k natural questions divided into eight categories.</p>
+    <p class='main-text'>3. <strong>Text Expansion (TE): <a href="https://huggingface.co/datasets/euclaise/writingprompts">WritingPrompts</a></strong>: A large dataset of 300K human-written stories paired with writing prompts from an online forum.</p>
+    <p class='main-text'>4. <strong>Dialogue Generation (DG): <a href="https://huggingface.co/datasets/daily_dialog">DailyDialog</a></strong>: A high-quality dataset of 13k multi-turn dialogues. The language is human-written and less noisy.</p>
+    """,unsafe_allow_html=True)
 
 elif page == "Important Dates":
     st.header("Important Dates")
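The sampling and split arithmetic described in the added lines (100 questions per dataset, 7 LLMs, 4 datasets, then a 20%/20%/60% train/test/reserved split) can be sanity-checked with a short sketch; the variable names here are illustrative, not part of the app:

```python
# Numbers taken from the diff text: 100 sampled questions per dataset,
# answered by 7 different LLMs, across 4 task datasets.
questions_per_dataset = 100
num_llms = 7
task_datasets = ["SG (Xsum)", "NFQA (NF_CATS)", "TE (WritingPrompts)", "DG (DailyDialog)"]

per_dataset = questions_per_dataset * num_llms   # instances per dataset
total = per_dataset * len(task_datasets)         # instances overall

# 20% train / 20% test / 60% reserved, as listed in the added bullets.
split_fractions = {"train": 0.2, "test": 0.2, "reserved": 0.6}
split_counts = {name: round(total * frac) for name, frac in split_fractions.items()}

print(per_dataset, total, split_counts)
# → 700 2800 {'train': 560, 'test': 560, 'reserved': 1680}
```

This matches the figures stated in the diff: 700 instances per dataset and 2,800 in total.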
|