陈俊杰 committed
Commit 1d69722 · 1 Parent(s): 193e99f
Files changed (1):
  1. app.py +11 -11
app.py CHANGED
@@ -91,14 +91,6 @@ elif page == "Methodology":
     """,unsafe_allow_html=True)
 
 elif page == "Datasets":
-    st.header("Introduction to Task Datasets")
-    st.markdown("""
-    <p class='main-text'>A brief description of the specific dataset we used, along with the original download link, is provided below:</p>
-    <p class='main-text'>1. <strong>Summary Generation (SG): <a href="https://huggingface.co/datasets/EdinburghNLP/xsum">Xsum</a></strong>: A real-world single document news summary dataset collected from online articles by the British Broadcasting Corporation (BBC) and contains over 220 thousand news documents.</p>
-    <p class='main-text'>2. <strong>Non-Factoid QA (NFQA): <a href="https://github.com/Lurunchik/NF-CATS">NF_CATS</a></strong>: A dataset contains examples of 12k natural questions divided into eight categories.</p>
-    <p class='main-text'>3. <strong>Text Expansion (TE): <a href="https://huggingface.co/datasets/euclaise/writingprompts">WritingPrompts</a></strong>: A large dataset of 300K human-written stories paired with writing prompts from an online forum.</p>
-    <p class='main-text'>4. <strong>Dialogue Generation (DG): <a href="https://huggingface.co/datasets/daily_dialog">DailyDialog</a></strong>: A high-quality dataset of 13k multi-turn dialogues. The language is human-written and less noisy.</p>
-    """,unsafe_allow_html=True)
     st.header("Answer Generation and Human Annotation")
     st.markdown("""
     We randomly sampled **100 instances** from **each** dataset as the question set and selected **7 different LLMs** to generate answers, forming the answer set. As a result, each dataset produced 700 instances, totaling **2,800 instances across the four datasets**.
@@ -109,11 +101,19 @@ For each instance (question-answer pair), we employed human annotators to provid
     st.markdown("""
     We divided the 2,800 instances into three parts:
 
-    20% \of the data (covering all four datasets) was designated as the training set (including human annotations) for participants to reference when designing their methods.
-    Another 20% \of the data was set aside as the test set (excluding human annotations), used to evaluate the performance of participants' methods and to generate the **leaderboard**.
-    The remaining 60% \of the data was reserved for **the final evaluation**.
+    - train set: 20% of the data (covering all four datasets) was designated as the training set (including human annotations) for participants to reference when designing their methods.
+    - test set: Another 20% of the data was set aside as the test set (excluding human annotations), used to evaluate the performance of participants' methods and to generate the **leaderboard**.
+    - reserved set: The remaining 60% of the data was reserved for **the final evaluation**.
     Both the training set and the test set can be downloaded from the provided link: [https://huggingface.co/datasets/THUIR/AEOLLM](https://huggingface.co/datasets/THUIR/AEOLLM)
     """)
+    st.header("Resources")
+    st.markdown("""
+    <p class='main-text'>A brief description of the specific dataset we used, along with the original download link, is provided below:</p>
+    <p class='main-text'>1. <strong>Summary Generation (SG): <a href="https://huggingface.co/datasets/EdinburghNLP/xsum">Xsum</a></strong>: A real-world single document news summary dataset collected from online articles by the British Broadcasting Corporation (BBC) and contains over 220 thousand news documents.</p>
+    <p class='main-text'>2. <strong>Non-Factoid QA (NFQA): <a href="https://github.com/Lurunchik/NF-CATS">NF_CATS</a></strong>: A dataset contains examples of 12k natural questions divided into eight categories.</p>
+    <p class='main-text'>3. <strong>Text Expansion (TE): <a href="https://huggingface.co/datasets/euclaise/writingprompts">WritingPrompts</a></strong>: A large dataset of 300K human-written stories paired with writing prompts from an online forum.</p>
+    <p class='main-text'>4. <strong>Dialogue Generation (DG): <a href="https://huggingface.co/datasets/daily_dialog">DailyDialog</a></strong>: A high-quality dataset of 13k multi-turn dialogues. The language is human-written and less noisy.</p>
+    """,unsafe_allow_html=True)
 
 elif page == "Important Dates":
     st.header("Important Dates")
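The 20%/20%/60% split described in the changed lines can be sketched in plain Python. This is a minimal illustration only, not the organizers' actual code: the `split_instances` helper name and the fixed shuffle seed are assumptions added for the example.

```python
import random

def split_instances(instances, seed=42):
    """Shuffle and split instances into 20% train / 20% test / 60% reserved.

    NOTE: illustrative sketch; the real AEOLLM split procedure is not
    published here, so the seed and proportions-by-floor-division are
    assumptions for reproducibility.
    """
    rng = random.Random(seed)
    shuffled = instances[:]          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 20 // 100          # 20% with human annotations
    n_test = n * 20 // 100           # 20% without annotations, for the leaderboard
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    reserved = shuffled[n_train + n_test:]  # remaining 60% for final evaluation
    return train, test, reserved

train, test, reserved = split_instances(list(range(2800)))
print(len(train), len(test), len(reserved))  # 560 560 1680
```

With the 2,800 instances of the task, this yields 560 training, 560 test, and 1,680 reserved instances, matching the 20/20/60 proportions in the diff.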