陈俊杰 committed on
Commit
b91860a
1 Parent(s): cb1e5bf
Files changed (1)
  1. app.py +5 -6
app.py CHANGED
@@ -123,10 +123,9 @@ with st.sidebar:
     st.markdown("""
     <style>
    /* Applied to all Markdown-rendered text */
-    div[data-testid="stMarkdownContainer"] {
-        font-size: 48px !important;
-        font-family: 'Times New Roman', serif !important;
-        line-height: 1.8 !important;
+    .main-text {
+        font-size: 24px;
+        line-height: 1.6;
     }
     </style>
     """, unsafe_allow_html=True)
@@ -135,8 +134,8 @@ st.markdown("""
 if page == "Introduction":
     st.header("Introduction")
     st.markdown("""
-    <div style='font-size: 24px;line-height: 1.6;'>
-    The Automatic Evaluation of LLMs (AEOLLM) task is a new core task in [NTCIR-18](http://research.nii.ac.jp/ntcir/ntcir-18) to support in-depth research on large language models (LLMs) evaluation. As LLMs grow popular in both fields of academia and industry, how to effectively evaluate the capacity of LLMs becomes an increasingly critical but still challenging issue. Existing methods can be divided into two types: manual evaluation, which is expensive, and automatic evaluation, which faces many limitations including the task format (the majority belong to multiple-choice questions) and evaluation criteria (occupied by reference-based metrics). To advance the innovation of automatic evaluation, we proposed the Automatic Evaluation of LLMs (AEOLLM) task which focuses on generative tasks and encourages reference-free methods. Besides, we set up diverse subtasks such as summary generation, non-factoid question answering, text expansion, and dialogue generation to comprehensively test different methods. We believe that the AEOLLM task will facilitate the development of the LLMs community.
+    <div class='main-text'>
+    <p>The Automatic Evaluation of LLMs (AEOLLM) task is a new core task in <a href="http://research.nii.ac.jp/ntcir/ntcir-18">NTCIR-18</a> to support in-depth research on large language models (LLMs) evaluation. As LLMs grow popular in both fields of academia and industry, how to effectively evaluate the capacity of LLMs becomes an increasingly critical but still challenging issue. Existing methods can be divided into two types: manual evaluation, which is expensive, and automatic evaluation, which faces many limitations including the task format (the majority belong to multiple-choice questions) and evaluation criteria (occupied by reference-based metrics). To advance the innovation of automatic evaluation, we proposed the Automatic Evaluation of LLMs (AEOLLM) task which focuses on generative tasks and encourages reference-free methods. Besides, we set up diverse subtasks such as summary generation, non-factoid question answering, text expansion, and dialogue generation to comprehensively test different methods. We believe that the AEOLLM task will facilitate the development of the LLMs community.</p>
     </div>
     """, unsafe_allow_html=True)
 
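For reference, the sketch below shows the pattern this commit moves to: instead of a page-wide override of div[data-testid="stMarkdownContainer"] with !important rules, an opt-in .main-text CSS class is injected once and applied only to the blocks that should use it. This is a minimal, self-contained illustration, not the full app.py; the placeholder body text and the single hard-coded page are assumptions, since the real app selects pages such as "Introduction" from a sidebar control.

import streamlit as st

# Inject the reusable .main-text class once. Only content that opts in is
# restyled, unlike the old rule that forced every stMarkdownContainer to 48px.
st.markdown("""
<style>
.main-text {
    font-size: 24px;
    line-height: 1.6;
}
</style>
""", unsafe_allow_html=True)

st.header("Introduction")

# Wrap the block that should use the larger font in a div carrying the class.
st.markdown("""
<div class='main-text'>
<p>Placeholder body text styled by the .main-text class.</p>
</div>
""", unsafe_allow_html=True)

Scoping the style to a class also removes the need for the !important overrides, so other markdown in the sidebar and elsewhere keeps its default size.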