Tweak code
Browse files- app/src/index.html +3 -2
app/src/index.html
CHANGED
@@ -158,13 +158,14 @@
|
|
158 |
|
159 |
<p id="15c1384e-bcac-8076-9664-caa1b098c89c" class="">One subtlety with evaluating answers to math problems is that strings like \(1/\sqrt{3}\) and \(\sqrt{3}/3\) are distinct, but represent mathematically equivalent answers. The standard <a href="https://huggingface.co/papers/2206.14858">way</a> to handle this is to convert the a pair of answers to SymPy objects and then check whether subtracting the two objects and applying <code>sympy.simplify</code> gives zero. </p><p id="15c1384e-bcac-8043-a1d3-ffa0c5c3406e" class="">While this approach works well when comparing a small number of candidate answers, we found it was terribly slow when comparing many pairs in a list of \(N\) candidates; in some cases, slower than generating the candidates in the first place! To deal with this, we first reduced each answer to its <a href="https://en.wikipedia.org/wiki/Canonical_form">canonical form</a> and then computed the frequency of each form to determine the majority vote. Expand the detail below if you’re curious about how the code looks.</p>
|
160 |
|
161 |
-
<details><summary style="font-weight:600;font-size:1.25em;line-height:1.3;margin:0"><strong>Implementation detail</strong></summary><div class="indented"><p id="15d1384e-bcac-8080-b676-e09848694520" class="">To obtain the canonical form of an algebraic expression, we first convert the LaTeX string to SymPy, apply <code>sympy.simplify</code>, and finally convert back to LaTeX: </p
|
|
|
162 |
from sympy import latex, simplify
|
163 |
|
164 |
def get_canonical_form(expression: str) -> str:
|
165 |
parsed_expr = latex2sympy(expression)
|
166 |
simplified_expr = simplify(parsed_expr)
|
167 |
-
return latex(simplified_expr)</code></pre><p id="15c1384e-bcac-80ea-92cd-d199d25281b4" class="">With this function, we can then iterate over all candidate solutions in an list and keep track of how many times a canonical form has been seen before computing the final majority vote:</p><script src="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/prism.min.js" integrity="sha512-7Z9J3l1+EYfeaPKcGXu3MS/7T+w19WtKQY/n+xzmw4hZhJ9tyYmcUS+4QqAlzhicE5LAfMQSF3iFTK9bQdTxXg==" crossorigin="anonymous" referrerPolicy="no-referrer"></script><link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/themes/prism.min.css" integrity="sha512-tN7Ec6zAFaVSG3TpNAKtk4DOHNpSwKHxxrsiw4GHKESGPs5njn/0sMCUMl2svV4wo4BK/rCP7juYz+zx+l6oeQ==" crossorigin="anonymous" referrerPolicy="no-referrer"/><pre id="15d1384e-bcac-80f3-9062-c4f44a0d0b8b" class="code"><code class="language-Python">def find_majority_answer(answers:
|
168 |
canonical_groups = defaultdict(int)
|
169 |
canonical_to_original = {}
|
170 |
|
|
|
158 |
|
159 |
<p id="15c1384e-bcac-8076-9664-caa1b098c89c" class="">One subtlety with evaluating answers to math problems is that strings like \(1/\sqrt{3}\) and \(\sqrt{3}/3\) are distinct, but represent mathematically equivalent answers. The standard <a href="https://huggingface.co/papers/2206.14858">way</a> to handle this is to convert the a pair of answers to SymPy objects and then check whether subtracting the two objects and applying <code>sympy.simplify</code> gives zero. </p><p id="15c1384e-bcac-8043-a1d3-ffa0c5c3406e" class="">While this approach works well when comparing a small number of candidate answers, we found it was terribly slow when comparing many pairs in a list of \(N\) candidates; in some cases, slower than generating the candidates in the first place! To deal with this, we first reduced each answer to its <a href="https://en.wikipedia.org/wiki/Canonical_form">canonical form</a> and then computed the frequency of each form to determine the majority vote. Expand the detail below if you’re curious about how the code looks.</p>
|
160 |
|
161 |
+
<details><summary style="font-weight:600;font-size:1.25em;line-height:1.3;margin:0"><strong>Implementation detail</strong></summary><div class="indented"><p id="15d1384e-bcac-8080-b676-e09848694520" class="">To obtain the canonical form of an algebraic expression, we first convert the LaTeX string to SymPy, apply <code>sympy.simplify</code>, and finally convert back to LaTeX: </p>
|
162 |
+
<script src="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/prism.min.js" integrity="sha512-7Z9J3l1+EYfeaPKcGXu3MS/7T+w19WtKQY/n+xzmw4hZhJ9tyYmcUS+4QqAlzhicE5LAfMQSF3iFTK9bQdTxXg==" crossorigin="anonymous" referrerPolicy="no-referrer"></script><link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/themes/prism.min.css" integrity="sha512-tN7Ec6zAFaVSG3TpNAKtk4DOHNpSwKHxxrsiw4GHKESGPs5njn/0sMCUMl2svV4wo4BK/rCP7juYz+zx+l6oeQ==" crossorigin="anonymous" referrerPolicy="no-referrer"/><pre id="15b1384e-bcac-80c0-96c2-feba0d1cc92e" class="code"><code class="language-Python">from latex2sympy2 import latex2sympy
|
163 |
from sympy import latex, simplify
|
164 |
|
165 |
def get_canonical_form(expression: str) -> str:
|
166 |
parsed_expr = latex2sympy(expression)
|
167 |
simplified_expr = simplify(parsed_expr)
|
168 |
+
return latex(simplified_expr)</code></pre><p id="15c1384e-bcac-80ea-92cd-d199d25281b4" class="">With this function, we can then iterate over all candidate solutions in an list and keep track of how many times a canonical form has been seen before computing the final majority vote:</p><script src="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/prism.min.js" integrity="sha512-7Z9J3l1+EYfeaPKcGXu3MS/7T+w19WtKQY/n+xzmw4hZhJ9tyYmcUS+4QqAlzhicE5LAfMQSF3iFTK9bQdTxXg==" crossorigin="anonymous" referrerPolicy="no-referrer"></script><link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/themes/prism.min.css" integrity="sha512-tN7Ec6zAFaVSG3TpNAKtk4DOHNpSwKHxxrsiw4GHKESGPs5njn/0sMCUMl2svV4wo4BK/rCP7juYz+zx+l6oeQ==" crossorigin="anonymous" referrerPolicy="no-referrer"/><pre id="15d1384e-bcac-80f3-9062-c4f44a0d0b8b" class="code"><code class="language-Python">def find_majority_answer(answers: list[str]) -> str:
|
169 |
canonical_groups = defaultdict(int)
|
170 |
canonical_to_original = {}
|
171 |
|