Alexandre-Numind committed
Commit 168d307
1 parent: 4eb458a

Update app.py

Files changed (1):
1. app.py  +60 -43

app.py CHANGED
@@ -126,53 +126,29 @@ divisions""","""{
 
 example4 = ("""
 Patient: Good evening doctor.
-
 Doctor: Good evening. You look pale and your voice is out of tune.
-
 Patient: Yes doctor. I’m running a temperature and have a sore throat.
-
 Doctor: Lemme see.
-
 (He touches the forehead to feel the temperature.)
-
 Doctor: You’ve moderate fever.
-
 (He then whips out a thermometer.)
-
 Patient: This thermometer is very different from the one you used the last time. (Unlike the earlier one which was placed below the tongue, this one snapped around one of the fingers.)
-
 Doctor: Yes, this is a new introduction by medical equipment companies. It’s much more convenient, as it doesn’t require cleaning after every use.
-
 Patient: That’s awesome.
-
 Doctor: Yes it is.
-
 (He removes the thermometer and looks at the reading.)
-
 Doctor: Not too high – 99.8.
-
 (He then proceeds with measuring blood pressure.)
-
 Doctor: Your blood pressure is fine.
-
 (He then checks the throat.)
-
 Doctor: It looks bit scruffy. Not good.
-
 Patient: Yes, it has been quite bad.
-
 Doctor: Do you get sweating and shivering?
-
 Patient: Not sweating, but I feel somewhat cold when I sit under a fan.
-
 Doctor: OK. You’ve few symptoms of malaria. I would suggest you undergo blood test. Nothing to worry about. In most cases, the test come out to be negative. It’s just precautionary, as there have been spurt in malaria cases in the last month or so.
-
 (He then proceeds to write the prescription.)
-
 Doctor: I’m prescribing three medicines and a syrup. The number of dots in front of each tells you how many times in the day you’ve to take them. For example, the two dots here mean you’ve to take the medicine twice in the day, once in the morning and once after dinner.
-
 Doctor: Do you’ve any other questions?
-
 Patient: No, doctor. Thank you.
 ""","""{
 "Doctor_Patient_Discussion": {
@@ -301,47 +277,88 @@ def highlight_words(input_text, json_output):
 
     return highlighted_text
 
-
+# model = AutoModelForCausalLM.from_pretrained(
+#     "numind/NuExtract-tinyv2",
+# )
 
 model = AutoModelForCausalLM.from_pretrained(
     "numind/NuExtract",
     trust_remote_code=True,
 )
 
-model.to("cuda")
 
 tokenizer = AutoTokenizer.from_pretrained("numind/NuExtract")
 tokenizer.eos = tokenizer("<|end-output|>")
 
+model.to("cuda")
 model.eval()
 
 
 def get_prediction(text, template, example):
-    print(template)
-    prompt = create_prompt(text, template, [example, "", ""])
+    size = len(tokenizer(text)["input_ids"])
+    print(size)
+    if size > 2000:
+        raise gr.Error("Max tokens for input text is 2000. Yours is: " + str(size))
+    try:
+        prompt = create_prompt(text, template, [example, "", ""])
+    except Exception:
+        raise gr.Error("Invalid template")
+
     result = generate_answer_short(prompt, model, tokenizer)
-    print(result)
     result = result.replace("\n", " ")
     r = unquote(result)
     r = json.dumps(json.loads(r), indent=4)
-    print(result)
     dic_out = json.loads(r)
     highlighted_input2 = highlight_words(text, dic_out)
     return r, highlighted_input2
 
 
-iface = gr.Interface(fn=get_prediction,
-                     inputs=[
-                         gr.Textbox(lines=2, placeholder="Enter Text here...", label="Text"),
-                         gr.Textbox(lines=2, placeholder="Enter Template input here...", label="Template"),
-                         gr.Textbox(lines=2, placeholder="Enter Example input here...", label="Example")],
-                     outputs=[gr.Textbox(label="Model Output"), gr.HTML(label="Model Output with Highlighted Words")],
-                     examples=[[example6[0], example6[1]],
-                               [example1[0], example1[1]],
-                               [example4[0], example4[1]],
-                               [example2[0], example2[1]],
-                               [example5[0], example5[1]],
-                               [example3[0], example3[1]]])
+markdown_description = """
+<!DOCTYPE html>
+<html lang="en">
+<head>
+    <meta charset="UTF-8">
+    <meta name="viewport" content="width=device-width, initial-scale=1.0">
+    <title>NuExtract</title>
+</head>
+<body>
+    <h1>NuExtract</h1>
+    <p>NuExtract is a fine-tuned version of Phi-3 small, trained on a private, high-quality synthetic dataset for information extraction. To use the model, provide an input text (less than 2000 tokens) and a JSON schema describing the information you need to extract. This model is purely extractive, so every piece of information it outputs is present verbatim in the text. You can also provide an example of the expected output to help the model understand your task more precisely.</p>
+    <ul>
+        <li><strong>Model</strong>: <a href="https://huggingface.co/numind/NuExtract">numind/NuExtract</a></li>
+    </ul>
+    <p>You can also find a smaller version of the model, NuExtract-tiny (0.5B), here: <a href="https://huggingface.co/numind/NuExtract-tiny">numind/NuExtract-tiny</a></p>
+    <br>
+    <br>
+    <img src="https://cdn.prod.website-files.com/638364a4e52e440048a9529c/64188f405afcf42d0b85b926_logo_numind_final.png" alt="NuMind Logo" style="vertical-align: middle; width: 200px; height: 50px;">
+    <p>We are a startup developing NuMind, a tool for creating custom information-extraction models. You can use it to build high-performance information-extraction models on your desktop.
+    <br>
+    </p>
+    <ul>
+        <li><strong>Webpage</strong>: <a href="https://www.numind.ai/">https://www.numind.ai/</a></li>
+    </ul>
+</body>
+</html>
+"""
+
+iface = gr.Interface(
+    fn=get_prediction,
+    inputs=[
+        gr.Textbox(lines=2, placeholder="Enter Text here...", label="Text"),
+        gr.Textbox(lines=2, placeholder="Enter Template input here...", label="Template"),
+        gr.Textbox(lines=2, placeholder="Enter Example input here...", label="Example")
+    ],
+    outputs=[gr.Textbox(label="Model Output"), gr.HTML(label="Model Output with Highlighted Words")],
+    examples=[
+        [example6[0], example6[1]],
+        [example1[0], example1[1]],
+        [example4[0], example4[1]],
+        [example2[0], example2[1]],
+        [example5[0], example5[1]],
+        [example3[0], example3[1]]
+    ],
+    description=markdown_description
+)
 
 
-iface.launch(debug=True)
+iface.launch(debug=True, share=True)
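The description above tells users to supply an input text of at most 2000 tokens plus a JSON template, and get_prediction() now enforces that limit before building the prompt. As a rough illustration of the same flow outside Gradio, here is a minimal, self-contained sketch; it assumes the <|input|>/<|output|> prompt layout from the NuExtract model card, since the Space's own create_prompt() and generate_answer_short() helpers are defined elsewhere in app.py and are not shown in this diff.

import json
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Same model and tokenizer that the Space loads above.
model = AutoModelForCausalLM.from_pretrained("numind/NuExtract", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("numind/NuExtract")
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
model.eval()

def extract(text, template, example=""):
    # Mirror the commit's guard: reject inputs longer than 2000 tokens.
    size = len(tokenizer(text)["input_ids"])
    if size > 2000:
        raise ValueError(f"Max tokens for input text is 2000. Yours is: {size}")
    # Assumed prompt layout (taken from the NuExtract model card); the Space
    # builds this with its own create_prompt() helper instead.
    prompt = (
        "<|input|>\n### Template:\n" + template.strip() + "\n"
        + ("### Example:\n" + example.strip() + "\n" if example else "")
        + "### Text:\n" + text.strip() + "\n<|output|>\n"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    with torch.no_grad():
        output = model.generate(**inputs, max_new_tokens=512)
    # Decode only the newly generated tokens and stop at the end-of-output marker.
    generated = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    return json.loads(generated.split("<|end-output|>")[0].strip())  # may raise if the output is not valid JSON

Called with the same Text and Template strings as the Gradio interface, this should approximate the JSON that get_prediction() returns before highlighting.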