add documentation
- README.md +4 -2
- streamlit_app.py +3 -2
README.md
CHANGED
@@ -12,13 +12,15 @@ license: apache-2.0
 
 # DocumentIQA: Scientific Document Insight QA
 
+**Work in progress** :construction_worker:
+
 ## Introduction
 
 Question/Answering on scientific documents using LLMs (OpenAI, Mistral, ~~LLama2,~~ etc..).
 This application is the frontend for testing the RAG (Retrieval Augmented Generation) on scientific documents, that we are developing at NIMS.
-Differently to most of the project, we focus on scientific articles
+Differently to most of the project, we focus on scientific articles. We target only the full-text using [Grobid](https://github.com/kermitt2/grobid) that provide and cleaner results than the raw PDF2Text converter (which is comparable with most of other solutions).
 
-**
+**NER in LLM response**: The responses from the LLMs are post-processed to extract <span stype="color:yellow">physical quantities, measurements</span> and <span stype="color:blue">materials</span> mentions.
 
 **Demos**:
 - (on HuggingFace spaces): https://lfoppiano-document-qa.hf.space/
streamlit_app.py
CHANGED
@@ -177,6 +177,7 @@ with st.sidebar:
 st.markdown(
 """After entering your API Key (Open AI or Huggingface). Upload a scientific article as PDF document. You will see a spinner or loading indicator while the processing is in progress. Once the spinner stops, you can proceed to ask your questions.""")
 
+st.markdown('**NER on LLM responses**: The responses from the LLMs are post-processed to extract <span style="color:orange">physical quantities, measurements</span> and <span style="color:green">materials</span> mentions.', unsafe_allow_html=True)
 if st.session_state['git_rev'] != "unknown":
 st.markdown("**Revision number**: [" + st.session_state[
 'git_rev'] + "](https://github.com/lfoppiano/document-qa/commit/" + st.session_state['git_rev'] + ")")
@@ -231,8 +232,8 @@ if st.session_state.loaded_embeddings and question and len(question) > 0 and st.
 # for entity in entities:
 # entity
 decorated_text = decorate_text_with_annotations(text_response.strip(), entities)
-decorated_text = decorated_text.replace('class="label material"', 'style="color:
-decorated_text = re.sub(r'class="label[^"]+"', 'style="color:
+decorated_text = decorated_text.replace('class="label material"', 'style="color:green"')
+decorated_text = re.sub(r'class="label[^"]+"', 'style="color:orange"', decorated_text)
 st.markdown(decorated_text, unsafe_allow_html=True)
 text_response = decorated_text
 else:
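
The two recoloring lines in the second streamlit_app.py hunk depend on their order: the exact-match `replace` has to run first, because the `re.sub` pattern would otherwise also match `class="label material"` and turn materials orange. A minimal sketch of that behaviour follows. The wrapper name `recolor_annotations` and the sample markup are hypothetical; the diff does not show what `decorate_text_with_annotations` actually emits, so the sketch only assumes spans carrying `class="label ..."` attributes.

```python
import re


def recolor_annotations(decorated_text: str) -> str:
    """Swap annotation CSS classes for inline colors (hypothetical wrapper)."""
    # Materials first: after this replace the attribute no longer starts
    # with class="label, so the regex below cannot touch it.
    decorated_text = decorated_text.replace(
        'class="label material"', 'style="color:green"'
    )
    # Every remaining class="label ..." attribute becomes orange.
    decorated_text = re.sub(
        r'class="label[^"]+"', 'style="color:orange"', decorated_text
    )
    return decorated_text


# Hypothetical sample input; the real markup may differ.
sample = (
    'The <span class="label material">MgB2</span> sample has a Tc of '
    '<span class="label quantity">39 K</span>.'
)
print(recolor_annotations(sample))
```

Because the pattern requires at least one character after `label`, a bare `class="label"` attribute would be left unchanged by this sketch.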