Spaces: MachineLearningReply/q-and-a-tool

Henryk Borzymowski committed on Oct 16, 2023

Commit

8329090

0 Parent(s):

initial push


Files changed (10)

.DS_Store +0 -0
README.md +111 -0
app.py +85 -0
requirements.txt +7 -0
utils/__pycache__/config.cpython-310.pyc +0 -0
utils/__pycache__/haystack.cpython-310.pyc +0 -0
utils/__pycache__/ui.cpython-310.pyc +0 -0
utils/config.py +43 -0
utils/haystack.py +84 -0
utils/ui.py +12 -0

.DS_Store ADDED

Binary file (6.15 kB).

README.md ADDED

	@@ -0,0 +1,111 @@

+---
+title: Haystack Search Pipeline with Streamlit
+emoji: 👑
+colorFrom: indigo
+colorTo: indigo
+sdk: streamlit
+sdk_version: 1.23.0
+app_file: app.py
+pinned: false
+---
+# Template Streamlit App for Haystack Search Pipelines
+This template [Streamlit](https://docs.streamlit.io/) app is set up for simple [Haystack search applications](https://docs.haystack.deepset.ai/docs/semantic_search). The template is ready to do QA with **Retrieval Augmented Generation** or **Extractive QA**.
+See the ['How to use this template'](#how-to-use-this-template) instructions below to create a simple UI for your own Haystack search pipelines.
+Below you will also find instructions on how you could [push this to Hugging Face Spaces 🤗](#pushing-to-hugging-face-spaces-).
+## Installation and Running
+To run the bare application which does _nothing_:
+1. Install requirements: `pip install -r requirements.txt`
+2. Run the streamlit app: `streamlit run app.py`
+This will start up the app on `localhost:8501` where you will find a simple search bar. Before you start editing, you'll notice that the app will only show you instructions on what to edit.
+### Optional Configurations
+You can use optional configurations to set the:
+-  `--task` you want to start the app with: `rag` or `extractive` (default: rag)
+-  `--store` you want to use: `inmemory`, `opensearch`, `weaviate` or `milvus` (default: inmemory)
+-  `--name` you want to have for the app. (default: 'My Search App')
+E.g.:
+```bash
+streamlit run app.py -- --store opensearch --task extractive --name 'My Opensearch Documentation Search'
+```
+In a `.env` file, include all the config settings that you would like to use based on:
+- The DocumentStore of your choice
+- The Extractive/Generative model of your choice
+While `/utils/config.py` creates default values for some configurations, others, such as `OPENAI_KEY`, have to be set in the `.env` file.
+Example `.env`
+```
+OPENAI_KEY=YOUR_KEY
+EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L12-v2
+GENERATIVE_MODEL=text-davinci-003
+```
+## How to use this template
+1. Create a new repository from this template or simply open it in a codespace to start playing around 💙
+2. Make sure your `requirements.txt` file includes the Haystack and Streamlit versions you would like to use.
+3. Change the code in `utils/haystack.py` if you would like a different pipeline.
+4. Create a `.env` file with all of your configuration settings.
+5. Make any UI edits you'd like to and [share with the Haystack community](https://haystack.deepset.ai/community)
+6. Run the app as shown in [installation and running](#installation-and-running)
+### Repo structure
+- `./utils`: This is where we have 3 files:
+    - `config.py`: This file extracts all of the configuration settings from a `.env` file. For some config settings, it uses default values. An example of this is in [this demo project](https://github.com/TuanaCelik/should-i-follow/blob/main/utils/config.py).
+    - `haystack.py`: Here you will find some functions already set up for you to start creating your Haystack search pipeline. The main ones are `start_document_store()`, which creates the DocumentStore, `start_haystack_extractive()` and `start_haystack_rag()`, which create a pipeline and cache it, and `query()`, which is called by `app.py` once a user query is received.
+    - `ui.py`: Use this file for any UI and initial value setups.
+- `app.py`: This is the main Streamlit application file that we will run. In its current state it has a simple search bar, a 'Run' button, and a response area in which answers are highlighted.
+### What to edit?
+There are default pipelines both in `start_haystack_extractive()` and `start_haystack_rag()`
+- Change the pipelines to use the embedding models, extractive or generative models as you need.
+- If using the `rag` task, change the `default_prompt_template` to use one of our available ones on [PromptHub](https://prompthub.deepset.ai) or create your own `PromptTemplate`
+## Pushing to Hugging Face Spaces 🤗
+Below is an example GitHub action that will let you push your Streamlit app straight to the Hugging Face Hub as a Space.
+A few things to pay attention to:
+1. Create a New Space on Hugging Face with the Streamlit SDK.
+2. Create a Hugging Face token on your HF account.
+3. Create a secret on your GitHub repo called `HF_TOKEN` and put your Hugging Face token here.
+4. If you're using DocumentStores or APIs that require some keys/tokens, make sure these are provided as a secret for your HF Space too!
+5. This readme is set up to tell HF Spaces that it's using Streamlit and that the app runs from `app.py`. Change the frontmatter of this readme to display the title, emoji, etc. you want.
+6. Create a file in `.github/workflows/hf_sync.yml`. Here's an example that you can change with your own information, and an [example workflow](https://github.com/TuanaCelik/should-i-follow/blob/main/.github/workflows/hf_sync.yml) working for the [Should I Follow demo](https://huggingface.co/spaces/deepset/should-i-follow)
+```yaml
+name: Sync to Hugging Face hub
+on:
+  push:
+    branches: [main]
+  # to run this workflow manually from the Actions tab
+  workflow_dispatch:
+jobs:
+  sync-to-hub:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v2
+        with:
+          fetch-depth: 0
+          lfs: true
+      - name: Push to hub
+        env:
+          HF_TOKEN: ${{ secrets.HF_TOKEN }}
+        run: git push --force https://{YOUR_HF_USERNAME}:$HF_TOKEN@{YOUR_HF_SPACE_REPO} main
+```
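The optional `--task`, `--store` and `--name` flags described in the README can be sketched with `argparse` as follows. This is a minimal, self-contained sketch that mirrors the choices and defaults listed in the README; the `build_parser` helper is a hypothetical name and not part of the repository.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical helper: mirrors the flags and defaults listed in the README.
    parser = argparse.ArgumentParser(description="Haystack search app options")
    parser.add_argument(
        "--store",
        choices=("inmemory", "opensearch", "weaviate", "milvus"),
        default="inmemory",
    )
    parser.add_argument("--task", choices=("rag", "extractive"), default="rag")
    parser.add_argument("--name", default="My Search App")
    return parser


# In `streamlit run app.py -- --store opensearch ...`, the arguments after the
# bare `--` are the ones passed on to the script; here they are given as a list.
args = build_parser().parse_args(["--store", "opensearch", "--task", "extractive"])
defaults = build_parser().parse_args([])

print(args.store, args.task, args.name)
print(defaults.store, defaults.task)
```

Passing a value outside `choices` makes `argparse` print a usage message and raise `SystemExit`, which is the case `app.py` handles in its final `except` block.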

app.py ADDED

	@@ -0,0 +1,85 @@

+import streamlit as st
+import logging
+import os
+from annotated_text import annotation
+from json import JSONDecodeError
+from markdown import markdown
+from utils.config import parser
+from utils.haystack import start_document_store, start_haystack_extractive, start_haystack_rag, query
+from utils.ui import reset_results, set_initial_state
+try:
+    args = parser.parse_args()
+    document_store = start_document_store(type = args.store)
+    if args.task == 'extractive':
+        pipeline = start_haystack_extractive(document_store)
+    else:
+        pipeline = start_haystack_rag(document_store)
+    set_initial_state()
+    st.write('# '+args.name)
+    # Search bar
+    question = st.text_input("Ask a question", value=st.session_state.question, max_chars=100, on_change=reset_results)
+    #question = "what is Pi?"
+    run_pressed = st.button("Run")
+    #run_pressed = True
+    run_query = (
+        run_pressed or question != st.session_state.question
+    )
+    # Get results for query
+    if run_query and question:
+        reset_results()
+        st.session_state.question = question
+        with st.spinner("🔎 &nbsp;&nbsp; Running your pipeline"):
+            try:
+                st.session_state.results = query(pipeline, question)
+            except JSONDecodeError as je:
+                st.error(
+                    "👓 &nbsp;&nbsp; An error occurred reading the results. Is the document store working?"
+                )
+            except Exception as e:
+                logging.exception(e)
+                st.error("🐞 &nbsp;&nbsp; An error occurred during the request.")
+    if st.session_state.results:
+        results = st.session_state.results
+        if args.task == 'extractive':
+            answers = results['answers']
+            for count, answer in enumerate(answers):
+                if answer.answer:
+                    text, context = answer.answer, answer.context
+                    start_idx = context.find(text)
+                    end_idx = start_idx + len(text)
+                    st.write(
+                        f" Answer: {markdown(context[:start_idx] + str(annotation(body=text, label='ANSWER', background='#964448', color='#ffffff')) + context[end_idx:])}",
+                        unsafe_allow_html=True,
+                    )
+                else:
+                    st.info(
+                        "🤔 &nbsp;&nbsp; Haystack is unsure whether any of the documents contain an answer to your question. Try to reformulate it!"
+                    )
+        elif args.task == 'rag':
+            st.write(f" Answer: {results['results'][0]}")
+        # Extract and display information from the 'documents' list
+        retrieved_documents = results['documents']
+        st.subheader("Retriever Results:")
+        for document in retrieved_documents:
+            st.write(f"Document Name: {document.meta['name']}")
+            st.write(f"Score: {document.score}")
+            st.write(f"Text: {document.content}")
+except SystemExit as e:
+    # This exception will be raised if --help or invalid command line arguments
+    # are used. Currently streamlit prevents the program from exiting normally
+    # so we have to do a hard exit.
+    os._exit(e.code)
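The extractive branch of `app.py` locates the answer inside its context with `context.find(text)` and slices around it. `str.find` returns `-1` when the substring is absent, in which case the slices would no longer line up with the answer. A minimal sketch of a guarded version, assuming a hypothetical helper named `split_around_answer` (not part of the repository):

```python
from typing import Optional, Tuple


def split_around_answer(context: str, answer: str) -> Optional[Tuple[str, str, str]]:
    """Return (before, answer, after), or None when the answer is not in the context."""
    start = context.find(answer)
    if start == -1:
        # str.find signals "not found" with -1; slicing with it would
        # cut the context at the wrong position.
        return None
    end = start + len(answer)
    return context[:start], context[start:end], context[end:]


parts = split_around_answer("Pi is a super dog", "super dog")
missing = split_around_answer("Pi is a super dog", "cat")
print(parts)
print(missing)
```

A caller could fall back to showing the plain answer text when the helper returns `None`.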

requirements.txt ADDED

	@@ -0,0 +1,7 @@

+safetensors==0.3.3.post1
+farm-haystack[inference,weaviate,opensearch]==1.20.0
+milvus-haystack
+streamlit==1.23.0
+markdown
+st-annotated-text
+datasets

utils/__pycache__/config.cpython-310.pyc ADDED

Binary file (1.61 kB).

utils/__pycache__/haystack.cpython-310.pyc ADDED

Binary file (2.88 kB).

utils/__pycache__/ui.cpython-310.pyc ADDED

Binary file (678 Bytes).

utils/config.py ADDED

	@@ -0,0 +1,43 @@

+import argparse
+import os
+from dotenv import load_dotenv
+load_dotenv()
+parser = argparse.ArgumentParser(description='Haystack search app with Streamlit')
+document_store_choices = ('inmemory', 'weaviate', 'milvus', 'opensearch')
+task_choices = ('extractive', 'rag')
+parser.add_argument('--store', choices=document_store_choices, default='inmemory', help='DocumentStore selection (default: %(default)s)')
+parser.add_argument('--task', choices=task_choices, default='rag', help='Task selection (default: %(default)s)')
+parser.add_argument('--name', default="My Search App")
+model_configs = {
+    'EMBEDDING_MODEL': os.getenv("EMBEDDING_MODEL", "sentence-transformers/all-MiniLM-L12-v2"),
+    'GENERATIVE_MODEL': os.getenv("GENERATIVE_MODEL", "gpt-4"),
+    'EXTRACTIVE_MODEL': os.getenv("EXTRACTIVE_MODEL", "deepset/roberta-base-squad2"),
+    'OPENAI_KEY': os.getenv("OPENAI_KEY"),
+    'COHERE_KEY': os.getenv("COHERE_KEY"),
+}
+document_store_configs = {
+    # os.getenv returns a string when a variable is set, so numeric settings are cast to int
+    # Weaviate Config
+    'WEAVIATE_HOST': os.getenv("WEAVIATE_HOST", "http://localhost"),
+    'WEAVIATE_PORT': int(os.getenv("WEAVIATE_PORT", 8080)),
+    'WEAVIATE_INDEX': os.getenv("WEAVIATE_INDEX", "Document"),
+    'WEAVIATE_EMBEDDING_DIM': int(os.getenv("WEAVIATE_EMBEDDING_DIM", 768)),
+    # OpenSearch Config
+    'OPENSEARCH_SCHEME': os.getenv("OPENSEARCH_SCHEME", "https"),
+    'OPENSEARCH_USERNAME': os.getenv("OPENSEARCH_USERNAME", "admin"),
+    'OPENSEARCH_PASSWORD': os.getenv("OPENSEARCH_PASSWORD", "admin"),
+    'OPENSEARCH_HOST': os.getenv("OPENSEARCH_HOST", "localhost"),
+    'OPENSEARCH_PORT': int(os.getenv("OPENSEARCH_PORT", 9200)),
+    'OPENSEARCH_INDEX': os.getenv("OPENSEARCH_INDEX", "document"),
+    'OPENSEARCH_EMBEDDING_DIM': int(os.getenv("OPENSEARCH_EMBEDDING_DIM", 768)),
+    # Milvus Config
+    'MILVUS_URI': os.getenv("MILVUS_URI", "http://localhost:19530/default"),
+    'MILVUS_INDEX': os.getenv("MILVUS_INDEX", "document"),
+    'MILVUS_EMBEDDING_DIM': int(os.getenv("MILVUS_EMBEDDING_DIM", 768)),
+}
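Because `os.getenv` returns a string whenever a variable is set, numeric settings such as ports and embedding dimensions need an explicit conversion to keep a consistent type. A minimal sketch of one way to do this, assuming a hypothetical helper named `env_int` (not part of the repository) and using a plain dict in place of the real environment:

```python
import os
from typing import Mapping


def env_int(name: str, default: int, env: Mapping[str, str] = os.environ) -> int:
    """Read an integer setting, falling back to `default` when the variable is unset."""
    raw = env.get(name)
    if raw is None:
        return default
    try:
        return int(raw)
    except ValueError as exc:
        # Fail early with the variable name instead of passing a bad value on.
        raise ValueError(f"{name} must be an integer, got {raw!r}") from exc


# A plain dict stands in for the process environment in this example.
fake_env = {"OPENSEARCH_PORT": "9201"}
port = env_int("OPENSEARCH_PORT", 9200, fake_env)
dim = env_int("OPENSEARCH_EMBEDDING_DIM", 768, fake_env)
print(port, dim)
```

Raising on a malformed value surfaces a misconfigured `.env` at startup rather than when the document store is first contacted.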

utils/haystack.py ADDED

	@@ -0,0 +1,84 @@

+import streamlit as st
+from utils.config import document_store_configs, model_configs
+from haystack import Pipeline
+from haystack.schema import Answer
+from haystack.document_stores import BaseDocumentStore
+from haystack.document_stores import InMemoryDocumentStore, OpenSearchDocumentStore, WeaviateDocumentStore
+from haystack.nodes import EmbeddingRetriever, FARMReader, PromptNode
+from milvus_haystack import MilvusDocumentStore
+#Use this file to set up your Haystack pipeline and querying
+@st.cache_resource(show_spinner=False)
+def start_document_store(type: str):
+    #This function starts the documents store of your choice based on your command line preference
+    if type == 'inmemory':
+        document_store = InMemoryDocumentStore(use_bm25=True, embedding_dim=384)
+        documents = [
+            {
+                'content': "Pi is a super dog",
+                'meta': {'name': "pi.txt"}
+            },
+            {
+                'content': "The revenue of siemens is 5 million Euro",
+                'meta': {'name': "siemens.txt"}
+            },
+        ]
+        document_store.write_documents(documents)
+    elif type == 'opensearch':
+        document_store = OpenSearchDocumentStore(scheme = document_store_configs['OPENSEARCH_SCHEME'],
+                                                 username = document_store_configs['OPENSEARCH_USERNAME'],
+                                                 password = document_store_configs['OPENSEARCH_PASSWORD'],
+                                                 host = document_store_configs['OPENSEARCH_HOST'],
+                                                 port = document_store_configs['OPENSEARCH_PORT'],
+                                                 index = document_store_configs['OPENSEARCH_INDEX'],
+                                                 embedding_dim = document_store_configs['OPENSEARCH_EMBEDDING_DIM'])
+    elif type == 'weaviate':
+        document_store = WeaviateDocumentStore(host = document_store_configs['WEAVIATE_HOST'],
+                                                port = document_store_configs['WEAVIATE_PORT'],
+                                                index = document_store_configs['WEAVIATE_INDEX'],
+                                                embedding_dim = document_store_configs['WEAVIATE_EMBEDDING_DIM'])
+    elif type == 'milvus':
+        document_store = MilvusDocumentStore(uri = document_store_configs['MILVUS_URI'],
+                                            index = document_store_configs['MILVUS_INDEX'],
+                                            embedding_dim = document_store_configs['MILVUS_EMBEDDING_DIM'],
+                                            return_embedding=True)
+    return document_store
+# cached to make index and models load only at start
+@st.cache_resource(show_spinner=False)
+def start_haystack_extractive(_document_store: BaseDocumentStore):
+    retriever = EmbeddingRetriever(document_store=_document_store,
+                                   embedding_model=model_configs['EMBEDDING_MODEL'],
+                                   top_k=5)
+    _document_store.update_embeddings(retriever)
+    reader = FARMReader(model_name_or_path=model_configs['EXTRACTIVE_MODEL'])
+    pipe = Pipeline()
+    pipe.add_node(component=retriever, name="Retriever", inputs=["Query"])
+    pipe.add_node(component=reader, name="Reader", inputs=["Retriever"])
+    return pipe
+@st.cache_resource(show_spinner=False)
+def start_haystack_rag(_document_store: BaseDocumentStore):
+    retriever = EmbeddingRetriever(document_store=_document_store,
+                                   embedding_model=model_configs['EMBEDDING_MODEL'],
+                                   top_k=5)
+    _document_store.update_embeddings(retriever)
+    prompt_node = PromptNode(default_prompt_template="deepset/question-answering",
+                             model_name_or_path=model_configs['GENERATIVE_MODEL'],
+                             api_key=model_configs['OPENAI_KEY'])
+    pipe = Pipeline()
+    pipe.add_node(component=retriever, name="Retriever", inputs=["Query"])
+    pipe.add_node(component=prompt_node, name="PromptNode", inputs=["Retriever"])
+    return pipe
+@st.cache_data(show_spinner=True)
+def query(_pipeline, question):
+    params = {}
+    results = _pipeline.run(question, params=params)
+    return results
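`start_document_store` picks a store with an `if`/`elif` chain and has no final `else`, so an unexpected `type` value would reach `return document_store` with the variable unassigned. In the app, the `argparse` `choices` guard against this, but a dispatch table with an explicit error is one alternative. The sketch below uses stand-in factory functions that return labels instead of real Haystack stores; all names in it are hypothetical.

```python
from typing import Callable, Dict


def make_inmemory() -> str:
    # Stand-in for constructing a real document store; returns a label for illustration.
    return "InMemoryDocumentStore"


def make_opensearch() -> str:
    # Stand-in as above.
    return "OpenSearchDocumentStore"


# Maps each supported store name to the function that builds it.
STORE_FACTORIES: Dict[str, Callable[[], str]] = {
    "inmemory": make_inmemory,
    "opensearch": make_opensearch,
}


def create_store(store_type: str) -> str:
    try:
        factory = STORE_FACTORIES[store_type]
    except KeyError:
        # Unknown names fail with a clear message instead of an unassigned variable.
        raise ValueError(f"Unknown document store type: {store_type!r}") from None
    return factory()


print(create_store("inmemory"))
```

Adding a new store then only requires a new factory function and one entry in the table.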

utils/ui.py ADDED

	@@ -0,0 +1,12 @@

+import streamlit as st
+def set_state_if_absent(key, value):
+    if key not in st.session_state:
+        st.session_state[key] = value
+def set_initial_state():
+    set_state_if_absent("question", "Ask something here?")
+    set_state_if_absent("results", None)
+def reset_results(*args):
+    st.session_state.results = None
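Streamlit reruns the script on every interaction, which is why `set_state_if_absent` only writes a default when the key is missing. The sketch below illustrates that behaviour with a plain dict standing in for `st.session_state`; it is a simplified model, not the Streamlit object itself.

```python
from typing import Any, Dict

# Stand-in for st.session_state, which also supports dict-style access.
session_state: Dict[str, Any] = {}


def set_state_if_absent(key: str, value: Any) -> None:
    # Only write the default when the key has never been set.
    if key not in session_state:
        session_state[key] = value


def reset_results(*args: Any) -> None:
    # Clears the results but leaves the stored question untouched.
    session_state["results"] = None


# First run: defaults are written.
set_state_if_absent("question", "Ask something here?")
set_state_if_absent("results", None)

# The user submits a query and results are stored.
session_state["question"] = "what is Pi?"
session_state["results"] = {"answers": ["a super dog"]}

# Rerun: the defaults must not overwrite the user's values.
set_state_if_absent("question", "Ask something here?")
question_after_rerun = session_state["question"]

reset_results()
print(question_after_rerun, session_state["results"])
```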