aidevhund committed on
Commit 1cce7d0 · verified · 1 Parent(s): 6804742

Update markdowm.py

Files changed (1)
  1. markdowm.py +27 -88
markdowm.py CHANGED
@@ -1,155 +1,94 @@
  description = '''
- # 📄 **Document QA Bot: A RAG-Based Application for Interactive Document Querying**
 
- Welcome to the Document QA Bot, a sophisticated Retrieval-Augmented Generation (RAG) application that utilizes **LlamaIndex** and **Hugging Face** models to answer questions based on documents you upload. This bot is designed to empower you with rapid, insightful responses, providing a choice of language models (LLMs) and embedding models that cater to various requirements, including performance, accuracy, and response time.
 
  ## ✨ **Application Overview**
- With Document QA Bot, you can interactively query your document, receive contextual answers, and dynamically switch between LLMs as needed for optimal results. The bot supports various file formats, allowing you to upload and analyze different types of documents and even some image formats.
 
  ### **Key Features**
  - **Choice of Models:** Access a list of powerful LLMs and embedding models for optimal results.
- - **Flexible Document Support:** Multiple file types supported, including images.
- - **Real-time Interaction:** Easily switch between models for experimentation and fine-tuning answers.
- - **User-Friendly:** Seamless experience powered by Gradio's intuitive interface.
 
  ---
 
- ## 🚀 **Steps to Use the Document QA Bot**
 
  1. **Upload Your File**
- Begin by uploading a document. Supported formats include `.pdf`, `.docx`, `.txt`, `.csv`, `.xlsx`, `.pptx`, `.html`, `.jpg`, `.png`, and more.
 
  2. **Select Embedding Model**
  Choose an embedding model to parse and index the document’s contents, then submit. Wait for the confirmation message that the document has been successfully indexed.
-
- 3. **Choose a Language Model (LLM)**
- Pick an LLM from the dropdown to tailor the bot’s response style and accuracy.
-
- 4. **Start Chatting**
- Ask questions about your document! You can switch between LLMs as needed for different insights or to test model behavior on the same question.
-
- ---
-
- ## ⚙️ **How the Application Works**
-
- Upon uploading a document, the bot utilizes **LlamaParse** to parse its content. The parsed data is then indexed with a selected embedding model, generating a vector representation that enables quick and relevant responses. When you ask questions, the chosen LLM interprets the document context to generate responses specific to the content uploaded.
-
- ---
-
  ## 🔍 **Available LLMs and Embedding Models**
 
  ### **Embedding Models** (For indexing document content)
- 1. **`BAAI/bge-large-en`**
  - **Size**: 335M parameters
  - **Best For**: Complex, detailed embeddings; slower but yields high accuracy.
- 2. **`BAAI/bge-small-en-v1.5`**
  - **Size**: 33.4M parameters
  - **Best For**: Faster embeddings, ideal for lighter workloads and quick responses.
- 3. **`NeuML/pubmedbert-base-embeddings`**
  - **Size**: 768-dimensional dense vector space
  - **Best For**: Biomedical or medical-related text; highly specialized.
- 4. **`BAAI/llm-embedder`**
  - **Size**: 109M parameters
  - **Best For**: Basic embeddings for straightforward use cases.
 
  ### **LLMs** (For generating answers)
- 1. **`mistralai/Mixtral-8x7B-Instruct-v0.1`**
  - **Size**: 46.7B parameters
  - **Purpose**: Demonstrates compelling performance with minimal fine-tuning. Suited for unmoderated or exploratory use.
- 2. **`meta-llama/Meta-Llama-3-8B-Instruct`**
  - **Size**: 8.03B parameters
  - **Purpose**: Optimized for dialogue, emphasizing safety and helpfulness. Excellent for structured, instructive responses.
- 3. **`mistralai/Mistral-7B-Instruct-v0.2`**
  - **Size**: 7.24B parameters
  - **Purpose**: Fine-tuned for effectiveness; lacks moderation, useful for quick demonstration purposes.
- 4. **`tiiuae/falcon-7b-instruct`**
  - **Size**: 7.22B parameters
- - **Purpose**: Robust open-source model for inference, leveraging large-scale data for highly contextual responses.
 
  ---
 
- ## 🔗 **Best Embedding Model Combinations for Optimal Performance in RAG**
-
- The choice of embedding models plays a crucial role in determining the speed and accuracy of document responses. Since you can dynamically switch LLMs during the chat, focusing on an optimal embedding model at the outset will significantly influence response quality and efficiency. Below is a guide to the best embedding models for various scenarios based on the need for time efficiency and answer accuracy.
 
  | **Scenario** | **Embedding Model** | **Strengths** | **Trade-Offs** |
  |:-----------------------------:|:------------------------------------:|:--------------------------------------------------:|:------------------------------------:|
- | **Fastest Response** | `BAAI/bge-small-en-v1.5` | Speed-oriented, ideal for high-frequency querying | May miss nuanced details |
- | **High Accuracy for Large Texts** | `BAAI/bge-large-en` | High accuracy, captures complex document structure | Slower response time |
- | **Balanced General Purpose** | `BAAI/llm-embedder` | Reliable, quick response, adaptable across topics | Moderate accuracy, general use case |
- | **Biomedical & Specialized Text** | `NeuML/pubmedbert-base-embeddings` | Optimized for medical and scientific text | Specialized, slightly slower |
 
  ---
 
  ## 📂 **Supported File Formats**
 
  The bot supports a range of document formats, making it versatile for various data sources. Below are the currently supported formats:
- - **Documents**: `.pdf`, `.docx`, `.doc`, `.txt`, `.csv`, `.xlsx`, `.pptx`, `.html`
- - **Images**: `.jpg`, `.jpeg`, `.png`, `.webp`, `.svg`
 
  ---
 
- ## 🎯 **Use Cases**
-
- 1. **Educational Research**
- Upload research papers or study materials and get precise answers for revision or note-taking.
-
- 2. **Corporate Data Analysis**
- Interrogate reports, presentations, or financial data for quick insights without reading extensive documents.
-
- 3. **Legal Document Analysis**
- Analyze lengthy legal documents by querying clauses, terms, and specific details.
-
- 4. **Healthcare and Scientific Research**
- Access detailed insights into medical or scientific documents with models trained on domain-specific data.
-
  ---
 
  ### 🌟 **Get Started Today and Experience Document-Centric Question Answering**
- Whether you're a student, researcher, or professional, the Document QA Bot is your go-to tool for interactive, accurate document analysis. Upload your file, select your model, and dive into a seamless question-answering experience tailored to your document's unique content.
  '''
 
  guide = '''
- ### Embedding Models and Trade-Offs
 
  | **Embedding Model** | **Speed (Vector Index)** | **Advantages** | **Trade-Offs** |
  |-----------------------------|-------------------|-------------------------------------|---------------------------------|
- | `BAAI/bge-small-en-v1.5` | **Fastest** | Ideal for quick indexing | May miss nuanced details |
- | `BAAI/llm-embedder` | **Fast** | Balanced performance and detail | Slightly less precise than large models |
- | `BAAI/bge-large-en` | **Slow** | Best overall precision and detail | Slower due to complexity |
 
 
  ### Language Models (LLMs) and Use Cases
 
  | **LLM** | **Best Use Case** |
  |------------------------------------|-----------------------------------------|
- | `mistralai/Mixtral-8x7B-Instruct-v0.1` | Works well for **both short and long answers** |
- | `meta-llama/Meta-Llama-3-8B-Instruct` | Ideal for **long-length answers** |
- | `tiiuae/falcon-7b-instruct` | Best suited for **short-length answers** |
 
  '''
-
- footer = """
- <div style="background-color: #1d2938; color: white; padding: 10px; width: 100%; bottom: 0; left: 0; display: flex; justify-content: space-between; align-items: center; padding: .2rem 35px; box-sizing: border-box; font-size: 16px;">
- <div style="text-align: left;">
- <p style="margin: 0;">&copy; 2024 </p>
- </div>
- <div style="text-align: center; flex-grow: 1;">
- <p style="margin: 0;"> This website is made with ❤ by SARATH CHANDRA</p>
- </div>
- <div class="social-links" style="display: flex; gap: 20px; justify-content: flex-end; align-items: center;">
- <a href="https://github.com/21bq1a4210" target="_blank" style="text-align: center;">
- <img src="data:image/png;base64,{}" alt="GitHub" width="40" height="40" style="display: block; margin: 0 auto;">
- <span style="font-size: 14px;">GitHub</span>
- </a>
- <a href="https://www.linkedin.com/in/sarath-chandra-bandreddi-07393b1aa/" target="_blank" style="text-align: center;">
- <img src="data:image/png;base64,{}" alt="LinkedIn" width="40" height="40" style="display: block; margin: 0 auto;">
- <span style="font-size: 14px;">LinkedIn</span>
- </a>
- <a href="https://21bq1a4210.github.io/MyPortfolio-/" target="_blank" style="text-align: center;">
- <img src="data:image/png;base64,{}" alt="Portfolio" width="40" height="40" style="display: block; margin-right: 40px;">
- <span style="font-size: 14px;">Portfolio</span>
- </a>
- </div>
- </div>
- """
 
  description = '''
+ # 📄 **QueryVault Chatbot: A RAG-Based Chatbot for Interactive Document Querying**
 
+ Welcome to the HundAI QueryVault Chatbot, a sophisticated Retrieval-Augmented Generation (RAG) application that utilizes Large Language Models to answer questions based on documents you upload. This bot is designed to empower you with rapid, insightful responses, providing a choice of language models (LLMs) and embedding models that cater to various requirements, including performance, accuracy, and response time.
 
  ## ✨ **Application Overview**
+ With QueryVault Chatbot, you can interactively query your document, receive contextual answers, and dynamically switch between LLMs as needed for optimal results. The bot supports various file formats, allowing you to upload and analyze different types of documents and even some image formats.
 
  ### **Key Features**
  - **Choice of Models:** Access a list of powerful LLMs and embedding models for optimal results.
 
  ---
 
+ ## 🚀 **Steps to Use the HundAI QueryVault Chatbot**
 
  1. **Upload Your File**
+ Begin by uploading a document. Supported formats include .pdf, .docx, .txt, .csv, .xlsx, .pptx, .html, .jpg, .png, and more.
 
  2. **Select Embedding Model**
  Choose an embedding model to parse and index the document’s contents, then submit. Wait for the confirmation message that the document has been successfully indexed.
  ## 🔍 **Available LLMs and Embedding Models**
 
  ### **Embedding Models** (For indexing document content)
+ 1. **BAAI/bge-large-en**
  - **Size**: 335M parameters
  - **Best For**: Complex, detailed embeddings; slower but yields high accuracy.
+ 2. **BAAI/bge-small-en-v1.5**
  - **Size**: 33.4M parameters
  - **Best For**: Faster embeddings, ideal for lighter workloads and quick responses.
+ 3. **NeuML/pubmedbert-base-embeddings**
  - **Size**: 768-dimensional dense vector space
  - **Best For**: Biomedical or medical-related text; highly specialized.
+ 4. **BAAI/llm-embedder**
  - **Size**: 109M parameters
  - **Best For**: Basic embeddings for straightforward use cases.
 
  ### **LLMs** (For generating answers)
+ 1. **Mixtral-8x7B-Instruct**
  - **Size**: 46.7B parameters
  - **Purpose**: Demonstrates compelling performance with minimal fine-tuning. Suited for unmoderated or exploratory use.
+ 2. **Meta-Llama-3-8B-Instruct**
  - **Size**: 8.03B parameters
  - **Purpose**: Optimized for dialogue, emphasizing safety and helpfulness. Excellent for structured, instructive responses.
+ 3. **Mistral-7B**
  - **Size**: 7.24B parameters
  - **Purpose**: Fine-tuned for effectiveness; lacks moderation, useful for quick demonstration purposes.
+ 4. **HundAI-7B-S**
  - **Size**: 7.22B parameters
+ - **Purpose**: Robust fine-tuned model for inference, leveraging large-scale data for highly contextual responses.
 
  ---
 
 
  | **Scenario** | **Embedding Model** | **Strengths** | **Trade-Offs** |
  |:-----------------------------:|:------------------------------------:|:--------------------------------------------------:|:------------------------------------:|
+ | **Fastest Response** | BAAI/bge-small-en-v1.5 | Speed-oriented, ideal for high-frequency querying | May miss nuanced details |
+ | **High Accuracy for Large Texts** | BAAI/bge-large-en | High accuracy, captures complex document structure | Slower response time |
+ | **Balanced General Purpose** | BAAI/llm-embedder | Reliable, quick response, adaptable across topics | Moderate accuracy, general use case |
+ | **Biomedical & Specialized Text** | NeuML/pubmedbert-base-embeddings | Optimized for medical and scientific text | Specialized, slightly slower |
 
  ---
 
  ## 📂 **Supported File Formats**
 
  The bot supports a range of document formats, making it versatile for various data sources. Below are the currently supported formats:
+ - **Documents**: .pdf, .docx, .doc, .txt, .csv, .xlsx, .pptx, .html
+ - **Images**: .jpg, .jpeg, .png, .webp, .svg
 
  ---
 
  ---
 
  ### 🌟 **Get Started Today and Experience Document-Centric Question Answering**
+ Whether you're a student, researcher, or professional, HundAI QueryVault Chatbot is your go-to tool for interactive, accurate document analysis. Upload your file, select your model, and dive into a seamless question-answering experience tailored to your document's unique content.
  '''
 
  guide = '''
 
  | **Embedding Model** | **Speed (Vector Index)** | **Advantages** | **Trade-Offs** |
  |-----------------------------|-------------------|-------------------------------------|---------------------------------|
+ | BAAI/bge-small-en-v1.5 | **Fastest** | Ideal for quick indexing | May miss nuanced details |
+ | BAAI/llm-embedder | **Fast** | Balanced performance and detail | Slightly less precise than large models |
+ | BAAI/bge-large-en | **Slow** | Best overall precision and detail | Slower due to complexity |
 
 
  ### Language Models (LLMs) and Use Cases
 
  | **LLM** | **Best Use Case** |
  |------------------------------------|-----------------------------------------|
+ | Mixtral-8x7B-Instruct-v0.1 | Works well for **both short and long answers** |
+ | Meta-Llama-3-8B-Instruct | Ideal for **long-length answers** |
+ | HundAI-7B-S | Best suited for **short-length answers** |
 
  '''
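
The scenario table and the supported-format list in `description` amount to two small lookups. The following is a minimal sketch of how an application might use them, not code from this commit: the helper names `pick_embedding_model` and `is_supported_file` and both constants are hypothetical, and only the model identifiers and file extensions are copied from the strings in the diff.

```python
from pathlib import Path

# Scenario -> embedding model, copied from the scenario table in `description`.
# The dictionary itself is an illustrative assumption, not part of markdowm.py.
EMBEDDING_BY_SCENARIO = {
    "fastest response": "BAAI/bge-small-en-v1.5",
    "high accuracy for large texts": "BAAI/bge-large-en",
    "balanced general purpose": "BAAI/llm-embedder",
    "biomedical & specialized text": "NeuML/pubmedbert-base-embeddings",
}

# Extensions listed under "Supported File Formats" (documents and images).
SUPPORTED_EXTENSIONS = {
    ".pdf", ".docx", ".doc", ".txt", ".csv", ".xlsx", ".pptx", ".html",
    ".jpg", ".jpeg", ".png", ".webp", ".svg",
}


def pick_embedding_model(scenario: str) -> str:
    """Return the model the table lists for a scenario (hypothetical helper)."""
    key = scenario.strip().lower()
    if key not in EMBEDDING_BY_SCENARIO:
        raise ValueError(f"unknown scenario: {scenario!r}")
    return EMBEDDING_BY_SCENARIO[key]


def is_supported_file(filename: str) -> bool:
    """Compare the file extension with the documented list, ignoring case."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


if __name__ == "__main__":
    print(pick_embedding_model("Fastest Response"))
    print(is_supported_file("report.PDF"))
```

Keeping both lookups as plain data would let the user-facing markdown and any upload validation be checked against the same lists, which is one way to avoid the two drifting apart when a commit like this one edits the text.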