talexm committed on Nov 30, 2024

Commit

eb579c5

1 Parent(s): e4a2031

update

Files changed (1)

rag_sec/README.md +42 -36

@@ -1,51 +1,57 @@
-# Document Search and Response Generation System
-This project implements a **Document Search and Response Generation System** combining semantic search, malicious query detection, and generative response capabilities. It is designed for efficient and context-aware information retrieval and response generation.
----
-## Features
-1. **Semantic Search**:
-   - Uses SentenceTransformer embeddings for document similarity.
-   - Retrieves top-k relevant documents for a given query.
-2. **Malicious Query Detection**:
-   - Identifies and blocks malicious or harmful queries using sentiment analysis.
-3. **Query Transformation**:
-   - Rephrases or enhances ambiguous queries for better processing.
-4. **Generative Response**:
-   - Generates a context-aware response using Hugging Face models like `distilgpt2`.
-5. **Expandable Architecture**:
-   - Modular components for easy enhancement and integration.
-   - Compatible with lightweight and resource-efficient models.
 ---
-## Architecture
-1. **Bad Query Detector**:
-   - Detects malicious or inappropriate queries using sentiment analysis (`distilbert-base-uncased-finetuned-sst-2-english`).
-2. **Query Transformer**:
-   - Rephrases or improves queries for better retrieval results.
-3. **Document Retriever**:
-   - Encodes documents into dense vectors using `all-MiniLM-L6-v2` embeddings.
-   - Finds similar documents using cosine similarity.
-4. **Semantic Response Generator**:
-   - Generates context-aware responses using models like `distilgpt2` or `EleutherAI/gpt-neo-1.3B`.
 ---
-## Requirements
-### Python Libraries
-Install the necessary libraries using `pip`:
-```bash
-pip install transformers sentence-transformers scikit-learn numpy flask
-```

+## Workflow
+The system processes each user query in five stages: input, detection, transformation, retrieval-augmented generation (RAG), and response generation.
+### 1. **Input Query**
+- A user provides a query, which may be a general question, an ambiguous statement, or a potentially malicious input.
+---
+### 2. **Detection Module**
+- **Purpose**: Classify the query as "bad" or "good."
+- **Steps**:
+  1. Use a sentiment analysis model (`distilbert-base-uncased-finetuned-sst-2-english`) to detect malicious or inappropriate intent.
+  2. If the query is classified as "bad" (e.g., SQL injection or inappropriate tone), block further processing and provide a warning message.
+  3. If "good," proceed to the **Transformation Module**.
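
The gating logic in the steps above can be sketched as follows. This is a minimal sketch, not the project's code: `gate_query`, `stub_classifier`, the `0.9` threshold, and the warning text are all hypothetical. The stub stands in for the real model so the example runs without downloads. Note that a sentiment classifier is only a rough proxy for malicious intent.

```python
from typing import Callable, Dict, List

# Hypothetical type: any callable returning [{"label": ..., "score": ...}],
# the shape a transformers text-classification pipeline returns.
Classifier = Callable[[str], List[Dict[str, object]]]


def gate_query(query: str, classify: Classifier, threshold: float = 0.9) -> Dict[str, object]:
    """Block the query when the classifier is confident the label is NEGATIVE."""
    result = classify(query)[0]
    is_bad = result["label"] == "NEGATIVE" and float(result["score"]) >= threshold
    if is_bad:
        # Stop here: the query never reaches the Transformation Module.
        return {"allowed": False, "message": "Query blocked: flagged as potentially harmful."}
    return {"allowed": True, "query": query}


def stub_classifier(text: str) -> List[Dict[str, object]]:
    """Stand-in for the real model (assumption): flags one obvious SQL pattern."""
    bad = "drop table" in text.lower()
    return [{"label": "NEGATIVE" if bad else "POSITIVE", "score": 0.99}]


print(gate_query("How to improve acting skills?", stub_classifier))
print(gate_query("'; DROP TABLE users; --", stub_classifier))
```

With `transformers` installed, `pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english")` could be passed as `classify` in place of the stub, since it returns the same list-of-dicts shape with `POSITIVE`/`NEGATIVE` labels.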
 ---
+### 3. **Transformation Module**
+- **Purpose**: Rephrase or enhance ambiguous or poorly structured queries for better retrieval.
+- **Steps**:
+  1. Identify missing context or ambiguous phrasing.
+  2. Transform the query using:
+     - Rule-based transformations for simple fixes.
+     - Text-to-text models (e.g., `google/flan-t5-small`) for more sophisticated rephrasing.
+  3. Pass the transformed query to the **RAG Pipeline**.
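
As an illustration of the rule-based branch, the sketch below applies a few simple fixes. The specific rules (whitespace cleanup, capitalization, adding a question mark) and the name `transform_query` are assumptions chosen for the example. The README does not list the actual rules.

```python
import re

# Hypothetical rule set: words that usually open a question.
QUESTION_WORDS = ("how", "what", "why", "when", "where", "who", "which")


def transform_query(query: str) -> str:
    """Apply illustrative rule-based fixes to a raw query string."""
    # Rule 1: collapse runs of whitespace and trim the ends.
    cleaned = re.sub(r"\s+", " ", query).strip()
    if not cleaned:
        return cleaned
    # Rule 2: capitalize the first character.
    cleaned = cleaned[0].upper() + cleaned[1:]
    # Rule 3: add a question mark when the query opens with a question word
    # and has no terminal punctuation.
    first_word = cleaned.split(" ", 1)[0].lower()
    if first_word in QUESTION_WORDS and cleaned[-1] not in ".?!":
        cleaned += "?"
    return cleaned


print(transform_query("  how   to improve acting skills "))
```

Queries that the rules cannot fix would fall through to the text-to-text model mentioned above. That step is omitted here.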
+---
+### 4. **RAG Pipeline**
+- **Purpose**: Retrieve relevant data and generate a context-aware response.
+- **Steps**:
+  1. **Document Retrieval**:
+     - Encode the transformed query and documents into embeddings using `all-MiniLM-L6-v2`.
+     - Compute semantic similarity between the query and stored documents.
+     - Retrieve the top-k documents relevant to the query.
+  2. **Response Generation**:
+     - Use the retrieved documents as context.
+     - Pass the query and context to a generative model (e.g., `distilgpt2`) to synthesize a meaningful response.
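
The retrieval step can be sketched with plain cosine similarity and a top-k cut. The three-dimensional vectors below are toy stand-ins for real embeddings, and `cosine_similarity` and `top_k` are hypothetical helper names. In the setup described above, the vectors would come from `all-MiniLM-L6-v2`.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: Sequence[float], doc_vecs: List[Sequence[float]], k: int = 2) -> List[Tuple[int, float]]:
    """Return (document index, similarity) pairs for the k most similar documents."""
    scored = [(i, cosine_similarity(query_vec, vec)) for i, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings (assumption): real ones would be model outputs.
doc_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
query_vec = [1.0, 0.1, 0.0]
ranked = top_k(query_vec, doc_vecs, k=2)
print(ranked)
```

With `sentence-transformers` installed, `SentenceTransformer("all-MiniLM-L6-v2").encode(texts)` would produce the vectors fed to `top_k`.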
+---
+### 5. **Semantic Response Generation**
+- **Purpose**: Provide a concise and meaningful answer.
+- **Steps**:
+  1. Combine the retrieved documents into a coherent context.
+  2. Generate a response tailored to the query using the generative model.
+  3. Return the response to the user, ensuring clarity and relevance.
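
One way to combine the retrieved documents into a single context is a simple prompt template, sketched below. The template wording, the `max_chars` budget, and the name `build_prompt` are assumptions. The README does not specify how the context is assembled.

```python
from typing import List


def build_prompt(query: str, documents: List[str], max_chars: int = 1000) -> str:
    """Join retrieved documents into a bulleted context and append the query."""
    context_parts: List[str] = []
    used = 0
    for doc in documents:
        # Stop adding documents once the (hypothetical) character budget is exceeded.
        if used + len(doc) > max_chars:
            break
        context_parts.append(doc.strip())
        used += len(doc)
    context = "\n".join(f"- {part}" for part in context_parts)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


prompt = build_prompt(
    "How to improve acting skills?",
    ["Take classes.", "Practice daily."],
)
print(prompt)
```

The resulting string would then be passed to the generative model. A character budget is used here because small models such as `distilgpt2` have a limited context window.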
 ---
+### End-to-End Example
+#### Input Query:
+```plaintext
+"How to improve acting skills?"
+````