Lauredecaudin committed
Commit 8a81693
Parent(s): 602bf86
Create 6- π BONUS : Build your own chatbot.py
pages/6- π BONUS : Build your own chatbot.py
ADDED
@@ -0,0 +1,150 @@
import streamlit as st
import tempfile

# Title and introduction for the tutorial
st.title("Chatbot Implementation Tutorial")
st.write("""
### How to Build Your Own CV Q&A Chatbot

In this tutorial, we'll guide you through building a chatbot that answers questions about your CV using **LangChain** and **Hugging Face** models.
The bot lets you interact with your resume in a conversational manner.

You can:
1. Upload your CV, and the bot will answer questions about it.
2. Reuse this code to build your own chatbot, embed it on your website, or deploy it on [Streamlit](https://streamlit.io) or [Hugging Face Spaces](https://huggingface.co/spaces).

---

### Step 1: Install the Required Libraries

Make sure you have the following libraries installed:

```bash
pip install streamlit langchain langchain-community chromadb pypdf sentence-transformers transformers huggingface_hub
```

`pypdf` is required by the PDF loader and `sentence-transformers` by the default Hugging Face embeddings; on recent LangChain releases the community integrations ship separately as `langchain-community`.

### Step 2: Structure of the Chatbot
The chatbot performs the following actions (the imports the code below relies on are sketched right after this list):

- **Loads the PDF**: Upload your CV, which is processed and split into manageable chunks.
- **Creates a Vector Database**: The document chunks are embedded into vectors and stored in a vector database using Chroma.
- **Uses an LLM for Responses**: We use a language model from Hugging Face to provide conversational answers based on your CV content.
- **Manages Conversation**: The bot maintains conversation history and uses it to generate more accurate responses.
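
All the snippets in Step 3 assume the imports below. This is a minimal sketch assuming a recent LangChain release; on older versions these classes live under `langchain.*` rather than `langchain_community.*`:

```python
# Minimal imports assumed by the snippets below (recent LangChain module paths)
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_community.llms import HuggingFaceEndpoint
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain
```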

### Step 3: Code Breakdown
**Loading and Splitting the PDF Document**

We first define a function that loads your PDF CV and splits it into smaller, overlapping chunks.
```python
def load_doc(list_file_path, chunk_size=600, chunk_overlap=50):
    # Load every page of every PDF in the list
    loaders = [PyPDFLoader(x) for x in list_file_path]
    pages = []
    for loader in loaders:
        pages.extend(loader.load())
    # Overlapping chunks preserve context across chunk boundaries
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)
    doc_splits = text_splitter.split_documents(pages)
    return doc_splits
```
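
As a quick sanity check you can call the function directly; `cv.pdf` is a hypothetical file name:

```python
# Hypothetical usage: inspect how the CV was chunked
splits = load_doc(["cv.pdf"])
print(len(splits), "chunks")
print(splits[0].page_content[:200])  # start of the first chunk
```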

**Creating the Vector Database**

Next, we create a vector store from the chunks using the Hugging Face embeddings and Chroma.
```python
def create_db(splits, collection_name):
    # By default HuggingFaceEmbeddings wraps a sentence-transformers model
    embedding = HuggingFaceEmbeddings()
    # Embed the chunks and persist them in a local Chroma database
    vectordb = Chroma.from_documents(documents=splits, embedding=embedding, collection_name=collection_name, persist_directory="./chroma_db")
    return vectordb
```
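
Before wiring up the LLM, you can verify retrieval with a plain similarity search (the query string is just an example):

```python
# Hypothetical check: fetch the chunks most similar to a query
vector_db = create_db(splits, collection_name="my_collection")
docs = vector_db.similarity_search("work experience", k=3)
for doc in docs:
    print(doc.page_content[:100])
```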

**Initializing the Language Model Chain**

We integrate a language model from Hugging Face using LangChain's conversational retrieval chain.
```python
def initialize_llmchain(llm_model, temperature, max_tokens, top_k, vector_db):
    # The model runs remotely on Hugging Face's inference infrastructure
    llm = HuggingFaceEndpoint(repo_id=llm_model, temperature=temperature, max_new_tokens=max_tokens, top_k=top_k)
    # The memory keeps the chat history between turns
    memory = ConversationBufferMemory(memory_key="chat_history", output_key='answer', return_messages=True)
    retriever = vector_db.as_retriever()
    # "stuff" simply concatenates the retrieved chunks into the prompt
    qa_chain = ConversationalRetrievalChain.from_llm(llm, retriever=retriever, chain_type="stuff", memory=memory, return_source_documents=True)
    return qa_chain
```
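
Calling a hosted model requires a Hugging Face access token. A simple sketch is to set the environment variable the client reads before creating the chain (the token value below is a placeholder):

```python
import os

# Placeholder: create your own token at https://huggingface.co/settings/tokens
os.environ["HUGGINGFACEHUB_API_TOKEN"] = "hf_..."
```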

**Handling the Conversation**

This function generates the response using the conversational retrieval chain and updates the chat history.
```python
def conversation(qa_chain, message, history):
    # The chain's memory also tracks history internally; we keep our own copy for the UI
    response = qa_chain({"question": message, "chat_history": history})
    response_answer = response["answer"]
    new_history = history + [(message, response_answer)]
    return new_history, response_answer
```
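
Two consecutive turns would then look like this (the questions are illustrative):

```python
# Hypothetical two-turn exchange with the chain built above
history = []
history, answer = conversation(qa_chain, "What is the candidate's most recent role?", history)
history, answer = conversation(qa_chain, "Which skills are listed?", history)
print(answer)
```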

### Step 4: Putting It All Together in Streamlit
We now integrate all the functions into a single chatbot app.

```python
import streamlit as st
import tempfile

st.title("CV Q&A Chatbot")
st.write("Upload your CV and ask any questions about it!")

# File uploader for the PDF document
file = st.file_uploader("Upload your CV", type=["pdf"])

# Initialize session state variables
if 'llm_chain' not in st.session_state:
    st.session_state['llm_chain'] = None
if 'vector_db' not in st.session_state:
    st.session_state['vector_db'] = None
if 'chat_history' not in st.session_state:
    st.session_state['chat_history'] = []

if file is not None and st.session_state['llm_chain'] is None:
    with st.spinner("Processing document..."):
        # Write the upload to a temporary file so PyPDFLoader can read it from disk
        with tempfile.NamedTemporaryFile(delete=False, suffix=".pdf") as tmp_file:
            tmp_file.write(file.read())
            tmp_file_path = tmp_file.name
        doc_splits = load_doc([tmp_file_path], chunk_size=600, chunk_overlap=50)
        vector_db = create_db(doc_splits, collection_name="my_collection")
        llm_chain = initialize_llmchain(
            llm_model="mistralai/Mixtral-8x7B-Instruct-v0.1",
            temperature=0.7, max_tokens=1024, top_k=3, vector_db=vector_db)
        st.session_state['llm_chain'] = llm_chain
        st.session_state['vector_db'] = vector_db
        st.session_state['chat_history'] = []
    st.success("Document processed successfully!")

# Initialize the chat message history
if "messages" not in st.session_state:
    st.session_state.messages = [{"role": "assistant", "content": "Ask me a question about the resume you uploaded!"}]

# Display the chat history
for message in st.session_state.messages:
    st.chat_message(message["role"]).write(message["content"])

# Input for user questions
if prompt := st.chat_input(placeholder="Your question"):
    if not st.session_state.get('llm_chain'):
        st.info("Please upload your CV to continue.")
        st.stop()

    st.session_state.messages.append({"role": "user", "content": prompt})
    st.chat_message("user").write(prompt)

    with st.chat_message("assistant"):
        st.session_state['chat_history'], response_answer = conversation(
            st.session_state['llm_chain'], prompt, st.session_state['chat_history']
        )
        st.session_state.messages.append({"role": "assistant", "content": response_answer})
        st.write(response_answer)
```

### Step 5: Running the Chatbot
To run this app locally, simply use:
```bash
streamlit run your_script.py
```

## Conclusion
With this code, you can deploy your CV chatbot on platforms like Streamlit or Hugging Face Spaces, embed it on your website, or modify it further to suit your needs!
""")