Lauredecaudin committed 8a81693 (1 parent: 602bf86)

Create pages/6- 🎁 BONUS : Build your own chatbot.py
import streamlit as st
import tempfile

# Title and introduction for the tutorial
st.title("🎁 Chatbot Implementation Tutorial")
st.write("""
### 📚 How to Build Your Own CV Q&A Chatbot

In this tutorial, we'll guide you through building a chatbot that can answer questions about your CV using **LangChain** and **Hugging Face** models.
This bot lets you interact with your resume conversationally.

You can:
1. 🤖 Upload your CV, and the bot will answer any questions about it.
2. 🛠 Use this code to build your own chatbot and embed it on your website, or deploy it on [Streamlit](https://streamlit.io) or [Hugging Face Spaces](https://huggingface.co/spaces).

---

### 📝 Step 1: Install the Required Libraries

Make sure you have the following libraries installed (`pypdf` is needed by the PDF loader and `sentence-transformers` by the Hugging Face embeddings):

```bash
pip install streamlit langchain langchain-community chromadb transformers pypdf sentence-transformers huggingface_hub
```

### 📝 Step 2: Structure of the Chatbot
The chatbot performs the following actions:

- **Loads the PDF**: Upload your CV, which is processed and split into manageable chunks.
- **Creates a Vector Database**: The document chunks are embedded into vectors and stored in a vector database using Chroma.
- **Uses an LLM for Responses**: We use a language model from Hugging Face to provide conversational answers based on your CV content.
- **Manages Conversation**: The bot maintains conversation history and uses it to generate more accurate responses.

### 📝 Step 3: Code Breakdown
**Loading and Splitting the PDF Document**

We first define a function that loads your PDF CV and splits it into smaller chunks.
```python
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

def load_doc(list_file_path, chunk_size=600, chunk_overlap=50):
    loaders = [PyPDFLoader(x) for x in list_file_path]
    pages = []
    for loader in loaders:
        pages.extend(loader.load())
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)
    doc_splits = text_splitter.split_documents(pages)
    return doc_splits
```
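
To make `chunk_size` and `chunk_overlap` concrete, here is a toy, stdlib-only sketch of overlap-based splitting. It is a simplification: the real `RecursiveCharacterTextSplitter` also prefers to break on paragraph and sentence boundaries rather than at fixed character offsets.

```python
def split_text(text, chunk_size=10, chunk_overlap=3):
    # Naive fixed-window splitter: each chunk starts
    # chunk_size - chunk_overlap characters after the previous one,
    # so consecutive chunks share chunk_overlap characters.
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = split_text("abcdefghijklmnop", chunk_size=10, chunk_overlap=3)
print(chunks)  # ['abcdefghij', 'hijklmnop', 'op']
```

The overlap means a sentence cut at a chunk boundary still appears whole in one of the two neighboring chunks, which helps retrieval quality.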

**Creating the Vector Database**

Next, we create a vector store using the Hugging Face embeddings and Chroma.
```python
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma

def create_db(splits, collection_name):
    embedding = HuggingFaceEmbeddings()
    vectordb = Chroma.from_documents(documents=splits, embedding=embedding, collection_name=collection_name, persist_directory="./chroma_db")
    return vectordb
```
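
Under the hood, a vector store simply ranks chunks by similarity between embedding vectors. Here is a toy illustration using bag-of-words counts as stand-in "embeddings" and cosine similarity (real embeddings are dense vectors produced by a neural model, and the example chunks below are made up):

```python
import math
from collections import Counter

def embed(text):
    # Toy "embedding": lowercase bag-of-words counts.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

chunks = ["Python developer with 5 years of experience",
          "Fluent in French and English",
          "MSc in Computer Science"]
query = embed("what programming experience does the candidate have")
best = max(chunks, key=lambda c: cosine(query, embed(c)))
print(best)  # Python developer with 5 years of experience
```

Chroma does the same ranking, only with learned embeddings and an index that scales to many documents.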

**Initializing the Language Model Chain**

We integrate a language model from Hugging Face using LangChain's conversational retrieval chain. Note that this requires a Hugging Face API token (set the `HUGGINGFACEHUB_API_TOKEN` environment variable).
```python
from langchain_community.llms import HuggingFaceEndpoint
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain

def initialize_llmchain(llm_model, temperature, max_tokens, top_k, vector_db):
    # load_in_8bit applies only to locally loaded models, so it is not passed to the hosted endpoint
    llm = HuggingFaceEndpoint(repo_id=llm_model, temperature=temperature, max_new_tokens=max_tokens, top_k=top_k)
    memory = ConversationBufferMemory(memory_key="chat_history", output_key='answer', return_messages=True)
    retriever = vector_db.as_retriever()
    qa_chain = ConversationalRetrievalChain.from_llm(llm, retriever=retriever, chain_type="stuff", memory=memory, return_source_documents=True)
    return qa_chain
```

**Handling the Conversation**

This function generates the response using the conversational retrieval chain and updates the chat history.
```python
def conversation(qa_chain, message, history):
    response = qa_chain({"question": message, "chat_history": history})
    response_answer = response["answer"]
    new_history = history + [(message, response_answer)]
    return new_history, response_answer
```
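
You can check the history bookkeeping without any model by substituting a stub chain, a plain function standing in for `qa_chain` (the echoed answer below is made up for illustration):

```python
def conversation(qa_chain, message, history):
    response = qa_chain({"question": message, "chat_history": history})
    response_answer = response["answer"]
    new_history = history + [(message, response_answer)]
    return new_history, response_answer

def stub_chain(inputs):
    # Pretend LLM: echoes the question back as the answer.
    return {"answer": f"You asked: {inputs['question']}"}

history, answer = conversation(stub_chain, "What is your name?", [])
history, answer = conversation(stub_chain, "Where do you work?", history)
print(len(history), answer)  # 2 You asked: Where do you work?
```

Each turn appends a `(question, answer)` tuple, which is exactly the shape the retrieval chain expects for `chat_history`.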

### 📝 Step 4: Putting It All Together in Streamlit
We now integrate all the functions into a single chatbot app.

```python
import tempfile

st.title("💬 CV Q&A Chatbot")
st.write("Upload your CV and ask any questions about it!")

# File uploader for PDF document
file = st.file_uploader("Upload your CV", type=["pdf"])

# Initialize session state variables
if 'llm_chain' not in st.session_state:
    st.session_state['llm_chain'] = None
if 'vector_db' not in st.session_state:
    st.session_state['vector_db'] = None
if 'chat_history' not in st.session_state:
    st.session_state['chat_history'] = []

if file is not None and st.session_state['llm_chain'] is None:
    with st.spinner("Processing document..."):
        with tempfile.NamedTemporaryFile(delete=False, suffix=".pdf") as tmp_file:
            tmp_file.write(file.read())
            tmp_file_path = tmp_file.name
        doc_splits = load_doc([tmp_file_path], chunk_size=600, chunk_overlap=50)
        vector_db = create_db(doc_splits, collection_name="my_collection")
        llm_chain = initialize_llmchain(
            llm_model="mistralai/Mixtral-8x7B-Instruct-v0.1",
            temperature=0.7, max_tokens=1024, top_k=3, vector_db=vector_db)
        st.session_state['llm_chain'] = llm_chain
        st.session_state['vector_db'] = vector_db
        st.session_state['chat_history'] = []
    st.success("Document processed successfully!")

if "messages" not in st.session_state:  # Initialize the chat message history
    st.session_state.messages = [{"role": "assistant", "content": "Ask me a question about the resume you uploaded!"}]

# Display the chat history
for message in st.session_state.messages:
    st.chat_message(message["role"]).write(message["content"])

# Input for user questions
if prompt := st.chat_input(placeholder="Your question"):
    if not st.session_state.get('llm_chain'):
        st.info("Please upload your CV to continue.")
        st.stop()

    st.session_state.messages.append({"role": "user", "content": prompt})
    st.chat_message("user").write(prompt)

    with st.chat_message("assistant"):
        st.session_state['chat_history'], response_answer = conversation(
            st.session_state['llm_chain'], prompt, st.session_state['chat_history']
        )
        st.session_state.messages.append({"role": "assistant", "content": response_answer})
        st.write(response_answer)
```

### 📝 Step 5: Running the Chatbot
To run this app locally, simply use:
```bash
streamlit run your_script.py
```

148
+ ## πŸŽ‰ Conclusion
149
+ With this code, you can deploy your CV chatbot on platforms like Streamlit or Hugging Face Spaces. You can embed it on your website or modify it further to suit your needs!
150
+ """)