metadata
language:
- en
pipeline_tag: text2text-generation
tags:
- career roadmap generator
- NLP
- GenAI
- Chatbot
Streamlit Webapp
AI-Powered Career Roadmap Generator: Your AI-Powered Insight Generator from Job Descriptions
This project is an AI-powered platform that generates personalized career roadmaps from job descriptions. It uses generative AI models through OpenAI's API to analyze a job description and produce a roadmap of the skills, knowledge, and steps needed to excel in that role. It is aimed at people seeking career guidance and offers curated insights and actionable recommendations.
Features
- Instant Insights: Extracts and analyzes text from the uploaded datasets (job descriptions, skills, and so on) to provide instant insights.
- Retrieval-Augmented Generation: Uses the OpenAI API to generate high-quality, contextually relevant answers.
- Secure API Key Input: Ensures secure entry of OpenAI API keys for accessing generative AI models.
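As a rough illustration of the key-handling idea, the sketch below reads a key from an environment variable and masks it before display. The helper names `load_api_key` and `mask_key` and the variable name `OPENAI_API_KEY` are assumptions made for this example, not names taken from the project. In a Streamlit app, the key would typically be collected with a password-type text input instead.

```python
import os
from typing import Mapping, Optional


def load_api_key(env: Mapping[str, str] = os.environ,
                 name: str = "OPENAI_API_KEY") -> Optional[str]:
    """Return the key stored under `name`, or None if it is unset or blank."""
    key = env.get(name, "").strip()
    return key or None


def mask_key(key: str, visible: int = 4) -> str:
    """Hide all but the last `visible` characters so the key is safe to display."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]


# Usage: never print the raw key, only the masked form.
key = load_api_key({"OPENAI_API_KEY": "sk-example-1234"})  # hypothetical value
if key is not None:
    print(mask_key(key))
```

Passing the environment as a mapping keeps the helper easy to test without touching the real process environment.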
Getting Started
Prerequisites
- OpenAI API key: The project uses GPT-4o for roadmap generation and GPT-4-turbo-preview for embeddings.
- Streamlit: The application is built with Streamlit, so make sure Streamlit is installed in your environment.
Installation
Clone this repository or download the source code to your local machine, then navigate to the application directory and install the required Python packages using the steps below.
Note: Python 3.10 or above is required.
1. Create a virtual environment (venv):
```sh
python -m venv path-to-venv
```
`venv` stands for Virtual Environment.
2. Activate the virtual environment:
```sh
source path-to-venv/bin/activate
```
On Windows:
```sh
.\path-to-venv\Scripts\activate
```
3. Install requirements:
```sh
pip install -r requirements.txt
```
## Data
1. Kaggle: https://www.kaggle.com/datasets/dilshaansandhu/international-jobs-dataset/data
2. O*NET: https://www.onetcenter.org/database.html#individual-files
### Technical Overview
Document Loading: Uses UnstructuredExcelLoader to extract text from Excel files, CSVLoader for CSV files, and PyPDFLoader for PDF files.
Text Chunking: Uses the RecursiveCharacterTextSplitter from LangChain to divide the extracted text into manageable chunks.
Vector Store Creation: Uses Chroma (chromadb) to create a searchable vector store from the text chunks.
Answer Generation: Uses the gpt-4-turbo-preview model from OpenAI to answer user queries, with the uploaded documents supplying the context.
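The retrieval and answer-generation steps can be illustrated with a small, dependency-free sketch. It is only an approximation of the flow described above: the toy word-count `embed` function stands in for a real embedding model, the in-memory ranking stands in for the Chroma vector store, and the prompt wording and all function names are assumptions made for this example.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase token counts (a real system calls an embedding model)."""
    return Counter(re.findall(r"[a-z0-9+#]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(job_description: str, context: list[str]) -> str:
    """Assemble the text sent to the chat model: retrieved context plus the query."""
    joined = "\n---\n".join(context)
    return (
        "Use the context below to write a career roadmap for this job description.\n\n"
        f"Context:\n{joined}\n\nJob description:\n{job_description}\n"
    )
```

The resulting prompt string is what would be passed to the chat model. Keeping retrieval and prompt assembly as separate functions makes it straightforward to swap the toy pieces for a real embedding model and vector store.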
### Indexing
Helper functions:
`from_web(url)`
Description: This function retrieves documents from a web page specified by the given URL.
Purpose: It's used to scrape text content from web pages for further processing and analysis.
Implementation: It utilizes BeautifulSoup (bs4) for web scraping and the langchain library for document loading.
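To show what the text-extraction step involves, here is a minimal sketch that uses only the standard library's `html.parser` as a stand-in for BeautifulSoup. It works on an HTML string that has already been downloaded. The names `_TextExtractor` and `html_to_text` are hypothetical, and this is not the project's `from_web` implementation.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0          # > 0 while inside <script> or <style>
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.parts.append(text)


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML document, one text node per line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)
```

A parser library such as BeautifulSoup handles malformed markup more gracefully than this sketch does, which is a good reason to prefer it for real pages.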
`from_excel(file_address)`
Description: This function loads documents from Excel files located either at the specified file address or within a directory specified by the file address.
Purpose: It's designed to extract text data from Excel files, which may contain structured or tabular data.
Implementation: It utilizes the langchain_community library for loading Excel files.
`from_csv(file_address)`
Description: This function loads documents from a CSV file located at the specified file address.
Purpose: It's used for extracting text data from CSV files, which are commonly used for storing tabular data.
Implementation: It utilizes the langchain_community library for loading CSV files.
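The row-to-document idea behind CSV loading can be sketched with the standard `csv` module. This is an illustration and not the project's `from_csv`: the `Doc` class and `rows_to_docs` function are hypothetical, and the "column: value" layout is modeled on how LangChain's CSVLoader formats each row.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Minimal stand-in for a loader document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def rows_to_docs(csv_file, source: str = "<memory>") -> list[Doc]:
    """Turn each CSV row into one document of 'column: value' lines."""
    docs = []
    for i, row in enumerate(csv.DictReader(csv_file)):
        content = "\n".join(f"{col}: {val}" for col, val in row.items())
        docs.append(Doc(content, {"source": source, "row": i}))
    return docs


# Usage with an in-memory file; a real call would pass an opened CSV file.
sample = io.StringIO("title,skills\nData Engineer,SQL\n")
print(rows_to_docs(sample)[0].page_content)
```

Producing one document per row keeps each job posting or skill entry retrievable on its own.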
`from_pdf(file_address)`
Description: This function loads documents from a PDF file located at the specified file address.
Purpose: It's intended for extracting text data from PDF documents.
Implementation: It utilizes the langchain_community library for loading PDF files.
`from_text_files(file_address)`
Description: This function loads documents from text files (.txt) located within a directory specified by the file address.
Purpose: It's used to extract text data from multiple text files within a directory.
Implementation: It utilizes the langchain_community library for loading text files.
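Directory-based loading can be approximated with `pathlib`. The sketch below is an assumption-laden illustration, not the project's code: `load_text_files` is a hypothetical name, UTF-8 encoding is assumed, and only the top level of the directory is scanned.

```python
import tempfile
from pathlib import Path


def load_text_files(directory: str) -> list[dict]:
    """Read every .txt file directly inside `directory`, sorted by file name."""
    docs = []
    for path in sorted(Path(directory).glob("*.txt")):
        docs.append({
            "page_content": path.read_text(encoding="utf-8"),
            "source": path.name,
        })
    return docs


# Usage: write two files into a temporary directory and load the .txt one back.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.txt").write_text("SQL basics", encoding="utf-8")
    Path(tmp, "b.md").write_text("not loaded", encoding="utf-8")
    print([d["source"] for d in load_text_files(tmp)])
```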
`retriever_from_docs(docs)`
Description: This function processes the retrieved documents, splitting them into smaller chunks, generating embeddings for each chunk, and storing them in a Chroma vector store.
Purpose: It's responsible for preprocessing and embedding the text data for further analysis or retrieval.
Implementation: It utilizes various components such as text splitters, embeddings, and Chroma vector stores provided by the langchain library.
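The chunking step can be illustrated with a deliberately simplified splitter. Unlike LangChain's RecursiveCharacterTextSplitter, which tries to break on separators such as paragraph and line breaks first, this sketch slices purely by character count with a fixed overlap. The function name and the default sizes are assumptions chosen for the example.

```python
def split_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into chunks of at most `chunk_size` characters.

    Consecutive chunks share `overlap` characters so that a sentence cut at a
    boundary still appears whole in at least one chunk.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap  # step back so neighbouring chunks overlap
    return chunks
```

Each chunk would then be embedded and written to the vector store. Overlap trades some duplicated storage for better recall at chunk boundaries.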
1. Go to `bot/rag_indexing`
2. Run:
```sh
python indexing.py <data-sources>
```
For example:
```sh
python indexing.py <path/to/data/files>
```
Run the app
- Go to `/bot`.
- Run: `streamlit run bot-langchain-chat.py`
- Enter the job description at the prompt. The app generates the relevant roadmap.