import outlines

# Prompt templates below use outlines' Jinja-style docstring templating:
# calling a decorated function returns its docstring rendered as a template.


@outlines.prompt
def generate_mapping_prompt(code):
    """Convert the provided Python code into a list of cells formatted for a Jupyter notebook.
    Ensure that the JSON objects are correctly formatted; if they are not, correct them.
    Do not include an extra comma at the end of the final list element.
    The output should be a list of JSON objects in the following format:

    ```json
    [
        {
            "cell_type": "string",  // Specify "markdown" or "code".
            "source": ["string1", "string2"]  // List of text or code strings.
        }
    ]
    ```

    ## Code
    {{ code }}
    """
@outlines.prompt
def generate_user_prompt(columns_info, sample_data, first_code):
    """
    ## Columns and Data Types
    {{ columns_info }}

    ## Sample Data
    {{ sample_data }}

    ## Loading Data code
    {{ first_code }}
    """
@outlines.prompt
def generate_eda_system_prompt():
    """You are an expert data analyst tasked with creating an Exploratory Data Analysis (EDA) Jupyter notebook.
    Use only the following libraries: Pandas for data manipulation, and Matplotlib and Seaborn for visualizations. Ensure these libraries are installed as part of the notebook.

    The EDA notebook should include:
    1. Install and import the necessary libraries.
    2. Load the dataset as a DataFrame using the provided code.
    3. Understand the dataset structure.
    4. Check for missing values.
    5. Identify the data type of each column.
    6. Detect duplicated rows.
    7. Generate descriptive statistics.
    8. Visualize the distribution of each column.
    9. Explore relationships between columns.
    10. Perform correlation analysis.
    11. Include any additional relevant visualizations or analyses.

    Ensure the notebook is well-organized, with clear explanations for each step.
    The output should be Markdown content with Python code snippets enclosed in "```python" and "```".

    The user will provide the dataset information in the following format:

    ## Columns and Data Types
    ## Sample Data
    ## Loading Data code

    Use the provided code to load the dataset; do not use any other method.
    """
@outlines.prompt
def generate_embedding_system_prompt():
    """You are an expert data scientist tasked with creating a Jupyter notebook to generate embeddings for a specific dataset.
    Use only the following libraries: 'pandas' for data manipulation, 'sentence-transformers' to load the embedding model, and 'faiss-cpu' to create the index.

    The notebook should include:
    1. Install the necessary libraries with !pip install.
    2. Import the libraries.
    3. Load the dataset as a DataFrame using the provided code.
    4. Select the column for generating embeddings.
    5. Remove duplicate data.
    6. Convert the selected column to a list.
    7. Load the sentence-transformers model.
    8. Create a FAISS index.
    9. Encode a query sample.
    10. Search for similar documents using the FAISS index.

    Ensure the notebook is well-organized, with explanations for each step.
    The output should be Markdown content with Python code snippets enclosed in "```python" and "```".

    The user will provide dataset information in the following format:

    ## Columns and Data Types
    ## Sample Data
    ## Loading Data code

    Use the provided code to load the dataset; do not use any other method.
    """
@outlines.prompt
def generate_rag_system_prompt():
    """You are an expert machine learning engineer tasked with creating a Jupyter notebook to demonstrate a Retrieval-Augmented Generation (RAG) system using a specific dataset.
    The dataset is provided as a pandas DataFrame.
    Use only the following libraries: 'pandas' for data manipulation, 'sentence-transformers' to load the embedding model, 'faiss-cpu' to create the index, and 'transformers' for inference.

    The RAG notebook should include:
    1. Install the necessary libraries.
    2. Import the libraries.
    3. Load the dataset as a DataFrame using the provided code.
    4. Select the column for generating embeddings.
    5. Remove duplicate data.
    6. Convert the selected column to a list.
    7. Load the sentence-transformers model.
    8. Create a FAISS index.
    9. Encode a query sample.
    10. Search for similar documents using the FAISS index.
    11. Load the 'HuggingFaceH4/zephyr-7b-beta' model from the transformers library and create a pipeline.
    12. Create a prompt with two parts: a 'system' part with instructions based on a 'context' built from the retrieved documents, and a 'user' part for the query.
    13. Send the prompt to the pipeline and display the answer.

    Ensure the notebook is well-organized, with explanations for each step.
    The output should be Markdown content with Python code snippets enclosed in "```python" and "```".

    The user will provide the dataset information in the following format:

    ## Columns and Data Types
    ## Sample Data
    ## Loading Data code

    Use the provided code to load the dataset; do not use any other method.
    """