auto-dataset-analyst-creator/utils/prompts.py

asoria HF staff

Push to Hub

88d7725 4 months ago


import outlines


# The docstring below is a Jinja2 template; `{{ code }}` is filled from the argument.
@outlines.prompt
def generate_mapping_prompt(code):
    """Format the following Python code as a list of cells to be used in a Jupyter notebook:
    {{ code }}

    The output should be a list of JSON objects with the
    following schema, including the leading "```json" and trailing "```":

    ```json
    [
        {
            "cell_type": string  // Either "markdown" or "code".
            "source": list of strings  // The cell's text or Python code, as a list of strings.
        }
    ]
    ```
    """


# Prompt for an EDA notebook; `first_code` is the mandatory dataset-loading snippet.
@outlines.prompt
def generate_eda_prompt(columns_info, sample_data, first_code):
    """You are an expert data analyst tasked with generating an exploratory data analysis (EDA) Jupyter notebook. The data is provided as a pandas DataFrame with the following structure:

    Columns and Data Types:
    {{ columns_info }}

    Sample Data:
    {{ sample_data }}

    Please create a pandas EDA notebook that includes the following:

    1. Summary statistics for numerical columns.
    2. Distribution plots for numerical columns.
    3. Bar plots or count plots for categorical columns.
    4. Correlation matrix and heatmap for numerical columns.
    5. Any additional relevant visualizations or analyses you deem appropriate.

    Ensure the notebook is well-organized, with explanations for each step.

    It is mandatory that you use the following code to load the dataset. DO NOT try to load the dataset in any other way:

    {{ first_code }}

    The output should be a Markdown Python code snippet, including the leading "```python" and trailing "```".

    """


# Prompt for an embeddings notebook; `first_code` is the mandatory dataset-loading snippet.
@outlines.prompt
def generate_embedding_prompt(columns_info, sample_data, first_code):
    """You are an expert data scientist tasked with generating a Jupyter notebook to generate embeddings from a dataset.
    The data is provided as a pandas DataFrame with the following structure:

    Columns and Data Types:
    {{ columns_info }}

    Sample Data:
    {{ sample_data }}

    Please create a notebook that includes the following:

    1. Load the dataset.
    2. Load an embedding model using the sentence-transformers library.
    3. Convert the data into embeddings.
    4. Store the embeddings.

    Ensure the notebook is well-organized, with explanations for each step.

    It is mandatory that you use the following code to load the dataset. DO NOT try to load the dataset in any other way:

    {{ first_code }}

    """


@outlines.prompt
def generate_training_prompt(columns_info, sample_data, first_code):
    """
    TODO
    """
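
The prompts above ask the model to wrap its answer in a fenced block tagged `json` (for `generate_mapping_prompt`) or `python` (for `generate_eda_prompt`). The following is a minimal sketch, not part of the original file, of how a caller might pull that block out of a response and turn the mapped cells into a notebook dict. The helper names `extract_fenced_block` and `cells_to_notebook` are hypothetical, and the nbformat 4 style layout is an assumption about the target format rather than something this module specifies. It also assumes the model returns valid JSON without the `//` comments shown in the schema.

```python
import json
import re

# Built at runtime so this example never contains a literal triple backtick.
FENCE = "`" * 3


def extract_fenced_block(response: str, language: str) -> str:
    """Return the body of the first fenced block tagged with `language`.

    Hypothetical helper: stops at the first closing fence, so it will
    truncate a body that itself contains a triple backtick.
    """
    pattern = re.escape(FENCE + language) + r"\s*\n(.*?)" + re.escape(FENCE)
    match = re.search(pattern, response, flags=re.DOTALL)
    if match is None:
        raise ValueError(f"no {language} fenced block found in response")
    return match.group(1).strip()


def cells_to_notebook(cells: list) -> dict:
    """Wrap cells that follow the mapping prompt's schema in a notebook dict.

    Assumption: the output mirrors the nbformat 4 JSON layout
    (code cells carry `execution_count` and `outputs`).
    """
    notebook_cells = []
    for cell in cells:
        cell_type = cell["cell_type"]
        if cell_type not in ("markdown", "code"):
            raise ValueError(f"unexpected cell_type: {cell_type!r}")
        entry = {"cell_type": cell_type, "metadata": {}, "source": cell["source"]}
        if cell_type == "code":
            entry["execution_count"] = None
            entry["outputs"] = []
        notebook_cells.append(entry)
    return {
        "cells": notebook_cells,
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }


if __name__ == "__main__":
    # Made-up response text, used only to exercise the helpers.
    sample_response = (
        "Here are the cells:\n"
        + FENCE + "json\n"
        + '[{"cell_type": "markdown", "source": ["# Load data"]},'
        + ' {"cell_type": "code", "source": ["import pandas as pd"]}]\n'
        + FENCE
    )
    cells = json.loads(extract_fenced_block(sample_response, "json"))
    print(json.dumps(cells_to_notebook(cells), indent=2))
```

Raising `ValueError` on a missing fence or an unknown `cell_type` keeps malformed model output from silently producing a broken notebook; a caller could catch it and retry the generation.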