XTransformer committed on
Commit
b97699e
·
1 Parent(s): 9d206aa

update model card

Files changed (1)
  1. README.md +109 -0
README.md CHANGED
@@ -1,3 +1,112 @@
1
  ---
2
  license: llama2
3
  ---
1
  ---
2
+ language:
3
+ - code
4
+ pipeline_tag: text-generation
5
+ tags:
6
+ - llama-2
7
  license: llama2
8
  ---
9
+ # **PandasSolver**
10
+ PandasSolver is a fine-tuned generative text model with 7 billion parameters. It achieves 54.98% accuracy on [DS-1000](https://ds1000-code-gen.github.io/) Pandas Completion tasks,
11
+ while the accuracy of GPT-4 (August 2023 version) is 43.99%.
12
+
13
+ ## Model Use
14
+
15
+ To use this model, please make sure to install transformers from `main` until the next version is released:
16
+
17
+ ```bash
18
+ pip install git+https://github.com/huggingface/transformers.git@main accelerate
19
+ ```
20
+
21
+
22
+ ```python
23
+ from transformers import AutoTokenizer
24
+ import transformers
25
+ import torch
26
+
27
+ model = "Transform72/PandasSolver"
28
+
29
+ tokenizer = AutoTokenizer.from_pretrained(model)
30
+ pipeline = transformers.pipeline(
31
+     "text-generation",
32
+     model=model,
33
+     torch_dtype=torch.bfloat16,
34
+     device_map="auto",
35
+ )
36
+
37
+ sample_prompt = """
38
+ PROBLEM:
39
+ You have been given a dataset that contains information about students, including their names, ages, grades, and favorite subjects. You need to perform the following tasks using Pandas:
40
+
41
+ 1. Load the dataset into a Pandas DataFrame named "students_df". The dataset is provided as a CSV file named "students.csv".
42
+
43
+ 2. Find the maximum and minimum ages of the students.
44
+
45
+ 3. Create a pivot table that shows the average grades of students for each favorite subject. The pivot table should have the subjects as columns and the average grades as values.
46
+
47
+ 4. Calculate the sum of ages for students who have the same favorite subject.
48
+ """
49
+
50
+ sequences = pipeline(
51
+     sample_prompt,
52
+     do_sample=True,
53
+     temperature=0.2,
54
+     top_p=0.95,
55
+     num_return_sequences=1,
56
+     eos_token_id=tokenizer.eos_token_id,
57
+     max_length=512,
58
+ )
59
+ for seq in sequences:
60
+     print(f"Result: {seq['generated_text']}")
61
+ ```
62
+
63
+
64
+ ## Model Details
65
+
66
+ **Model Developers** Transform72
67
+
68
+ **Model Architecture** PandasSolver is an auto-regressive language model that uses the [Code Llama](https://huggingface.co/codellama/CodeLlama-7b-Python-hf) transformer architecture and is fine-tuned on [WizardCoder](https://github.com/nlpxucan/WizardLM/tree/main/WizardCoder).
69
+
70
+
71
+ ## Intended Use
72
+ **Intended Use Cases** Given the relatively small number of parameters, this model may need prompts as detailed as the sample prompt above to perform well.
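
One way to keep prompts consistently detailed is to assemble them from a problem description and an explicit task list. The sketch below is illustrative only: `build_prompt` is a hypothetical helper, and the `PROBLEM:` header with numbered tasks simply mirrors the layout of the sample prompt. The model card does not state that this layout is required.

```python
def build_prompt(context: str, tasks: list[str]) -> str:
    """Assemble a prompt in the PROBLEM / numbered-task layout of the sample prompt.

    Hypothetical helper: the layout copies the sample prompt in this model card.
    """
    # Number the tasks from 1 and separate them with blank lines.
    numbered = "\n\n".join(
        f"{i}. {task.strip()}" for i, task in enumerate(tasks, start=1)
    )
    return f"\nPROBLEM:\n{context.strip()}\n\n{numbered}\n"


# Example data below is made up for illustration.
prompt = build_prompt(
    'You have a CSV file named "sales.csv" with the columns region, month, and revenue. '
    "Perform the following tasks using Pandas:",
    [
        'Load the file into a Pandas DataFrame named "sales_df".',
        "Compute the total revenue for each region.",
    ],
)
print(prompt)
```

The resulting string can be passed to the text-generation pipeline in place of `sample_prompt`.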
73
+
74
+ **Out-of-Scope Uses** Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in languages other than English. Use in any other way that is prohibited by the Acceptable Use Policy and Licensing Agreement for [Code Llama](https://huggingface.co/codellama/CodeLlama-7b-Python-hf).
75
+
76
+
77
+ ## Training Data
78
+
79
+ Approximately 24 million tokens were used to fine-tune the model, all drawn from high-quality Pandas question-and-answer pairs.
80
+
81
+ ## Evaluation Results
82
+
83
+ Performance on DS-1000:
84
+
85
+ ```
86
+ Pandas Avg. Acc: 54.98%
87
+ Numpy Avg. Acc: 36.36%
88
+ Matplotlib Avg. Acc: 52.90%
89
+ Tensorflow Avg. Acc: 28.89%
90
+ Scipy Avg. Acc: 29.25%
91
+ Sklearn Avg. Acc: 25.22%
92
+ Pytorch Avg. Acc: 27.94%
93
+ DS-1000 Avg. Acc: 41.40%
94
+ ```
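
The overall figure appears to be an average weighted by the number of problems per library rather than a plain mean of the seven rows. The sketch below checks this. The per-library problem counts are an assumption taken from the DS-1000 benchmark itself; they are not stated in this model card.

```python
# Per-library accuracy (%) copied from the results block above.
accuracy = {
    "Pandas": 54.98,
    "Numpy": 36.36,
    "Matplotlib": 52.90,
    "Tensorflow": 28.89,
    "Scipy": 29.25,
    "Sklearn": 25.22,
    "Pytorch": 27.94,
}

# Assumption: number of problems per library in DS-1000 (not given in this card).
problem_count = {
    "Pandas": 291,
    "Numpy": 220,
    "Matplotlib": 155,
    "Tensorflow": 45,
    "Scipy": 106,
    "Sklearn": 115,
    "Pytorch": 68,
}

total = sum(problem_count.values())

# Weighted average: each library contributes in proportion to its problem count.
weighted = sum(accuracy[lib] * problem_count[lib] for lib in accuracy) / total

# Unweighted mean of the seven per-library figures, for comparison.
unweighted = sum(accuracy.values()) / len(accuracy)

print(f"weighted: {weighted:.2f}%  unweighted: {unweighted:.2f}%")
```

Under these assumed counts the weighted value rounds to the reported 41.40%, while the unweighted mean is about 36.51%.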
95
+
96
+ Although the model is fine-tuned only on Pandas Q&A pairs, its accuracy also improves on the other libraries, with the exception of TensorFlow:
97
+
98
+ ```
99
+ Pandas Avg. Acc: +38.83%
100
+ Numpy Avg. Acc: +10.91%
101
+ Matplotlib Avg. Acc: +1.29%
102
+ Tensorflow Avg. Acc: -2.22%
103
+ Scipy Avg. Acc: +11.33%
104
+ Sklearn Avg. Acc: +6.09%
105
+ Pytorch Avg. Acc: +7.35%
106
+ DS-1000 Avg. Acc: +16.2%
107
+ ```
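
The card does not name the baseline these changes are measured against. Assuming the deltas are absolute percentage-point differences, the implied baseline accuracy is the reported score minus the delta. The following is a minimal sketch of that arithmetic, not a reported result:

```python
# Reported accuracy (%) and reported change, both copied from the blocks above.
score = {
    "Pandas": 54.98, "Numpy": 36.36, "Matplotlib": 52.90, "Tensorflow": 28.89,
    "Scipy": 29.25, "Sklearn": 25.22, "Pytorch": 27.94, "DS-1000": 41.40,
}
delta = {
    "Pandas": 38.83, "Numpy": 10.91, "Matplotlib": 1.29, "Tensorflow": -2.22,
    "Scipy": 11.33, "Sklearn": 6.09, "Pytorch": 7.35, "DS-1000": 16.2,
}

# Assumption: deltas are absolute percentage points, so baseline = score - delta.
baseline = {name: round(score[name] - delta[name], 2) for name in score}

for name, value in baseline.items():
    print(f"{name:<11}{value:6.2f}%")
```

For Pandas this gives an implied baseline of 16.15%, and for the DS-1000 average 25.20%.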
108
+
109
+
110
+ ## Ethical Considerations and Limitations
111
+
112
+ PandasSolver is a new technology that carries risks with use. Testing conducted to date has been in English, and has not covered, nor could it cover, all scenarios. For these reasons, as with all LLMs, PandasSolver’s potential outputs cannot be predicted in advance, and the model may in some instances produce inaccurate or objectionable responses to user prompts. Therefore, before deploying any applications of PandasSolver, developers should perform safety testing and tuning tailored to their specific applications of the model.