LoupGarou committed
Commit 897bae9
1 Parent(s): dbb656a

Create README.md

Files changed (1): README.md (+142, -0)
README.md ADDED
 
---
# For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1
# Doc / guide: https://huggingface.co/docs/hub/model-cards
{}
---

# Model Card for deepseek-coder-33b-instruct-pythagora-v3

This model card describes deepseek-coder-33b-instruct-pythagora v3, a fine-tuned version of the DeepSeek Coder 33B Instruct model optimized for use with the Pythagora GPT Pilot application.

## Model Details

### Model Description

- **Developed by:** LoupGarou (GitHub: [MoonlightByte](https://github.com/MoonlightByte))
- **Model type:** Causal language model
- **Language(s) (NLP):** English
- **License:** DeepSeek Coder Model License
- **Finetuned from model:** [DeepSeek Coder 33B Instruct](https://huggingface.co/deepseek-ai/deepseek-coder-33b-instruct)

### Model Sources

- **Repository:** [LoupGarou/deepseek-coder-33b-instruct-pythagora-v3-gguf](https://huggingface.co/LoupGarou/deepseek-coder-33b-instruct-pythagora-v3-gguf)
- **GitHub Repository (Proxy Application):** [MoonlightByte/Pythagora-LLM-Proxy](https://github.com/MoonlightByte/Pythagora-LLM-Proxy)
- **Original Model Repository:** [DeepSeek Coder](https://github.com/deepseek-ai/deepseek-coder)

## Uses

### Direct Use

This model is intended for use with the [Pythagora GPT Pilot](https://github.com/Pythagora-io/gpt-pilot) application, which enables the creation of fully working, production-ready apps with the assistance of a developer. The model has been fine-tuned to work seamlessly with the GPT Pilot prompt structures and can be utilized through the [Pythagora LLM Proxy](https://github.com/MoonlightByte/Pythagora-LLM-Proxy).

The model is designed to generate code and assist with various programming tasks, such as writing features, debugging, and providing code reviews, all within the context of the Pythagora GPT Pilot application.

### Out-of-Scope Use

This model should not be used for tasks outside of the intended use case with the Pythagora GPT Pilot application. It is not designed for standalone use or integration with other applications without proper testing and adaptation. Additionally, the model should not be used for generating content related to sensitive topics, such as politics, security, or privacy issues, as it is specifically trained to focus on computer science and programming-related tasks.

## Bias, Risks, and Limitations

As with any language model, there may be biases present in the training data that could be reflected in the model's outputs. Users should be aware of potential limitations and biases when using this model. The model's performance may be impacted by the quality and relevance of the input prompts, as well as the specific programming languages and frameworks used in the context of the Pythagora GPT Pilot application.

### Recommendations

Users should familiarize themselves with the [Pythagora GPT Pilot](https://github.com/Pythagora-io/gpt-pilot) application and its intended use cases before utilizing this model. It is recommended to use the model in conjunction with the [Pythagora LLM Proxy](https://github.com/MoonlightByte/Pythagora-LLM-Proxy) for optimal performance and compatibility. When using the model, users should carefully review and test the generated code to ensure its correctness, efficiency, and adherence to best practices and project requirements.

## How to Get Started with the Model

To use this model with the Pythagora GPT Pilot application:

1. Set up the Pythagora LLM Proxy by following the instructions in the [GitHub repository](https://github.com/MoonlightByte/Pythagora-LLM-Proxy).
2. Configure GPT Pilot to use the proxy by setting the OpenAI API endpoint to `http://localhost:8080/v1/chat/completions` (see the request sketch after this list).
3. Run GPT Pilot as usual; the proxy handles the communication between GPT Pilot and the deepseek-coder-33b-instruct-pythagora model.
4. It is also possible to point Pythagora directly at LM Studio or any other service, with mixed results, since these models were not fine-tuned using a chat format.
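
Before launching GPT Pilot, you can check that the proxy is reachable by sending it a request in the standard OpenAI chat-completions format. The sketch below is illustrative only: the model name, sampling parameters, and timeout are placeholders, and the exact fields honored by the proxy may differ.

```python
# Hypothetical smoke test against the proxy's OpenAI-compatible endpoint.
# Assumes the proxy is running locally on port 8080 as configured in step 2.
import requests

payload = {
    "model": "deepseek-coder-33b-instruct-pythagora-v3",  # illustrative model name
    "messages": [
        {"role": "user", "content": "Write a Python function that reverses a string."}
    ],
    "temperature": 0.2,
}

response = requests.post("http://localhost:8080/v1/chat/completions", json=payload, timeout=600)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```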

For more detailed instructions and examples, please refer to the [Pythagora LLM Proxy README](https://github.com/MoonlightByte/Pythagora-LLM-Proxy/blob/main/README.md).

## Training Details

### Training Data

The model was fine-tuned using a custom dataset created from sample prompts generated by the Pythagora prompt structures. The prompts are compatible with the version described in the [Pythagora README](https://github.com/Pythagora-io/gpt-pilot/blob/main/README.md). The dataset was carefully curated to ensure high-quality examples and a diverse range of programming tasks relevant to the Pythagora GPT Pilot application.
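
For reference, DeepSeek's fine-tuning script consumes instruction/output pairs in JSON Lines format. The record below is purely hypothetical: the field names follow the sample dataset layout in the DeepSeek-Coder repository, and the prompt and response text are invented for illustration.

```python
# Hypothetical training record, assuming the instruction/output JSONL layout
# used by DeepSeek-Coder's sample fine-tuning dataset.
import json

record = {
    "instruction": "You are working on a Node.js app. Implement the /health route described in the spec.",  # invented GPT Pilot-style prompt
    "output": "Here is the implementation of the /health route...",  # invented assistant response
}

with open("pythagora_train.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record) + "\n")
```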

### Training Procedure

The model was fine-tuned using the training scripts and resources provided in the [DeepSeek Coder GitHub repository](https://github.com/deepseek-ai/DeepSeek-Coder.git). Specifically, the [finetune/finetune_deepseekcoder.py](https://github.com/deepseek-ai/DeepSeek-Coder/blob/main/finetune/finetune_deepseekcoder.py) script was used to perform the fine-tuning. The model was trained using PEFT with a maximum sequence length of 9,000 tokens, utilizing the custom dataset to adapt the base DeepSeek Coder 33B Instruct model to the specific requirements and prompt structures of the Pythagora GPT Pilot application.

The training process used DeepSpeed integration for efficient distributed training. For detailed information on the training procedure, including the specific hyperparameters and configurations used, please refer to the [DeepSeek Coder Fine-tuning Documentation](https://github.com/deepseek-ai/DeepSeek-Coder#how-to-fine-tune-deepseek-coder).
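
As a rough illustration of what a PEFT setup on top of the base model looks like, here is a minimal LoRA sketch using the Hugging Face `peft` library. It is not the configuration used for this model: the rank, alpha, dropout, and target modules are placeholders, and the actual training was driven by DeepSeek's `finetune_deepseekcoder.py` with DeepSpeed.

```python
# Minimal PEFT (LoRA) sketch, not the actual training configuration.
# Hyperparameters and target modules below are illustrative placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "deepseek-ai/deepseek-coder-33b-instruct"
tokenizer = AutoTokenizer.from_pretrained(base, trust_remote_code=True)
tokenizer.model_max_length = 9000  # long Pythagora prompts need a long context

model = AutoModelForCausalLM.from_pretrained(base, trust_remote_code=True)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # adapters are a small fraction of the 33B weights
```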

## Model Examination

No additional interpretability work has been performed on this model. However, the model's performance has been thoroughly tested and validated within the context of the Pythagora GPT Pilot application to ensure its effectiveness in generating high-quality code and assisting with programming tasks.

## Environmental Impact

The environmental impact of this model has not been assessed. More information is needed to estimate the carbon emissions and electricity usage associated with the model's training and deployment. As a general recommendation, users should strive to utilize the model efficiently and responsibly to minimize any potential environmental impact.

## Technical Specifications

- **Model Architecture:** The model architecture is based on the DeepSeek Coder 33B Instruct model, which is a transformer-based causal language model optimized for code generation and understanding.
- **Compute Infrastructure:** The model was fine-tuned using high-performance computing resources, including GPUs, to ensure efficient and timely training. The exact specifications of the compute infrastructure used for training are not publicly disclosed.

## Citation

**APA:**
LoupGarou. (2024). deepseek-coder-33b-instruct-pythagora-v3 (Model). https://huggingface.co/LoupGarou/deepseek-coder-33b-instruct-pythagora-v3

## Model Card Contact

For questions, feedback, or concerns regarding this model, please contact LoupGarou through the GitHub repository: [MoonlightByte/Pythagora-LLM-Proxy](https://github.com/MoonlightByte/Pythagora-LLM-Proxy). You can open an issue or submit a pull request to discuss any aspects of the model or its usage within the Pythagora GPT Pilot application.

**Original model card: DeepSeek's Deepseek Coder 33B Instruct**

**[🏠Homepage](https://www.deepseek.com/)** | **[🤖 Chat with DeepSeek Coder](https://coder.deepseek.com/)** | **[Discord](https://discord.gg/Tc7c45Zzu5)** | **[Wechat(微信)](https://github.com/guoday/assert/blob/main/QR.png?raw=true)**

---

### 1. Introduction of Deepseek Coder

Deepseek Coder is composed of a series of code language models, each trained from scratch on 2T tokens, with a composition of 87% code and 13% natural language in both English and Chinese. We provide various sizes of the code model, ranging from 1B to 33B versions. Each model is pre-trained on a project-level code corpus with a window size of 16K and an extra fill-in-the-blank task, to support project-level code completion and infilling. For coding capabilities, Deepseek Coder achieves state-of-the-art performance among open-source code models on multiple programming languages and various benchmarks.

- **Massive Training Data**: Trained from scratch on 2T tokens, including 87% code and 13% linguistic data in both English and Chinese languages.
- **Highly Flexible & Scalable**: Offered in model sizes of 1.3B, 5.7B, 6.7B, and 33B, enabling users to choose the setup most suitable for their requirements.
- **Superior Model Performance**: State-of-the-art performance among publicly available code models on HumanEval, MultiPL-E, MBPP, DS-1000, and APPS benchmarks.
- **Advanced Code Completion Capabilities**: A window size of 16K and a fill-in-the-blank task, supporting project-level code completion and infilling tasks (see the infilling sketch below).
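
As an illustration of the infilling capability mentioned above, the base (non-instruct) DeepSeek Coder checkpoints accept fill-in-the-middle prompts built from sentinel tokens. The sketch below follows the code-insertion pattern shown in the DeepSeek-Coder repository; the choice of the 6.7B base checkpoint and the surrounding code are only examples, and it targets a base model rather than the instruct model described in this card.

```python
# Fill-in-the-middle sketch in the style of the DeepSeek-Coder repository examples.
# Sentinel tokens mark the prefix, the hole to be filled, and the suffix.
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-6.7b-base", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-coder-6.7b-base", trust_remote_code=True).cuda()

input_text = """<｜fim▁begin｜>def quick_sort(arr):
    if len(arr) <= 1:
        return arr
    pivot = arr[0]
    left = []
    right = []
<｜fim▁hole｜>
        if arr[i] < pivot:
            left.append(arr[i])
        else:
            right.append(arr[i])
    return quick_sort(left) + [pivot] + quick_sort(right)<｜fim▁end｜>"""

inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
# Decode the full sequence; the hole is filled in the generated output.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```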

### 2. Model Summary

deepseek-coder-33b-instruct is a 33B parameter model initialized from deepseek-coder-33b-base and fine-tuned on 2B tokens of instruction data.

- **Home Page:** [DeepSeek](https://www.deepseek.com/)
- **Repository:** [deepseek-ai/deepseek-coder](https://github.com/deepseek-ai/deepseek-coder)
- **Chat With DeepSeek Coder:** [DeepSeek-Coder](https://coder.deepseek.com/)

### 3. How to Use

Here are some examples of how to use our model.

#### Chat Model Inference

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-33b-instruct", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-coder-33b-instruct", trust_remote_code=True).cuda()

messages = [
    {'role': 'user', 'content': "write a quick sort algorithm in python."}
]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
# 32021 is the id of the <|EOT|> token
outputs = model.generate(inputs, max_new_tokens=512, do_sample=False, top_k=50, top_p=0.95, num_return_sequences=1, eos_token_id=32021)
print(tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True))
```

### 4. License

This code repository is licensed under the MIT License. The use of DeepSeek Coder models is subject to the Model License. DeepSeek Coder supports commercial use.

See the [LICENSE-MODEL](https://github.com/deepseek-ai/deepseek-coder/blob/main/LICENSE-MODEL) for more details.

### 5. Contact

If you have any questions, please raise an issue or contact us at [agi_code@deepseek.com](mailto:agi_code@deepseek.com).