shibing624 committed on
Commit 4a0986f
1 Parent(s): ad126a7

Update README.md

Files changed (1)
  1. README.md +136 -1

README.md CHANGED

---
language:
- en
tags:
- code
- autocomplete
- pytorch
- en
license: "apache-2.0"
---

# GPT2 for Code AutoComplete Model

**code-autocomplete** is a code completion plugin for Python.

It can automatically complete lines and blocks of code with GPT2.

## Usage

Open source repo: [code-autocomplete](https://github.com/shibing624/code-autocomplete), which supports the GPT2 model. Usage:

```python
from autocomplete.gpt2_coder import GPT2Coder

m = GPT2Coder("shibing624/code-autocomplete-distilgpt2-python")
print(m.generate('import torch.nn as')[0])
```
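
As a small sketch of how you might drive this interactively (assuming only, as in the snippet above, that `GPT2Coder.generate(prompt)` returns a list of completion strings):

```python
from autocomplete.gpt2_coder import GPT2Coder

# Load the model once; reuse it for every prompt typed at the console.
m = GPT2Coder("shibing624/code-autocomplete-distilgpt2-python")

while True:
    prompt = input("code> ")
    if not prompt:
        break
    # generate() returns a list of candidate completions; show the first one.
    print(m.generate(prompt)[0])
```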

Alternatively, you can load the model directly with huggingface/transformers:

*Please use the 'GPT2'-related classes to load this model!*

```python
import os
from transformers import GPT2Tokenizer, GPT2LMHeadModel

os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"

tokenizer = GPT2Tokenizer.from_pretrained("shibing624/code-autocomplete-distilgpt2-python")
model = GPT2LMHeadModel.from_pretrained("shibing624/code-autocomplete-distilgpt2-python")

prompts = [
    """from torch import nn
class LSTM(Module):
    def __init__(self, *,
                 n_tokens: int,
                 embedding_size: int,
                 hidden_size: int,
                 n_layers: int):""",
    """import numpy as np
import torch
import torch.nn as""",
    "import java.util.ArrayList",
    "def factorial(n):",
]
for prompt in prompts:
    input_ids = tokenizer.encode(prompt, add_special_tokens=False, return_tensors='pt')
    outputs = model.generate(input_ids=input_ids,
                             max_length=64 + len(prompt),
                             temperature=1.0,
                             top_k=50,
                             top_p=0.95,
                             repetition_penalty=1.0,
                             do_sample=True,
                             num_return_sequences=1,
                             length_penalty=2.0,
                             early_stopping=True)
    decoded = tokenizer.decode(outputs[0], skip_special_tokens=True)
    print(decoded)
    print("=" * 20)
```

output:
```shell
from torch import nn
class LSTM(Module):
    def __init__(self, *,
                 n_tokens: int,
                 embedding_size: int,
                 hidden_size: int,
                 n_layers: int):
        self.embedding_size = embedding_size
====================
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
```
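
The same model also works with the standard transformers `pipeline` API, which wraps tokenizer and model loading in one call. A minimal sketch (the `pipeline` helper is plain transformers, not part of code-autocomplete, and the generation parameters below are only illustrative):

```python
from transformers import pipeline

# Text-generation pipeline: loads the tokenizer and GPT2 model in one step.
generator = pipeline("text-generation", model="shibing624/code-autocomplete-distilgpt2-python")

outputs = generator("import torch.nn as", max_length=32, do_sample=True, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```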

Model files:
```
code-autocomplete-distilgpt2-python
├── config.json
├── merges.txt
├── pytorch_model.bin
├── special_tokens_map.json
├── tokenizer_config.json
└── vocab.json
```
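
If you have downloaded these files into a local folder, you can point `from_pretrained` at that folder instead of the Hub model id (the directory path below is just a placeholder):

```python
from transformers import GPT2Tokenizer, GPT2LMHeadModel

# Local directory containing the files listed above (placeholder path).
local_dir = "./code-autocomplete-distilgpt2-python"

tokenizer = GPT2Tokenizer.from_pretrained(local_dir)
model = GPT2LMHeadModel.from_pretrained(local_dir)
```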

### Train data
#### pytorch_awesome projects source code

Download [code-autocomplete](https://github.com/shibing624/code-autocomplete), then run:
```shell
cd autocomplete
python create_dataset.py
```

If you want to train the code-autocomplete GPT2 model yourself, refer to [https://github.com/shibing624/code-autocomplete/blob/main/autocomplete/gpt2_coder.py](https://github.com/shibing624/code-autocomplete/blob/main/autocomplete/gpt2_coder.py)

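For orientation only, here is a minimal sketch of what such a causal-LM fine-tuning run can look like with transformers and plain PyTorch. This is not the repo's `gpt2_coder.py`; the training file name, hyperparameters, and output directory are placeholders.

```python
import torch
from torch.optim import AdamW
from torch.utils.data import DataLoader, Dataset
from transformers import GPT2LMHeadModel, GPT2Tokenizer

TRAIN_FILE = "train.txt"  # placeholder: one code snippet per line

tokenizer = GPT2Tokenizer.from_pretrained("distilgpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT2 has no pad token by default
model = GPT2LMHeadModel.from_pretrained("distilgpt2")


class CodeDataset(Dataset):
    """Tokenizes each non-empty line of the training file as one sample."""

    def __init__(self, path, max_len=128):
        lines = [l for l in open(path, encoding="utf-8").read().splitlines() if l.strip()]
        self.enc = tokenizer(lines, truncation=True, max_length=max_len,
                             padding="max_length", return_tensors="pt")

    def __len__(self):
        return self.enc["input_ids"].size(0)

    def __getitem__(self, i):
        return self.enc["input_ids"][i], self.enc["attention_mask"][i]


loader = DataLoader(CodeDataset(TRAIN_FILE), batch_size=8, shuffle=True)
optimizer = AdamW(model.parameters(), lr=5e-5)
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device).train()

for epoch in range(3):
    for input_ids, attention_mask in loader:
        input_ids = input_ids.to(device)
        attention_mask = attention_mask.to(device)
        # Causal LM fine-tuning: labels are the inputs, with padding masked out.
        labels = input_ids.clone()
        labels[attention_mask == 0] = -100
        loss = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

model.save_pretrained("my-code-autocomplete-gpt2")
tokenizer.save_pretrained("my-code-autocomplete-gpt2")
```

For the actual training pipeline used to produce this model, see `gpt2_coder.py` in the repo linked above.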

### About GPT2

Test the whole generation capabilities here: https://transformer.huggingface.co/doc/gpt2-large

GPT2 is a model pretrained on English text with a causal language modeling (CLM) objective. It was introduced in
[this paper](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf)
and first released at [this page](https://openai.com/blog/better-language-models/).

Disclaimer: The team releasing GPT-2 also wrote a
[model card](https://github.com/openai/gpt-2/blob/master/model_card.md) for their model. Content from this model card
has been written by the Hugging Face team to complement the information they provided and give specific examples of bias.


## Citation

```latex
@misc{code-autocomplete,
  author = {Xu Ming},
  title = {code-autocomplete: Code AutoComplete with GPT model},
  year = {2022},
  publisher = {GitHub},
  journal = {GitHub repository},
  url = {https://github.com/shibing624/code-autocomplete},
}
```