---
language:
- en
tags:
- code
- autocomplete
- pytorch
- en
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
widget:
- text: import torch.nn as
---

# GPT2 for Code AutoComplete Model

code-autocomplete is a code completion plugin for Python.

**code-autocomplete** can automatically complete lines and blocks of code with GPT2.

## Usage

Open source repo: [code-autocomplete](https://github.com/shibing624/code-autocomplete), which provides the `autocomplete` package and supports the GPT2 model. Usage:

```python
from autocomplete.gpt2_coder import GPT2Coder

m = GPT2Coder("shibing624/code-autocomplete-distilgpt2-python")
print(m.generate('import torch.nn as')[0])
```

You can also load the model directly with huggingface/transformers:

*Please use the 'GPT2'-related classes to load this model!*

```python
import os
from transformers import GPT2Tokenizer, GPT2LMHeadModel

os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"

tokenizer = GPT2Tokenizer.from_pretrained("shibing624/code-autocomplete-distilgpt2-python")
model = GPT2LMHeadModel.from_pretrained("shibing624/code-autocomplete-distilgpt2-python")

prompts = [
    """from torch import nn
class LSTM(Module):
    def __init__(self, *,
                 n_tokens: int,
                 embedding_size: int,
                 hidden_size: int,
                 n_layers: int):""",
    """import numpy as np
import torch
import torch.nn as""",
    "import java.util.ArrayList",
    "def factorial(n):",
]
for prompt in prompts:
    input_ids = tokenizer.encode(prompt, add_special_tokens=False, return_tensors='pt')
    outputs = model.generate(input_ids=input_ids,
                             max_length=64 + len(prompt),
                             temperature=1.0,
                             top_k=50,
                             top_p=0.95,
                             repetition_penalty=1.0,
                             do_sample=True,
                             num_return_sequences=1,
                             length_penalty=2.0,
                             early_stopping=True)
    decoded = tokenizer.decode(outputs[0], skip_special_tokens=True)
    print(decoded)
    print("=" * 20)
```

Output:

```shell
from torch import nn
class LSTM(Module):
    def __init__(self, *,
                 n_tokens: int,
                 embedding_size: int,
                 hidden_size: int,
                 n_layers: int):
        self.embedding_size = embedding_size
====================
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
```
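Since the model card declares `pipeline_tag: text-generation`, the high-level `pipeline` API is another option. A minimal sketch, where the sampling settings are illustrative examples rather than values from this repo:

```python
from transformers import pipeline

# Load the model through the generic text-generation pipeline.
generator = pipeline("text-generation", model="shibing624/code-autocomplete-distilgpt2-python")

# Complete a short code prompt; max_new_tokens and sampling flags are example values.
results = generator("import torch.nn as", max_new_tokens=32, do_sample=True, top_k=50, top_p=0.95)
print(results[0]["generated_text"])
```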

Model files:

```
code-autocomplete-distilgpt2-python
├── config.json
├── merges.txt
├── pytorch_model.bin
├── special_tokens_map.json
├── tokenizer_config.json
└── vocab.json
```
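If you need these files on disk (for example, for offline use), one way is `huggingface_hub`; a minimal sketch:

```python
from huggingface_hub import snapshot_download

# Download the repository files listed above and return the local directory path.
local_dir = snapshot_download(repo_id="shibing624/code-autocomplete-distilgpt2-python")
print(local_dir)  # contains config.json, pytorch_model.bin, vocab.json, ...
```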

### Train data

#### pytorch_awesome projects source code

Download [code-autocomplete](https://github.com/shibing624/code-autocomplete), then create the dataset:

```shell
cd autocomplete
python create_dataset.py
```

If you want to train a code-autocomplete GPT2 model yourself, refer to [gpt2_coder.py](https://github.com/shibing624/code-autocomplete/blob/main/autocomplete/gpt2_coder.py).
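For orientation only, below is a minimal fine-tuning sketch using the standard `transformers` Trainer. It is not the project's actual training script (that is `gpt2_coder.py` above); the file path and hyperparameters are placeholders, and `train.txt` is assumed to be the plain-text code corpus produced by `create_dataset.py`.

```python
from transformers import (GPT2LMHeadModel, GPT2Tokenizer, TextDataset,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

# Start from the distilgpt2 checkpoint, as the model name suggests.
tokenizer = GPT2Tokenizer.from_pretrained("distilgpt2")
model = GPT2LMHeadModel.from_pretrained("distilgpt2")

# train.txt: plain-text Python source code (assumed path).
train_dataset = TextDataset(tokenizer=tokenizer, file_path="train.txt", block_size=128)
# Causal LM objective: no masking, labels are the (shifted) inputs.
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

training_args = TrainingArguments(
    output_dir="./code-autocomplete-distilgpt2-python",
    num_train_epochs=3,              # placeholder
    per_device_train_batch_size=8,   # placeholder
    save_steps=10000,
)

trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=train_dataset,
)
trainer.train()
trainer.save_model()
tokenizer.save_pretrained(training_args.output_dir)
```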

### About GPT2

Test the model's full generation capabilities here: https://transformer.huggingface.co/doc/gpt2-large

GPT2 is a model pretrained on English text using a causal language modeling (CLM) objective. It was introduced in [this paper](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf) and first released at [this page](https://openai.com/blog/better-language-models/).

Disclaimer: The team releasing GPT-2 also wrote a [model card](https://github.com/openai/gpt-2/blob/master/model_card.md) for their model. Content from this model card has been written by the Hugging Face team to complete the information they provided and give specific examples of bias.

## Citation

```latex
@misc{code-autocomplete,
  author = {Xu Ming},
  title = {code-autocomplete: Code AutoComplete with GPT model},
  year = {2022},
  publisher = {GitHub},
  journal = {GitHub repository},
  url = {https://github.com/shibing624/code-autocomplete},
}
```