qhduan committed on
Commit
47915d9
2 Parent(s): 5c2525a 7cb17ce

Merge branch 'main' of https://huggingface.co/kdf/javascript-docstring-generation into main

Files changed (1)
  1. README.md +45 -0
README.md CHANGED
@@ -1,3 +1,48 @@
  ---
  license: apache-2.0
+ widget:
+ - text: "<|endoftext|>\nfunction getDateAfterNDay(n){\n return moment().add(n, 'day')\n}\n// docstring\n/**"
  ---
+
+ ## Basic info
+
+ Model based on [Salesforce/codegen-350M-mono](https://huggingface.co/Salesforce/codegen-350M-mono)
+
+ Fine-tuned on the [codeparrot/github-code-clean](https://huggingface.co/datasets/codeparrot/github-code-clean) dataset
+
+ Training data filtered to JavaScript and TypeScript files
+
+ ## Usage
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+
+ model_type = 'kdf/javascript-docstring-generation'
+ tokenizer = AutoTokenizer.from_pretrained(model_type)
+ model = AutoModelForCausalLM.from_pretrained(model_type)
+
+ # Prompt layout: <|endoftext|>, the function source, then an opened doc comment
+ inputs = tokenizer('''<|endoftext|>
+ function getDateAfterNDay(n){
+ return moment().add(n, 'day')
+ }
+
+ // docstring
+ /**''', return_tensors='pt')
+
+ doc_max_length = 128  # upper bound on newly generated tokens
+
+ generated_ids = model.generate(
+     **inputs,
+     max_length=inputs.input_ids.shape[1] + doc_max_length,
+     do_sample=False,  # greedy decoding
+     return_dict_in_generate=True,
+     num_return_sequences=1,
+     output_scores=True,
+     pad_token_id=50256,
+     eos_token_id=50256  # <|endoftext|>
+ )
+
+ # The decoded text contains the prompt followed by the generated docstring
+ ret = tokenizer.decode(generated_ids.sequences[0], skip_special_tokens=False)
+ print(ret)
+
+ ```
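
The usage snippet above prints the whole decoded sequence, prompt included. The sketch below is not part of the commit; it shows one possible way to build the prompt for an arbitrary function and to keep only the generated comment. The helper names `build_prompt` and `extract_docstring` are hypothetical, and the sketch assumes the prompt layout of the Python example (a blank line before `// docstring`, which the widget text omits) and that the model closes the comment with `*/`.

```python
# Hypothetical helpers; the layout mirrors the prompt used in the usage snippet.
PROMPT_PREFIX = "<|endoftext|>\n"
DOC_MARKER = "// docstring\n/**"
EOS_TEXT = "<|endoftext|>"


def build_prompt(function_source: str) -> str:
    """Wrap JavaScript/TypeScript source in the prompt layout shown above."""
    return PROMPT_PREFIX + function_source.strip() + "\n\n" + DOC_MARKER


def extract_docstring(decoded: str) -> str:
    """Return only the doc comment from the decoded model output."""
    # Everything after the first marker is the model's continuation.
    _, marker, completion = decoded.partition(DOC_MARKER)
    if not marker:
        raise ValueError("prompt marker not found in decoded text")
    # Assumption: the comment ends at the first '*/'; anything after is dropped.
    end = completion.find("*/")
    if end != -1:
        completion = completion[: end + 2]
    # If no terminator was produced, at least remove a trailing end-of-text token.
    completion = completion.replace(EOS_TEXT, "")
    return "/**" + completion.rstrip()
```

With these helpers, `tokenizer(build_prompt(src), return_tensors='pt')` would replace the hard-coded string, and `extract_docstring(ret)` would be applied to the decoded output.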