Update app.py
app.py
CHANGED
````diff
@@ -20,34 +20,36 @@ FIM_INDICATOR = "<FILL_HERE>"
 
 FORMATS = """## Model formats
 
+The model is pretrained on code and in addition to the pure code data it is formatted with special tokens. E.g. prefixes specifying the source of the file or special tokens separating code from a commit message. See below:
+
 ### Prefixes
-Any combination of the three:
+Any combination of the three following prefixes can be found in pure code files:
 
 ```
-<reponame>REPONAME<filename>FILENAME<gh_stars>STARS\
+<reponame>REPONAME<filename>FILENAME<gh_stars>STARS\ncode<|endoftext|>
 ```
-
+STARS can be one of: 0, 1-10, 10-100, 100-1000, 1000+
 
 ### Commits
-
+The commits data is formatted as follows:
 ```
 <commit_before>code<commit_msg>text<commit_after>code<|endoftext|>
 ```
 
 ### Jupyter structure
-
+Jupyter notebooks were both trained in form of Python scripts as well as the following structured format:
 ```
 <start_jupyter><jupyter_text>text<jupyter_code>code<jupyter_output>output<jupyter_text>
 ```
 
 ### Issues
-
+We also trained on GitHub issues using the following formatting:
 ```
 <issue_start><issue_comment>text<issue_comment>...<issue_closed>
 ```
 
 ### Fill-in-the-middle
-
+Fill in the middle requires rearranging the model inputs. The playground does this for you - all you need is to specify where to fill:
 ```
 code before<FILL_HERE>code after
 ```
````
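The metadata prefix format can be assembled programmatically. This is a minimal sketch that assumes three things: the prefixes always appear in the order shown (`<reponame>`, `<filename>`, `<gh_stars>`), a raw star count is bucketed with exclusive upper bounds (so 10 stars falls in `10-100`), and the newline before the code is kept whenever any prefix is present. The helpers `stars_bucket` and `build_prefixed_prompt` are hypothetical and do not come from `app.py`.

```python
from typing import Optional


def stars_bucket(stars: int) -> str:
    """Map a raw GitHub star count to one of the STARS labels.

    Assumption: each range's upper bound is exclusive (10 stars -> "10-100").
    """
    if stars < 0:
        raise ValueError("star count cannot be negative")
    if stars == 0:
        return "0"
    if stars < 10:
        return "1-10"
    if stars < 100:
        return "10-100"
    if stars < 1000:
        return "100-1000"
    return "1000+"


def build_prefixed_prompt(
    code: str,
    reponame: Optional[str] = None,
    filename: Optional[str] = None,
    stars: Optional[int] = None,
) -> str:
    """Prepend any combination of the three metadata prefixes to a code prompt."""
    prefix = ""
    if reponame is not None:
        prefix += f"<reponame>{reponame}"
    if filename is not None:
        prefix += f"<filename>{filename}"
    if stars is not None:
        prefix += f"<gh_stars>{stars_bucket(stars)}"
    # Assumption: the newline separating metadata from code is kept whenever
    # at least one prefix is present. <|endoftext|> is not appended because
    # the model is expected to continue the code and emit it itself.
    return f"{prefix}\n{code}" if prefix else code


# Hypothetical repository and file names, for illustration only.
print(build_prefixed_prompt("def add(a, b):\n", reponame="octo/utils",
                            filename="math_utils.py", stars=42))
```

The sketch leaves `<|endoftext|>` off the end on purpose. In training data that token closes a finished file, but in a prompt the file is still incomplete.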
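A commit-style prompt can stop right after `<commit_after>`, leaving the model to write the updated code. This minimal sketch assumes the generation ends with `<|endoftext|>`, as in the commit training format. `build_commit_prompt` and `extract_commit_after` are illustrative names, not functions from `app.py`.

```python
def build_commit_prompt(code_before: str, message: str) -> str:
    """Ask the model for the code after a change described by `message`."""
    # Everything the model generates after <commit_after> is its version of
    # the changed code.
    return f"<commit_before>{code_before}<commit_msg>{message}<commit_after>"


def extract_commit_after(generation: str) -> str:
    """Keep only the text before the first <|endoftext|> token, if any."""
    return generation.split("<|endoftext|>", 1)[0]


prompt = build_commit_prompt("x = 1\nprint(x)\n", "Rename x to y")
print(prompt)
```

Cutting at the first `<|endoftext|>` is a defensive choice. If sampling runs past the end-of-text token, anything after it is discarded.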
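In the structured Jupyter format, each cell is introduced by its own token, so a notebook can be serialized by walking its cells in order. This minimal sketch assumes cells arrive as `(kind, content)` pairs with kinds `text`, `code` and `output`; that input shape is hypothetical. Ending the list with an empty `code` cell leaves the prompt open for the model to write the next code cell.

```python
from typing import List, Tuple

# One opening token per cell kind, as listed in the Jupyter format of FORMATS.
CELL_TOKENS = {
    "text": "<jupyter_text>",
    "code": "<jupyter_code>",
    "output": "<jupyter_output>",
}


def build_jupyter_prompt(cells: List[Tuple[str, str]]) -> str:
    """Serialize (kind, content) cells into the structured notebook format."""
    parts = ["<start_jupyter>"]
    for kind, content in cells:
        if kind not in CELL_TOKENS:
            raise ValueError(f"unknown cell kind: {kind!r}")
        parts.append(CELL_TOKENS[kind] + content)
    return "".join(parts)


prompt = build_jupyter_prompt([
    ("text", "Compute 2 + 2"),
    ("code", "2 + 2"),
    ("output", "4"),
    ("text", "Now square the result"),
    ("code", ""),  # empty trailing cell: the model continues from here
])
print(prompt)
```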
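An issue thread maps onto a flat sequence of `<issue_comment>` segments. This minimal sketch assumes comments are plain strings and that `<issue_closed>` appears only at the end of threads that were closed. `build_issue_text` is a hypothetical helper, not part of `app.py`.

```python
from typing import List


def build_issue_text(comments: List[str], closed: bool = False) -> str:
    """Serialize an issue thread, one <issue_comment> token per comment."""
    text = "<issue_start>" + "".join(f"<issue_comment>{c}" for c in comments)
    # Assumption: <issue_closed> marks threads that were closed and is
    # omitted for issues that are still open.
    return text + "<issue_closed>" if closed else text


# To prompt for the next reply, open a new comment and let the model fill it.
prompt = build_issue_text(["The parser crashes on empty input."]) + "<issue_comment>"
print(prompt)
```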
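The rearrangement the playground performs for fill-in-the-middle can be sketched as a prefix-suffix-middle transform. The prompt is split at `<FILL_HERE>`, and the result lists the prefix, then the suffix, then a final token after which the model generates the missing middle. The `FIM_INDICATOR` value comes from `app.py`. The special tokens `<fim_prefix>`, `<fim_suffix>` and `<fim_middle>` are an assumption (StarCoder-style spellings), so check them against the model's tokenizer before reusing this.

```python
FIM_INDICATOR = "<FILL_HERE>"

# Assumed StarCoder-style FIM tokens; other models may spell them differently.
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"


def rearrange_fim(prompt: str) -> str:
    """Turn 'before<FILL_HERE>after' into a prefix-suffix-middle prompt."""
    if prompt.count(FIM_INDICATOR) != 1:
        raise ValueError(f"expected exactly one {FIM_INDICATOR} in the prompt")
    prefix, suffix = prompt.split(FIM_INDICATOR)
    # The model generates the missing middle after FIM_MIDDLE.
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


def reassemble(prompt: str, middle: str) -> str:
    """Put the generated middle back where <FILL_HERE> was."""
    return prompt.replace(FIM_INDICATOR, middle, 1)


raw = "def area(r):\n    <FILL_HERE>\n    return result\n"
print(rearrange_fim(raw))
```

Requiring exactly one `<FILL_HERE>` keeps the split unambiguous. After generation, `reassemble` restores the user's original layout with the completion in place of the marker.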