Commit f403786 (parent: f12f979)
Add language information to model metadata
Thanks for sharing this incredible model! I've suggested language tags for the model's metadata section, based on the languages listed in https://blog.salesforceairesearch.com/xgen/:
> For Wikipedia, we cover 22 languages: bg, ca, cs, da, de, en, es, fr, hr, hu, it, nl, pl, pt, ro, ru, sl, sr, sv, uk, ja, zh, more than LLaMA (20 languages) and MPT (English only).
Since most tokens in the training data are English, you might prefer to list only English. I also didn't see whether your blog post reports any additional evaluation of downstream performance on non-English languages, so you may prefer a different subset of languages from the one I selected.
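The `language` key in the README's YAML front matter is a plain list of ISO 639-1 codes. As a rough sketch of the two options discussed here (the full 22-language list versus English only), the hypothetical helper `render_front_matter` below builds that block from a chosen subset of the blog's codes. It emits only the `license` and `language` keys touched by this commit and is not part of any Hugging Face tooling.

```python
# Language codes quoted from the XGen blog post (Wikipedia coverage).
BLOG_LANGUAGES = [
    "bg", "ca", "cs", "da", "de", "en", "es", "fr", "hr", "hu", "it",
    "nl", "pl", "pt", "ro", "ru", "sl", "sr", "sv", "uk", "ja", "zh",
]


def render_front_matter(license_id: str, languages: list[str]) -> str:
    """Render a minimal model-card YAML front matter block.

    Hypothetical helper for illustration only: it writes just the
    `license` and `language` keys, one `- code` line per language.
    """
    unknown = [code for code in languages if code not in BLOG_LANGUAGES]
    if unknown:
        raise ValueError(f"codes not listed in the blog post: {unknown}")
    lines = ["---", f"license: {license_id}", "language:"]
    lines += [f"- {code}" for code in languages]
    lines.append("---")
    return "\n".join(lines) + "\n"


# English-only alternative suggested in the commit message.
english_only = render_front_matter("apache-2.0", ["en"])
print(english_only)

# Full proposal: English first, then the remaining blog languages in order.
full = ["en"] + [code for code in BLOG_LANGUAGES if code != "en"]
print(render_front_matter("apache-2.0", full))
```

Putting `en` first and keeping the blog's order for the rest reproduces the ordering used in the diff below.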
README.md (changed)

    @@ -1,5 +1,28 @@
     ---
     license: apache-2.0
    +language:
    +- en
    +- bg
    +- ca
    +- cs
    +- da
    +- de
    +- es
    +- fr
    +- hr
    +- hu
    +- it
    +- nl
    +- pl
    +- pt
    +- ro
    +- ru
    +- sl
    +- sr
    +- sv
    +- uk
    +- ja
    +- zh
     ---
     
     # XGen-7B-8K-Base
    @@ -60,4 +83,4 @@ print(tokenizer.decode(sample[0]))
     year={2023},
     url={https://blog.salesforceairesearch.com/xgen}
     }
    -```
    +```
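To double-check that the new metadata matches the quoted blog list, the sketch below extracts the `language` entries from the opening lines of the updated README.md (copied from the diff) and compares them with the 22 codes from the blog post. `front_matter_languages` is a hypothetical, stdlib-only helper that understands only the flat `key:` / `- item` layout used here; for real model cards, a full YAML parser such as PyYAML's `yaml.safe_load` would be the safer choice.

```python
import re

# Opening lines of the updated README.md, as shown in the diff above.
UPDATED_HEADER = """---
license: apache-2.0
language:
- en
- bg
- ca
- cs
- da
- de
- es
- fr
- hr
- hu
- it
- nl
- pl
- pt
- ro
- ru
- sl
- sr
- sv
- uk
- ja
- zh
---

# XGen-7B-8K-Base
"""

# The 22 Wikipedia languages listed in the XGen blog post.
BLOG_LANGUAGES = {
    "bg", "ca", "cs", "da", "de", "en", "es", "fr", "hr", "hu", "it",
    "nl", "pl", "pt", "ro", "ru", "sl", "sr", "sv", "uk", "ja", "zh",
}


def front_matter_languages(readme: str) -> list[str]:
    """Return the `language` list from a README's YAML front matter.

    Hypothetical helper: handles only a flat `language:` key followed by
    `- code` lines; it is not a general YAML parser.
    """
    match = re.match(r"---\n(.*?)\n---\n", readme, re.DOTALL)
    if match is None:
        return []  # no front matter block at the top of the file
    codes: list[str] = []
    in_language = False
    for line in match.group(1).splitlines():
        if line.startswith("language:"):
            in_language = True
        elif in_language and line.startswith("- "):
            codes.append(line[2:].strip())
        else:
            in_language = False  # any other key ends the list
    return codes


codes = front_matter_languages(UPDATED_HEADER)
print(len(codes), set(codes) == BLOG_LANGUAGES)
```

Running it prints `22 True`, i.e. the proposed tags are exactly the blog's language set, with English moved to the front.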