Add language information to model metadata

Thanks for sharing this incredible model! I've suggested language tags for the model card metadata, based on the languages listed in https://blog.salesforceairesearch.com/xgen/:

> For Wikipedia, we cover 22 languages: bg, ca, cs, da, de, en, es, fr, hr, hu, it, nl, pl, pt, ro, ru, sl, sr, sv, uk, ja, zh, more than LLaMA (20 languages) and MPT (English only).

Since most tokens in the training data are English, you might prefer to list only English. I also didn't see any evaluation of downstream performance on non-English languages in your blog post, so you may prefer a different subset of languages from the one I selected.
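
For reference, here is a minimal sketch of how the `language:` block in this PR can be derived from the quoted sentence. The helper name `language_block` and the choice to move `en` to the top are assumptions made for illustration, not part of the model repository or of any Hugging Face API.

```python
# Language codes copied from the sentence quoted from the XGen blog post.
QUOTED = "bg, ca, cs, da, de, en, es, fr, hr, hu, it, nl, pl, pt, ro, ru, sl, sr, sv, uk, ja, zh"


def language_block(codes: str, first: str = "en") -> str:
    """Build a YAML `language:` list (hypothetical helper).

    Keeps the quoted order, except that `first` is moved to the top
    when it is present in the list.
    """
    tags = [c.strip() for c in codes.split(",") if c.strip()]
    if first in tags:
        tags.remove(first)
        tags.insert(0, first)
    return "\n".join(["language:"] + [f"- {t}" for t in tags])


if __name__ == "__main__":
    # Prints the block to paste between the `---` markers of README.md.
    print(language_block(QUOTED))
```

To list only English instead, pass a shorter string, for example `language_block("en")`.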

Files changed (1)

README.md +24 -1

@@ -1,5 +1,28 @@
 ---
 license: apache-2.0
+language:
+- en
+- bg
+- ca
+- cs
+- da
+- de
+- es
+- fr
+- hr
+- hu
+- it
+- nl
+- pl
+- pt
+- ro
+- ru
+- sl
+- sr
+- sv
+- uk
+- ja
+- zh
 ---
 # XGen-7B-8K-Base
@@ -60,4 +83,4 @@ print(tokenizer.decode(sample[0]))
   year={2023},
   url={https://blog.salesforceairesearch.com/xgen}
 }
-```
+```