mav23 committed on
Commit
6d870c3
1 Parent(s): 1b68a16

Upload folder using huggingface_hub

Files changed (3)
  1. .gitattributes +1 -0
  2. README.md +138 -0
  3. gemma-2-baku-2b.Q4_0.gguf +3 -0
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ gemma-2-baku-2b.Q4_0.gguf filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,138 @@
---
thumbnail: https://github.com/rinnakk/japanese-pretrained-models/blob/master/rinna.png
license: gemma
datasets:
- mc4
- wikipedia
- EleutherAI/pile
- oscar-corpus/colossal-oscar-1.0
- cc100
language:
- ja
- en
tags:
- gemma2
inference: false
base_model: google/gemma-2-2b
pipeline_tag: text-generation
library_name: transformers
---

# `Gemma 2 Baku 2B (rinna/gemma-2-baku-2b)`

![rinna-icon](./rinna.png)

# Overview

We conduct continual pre-training of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on **80B** tokens from a mixture of Japanese and English datasets. The continual pre-training improves the model's performance on Japanese tasks.

The name `baku` comes from the Japanese word [`獏/ばく/Baku`](https://ja.wikipedia.org/wiki/獏), which is a kind of Japanese mythical creature ([`妖怪/ようかい/Youkai`](https://ja.wikipedia.org/wiki/%E5%A6%96%E6%80%AA)).

| Size | Continual Pre-Training | Instruction-Tuning |
| :- | :- | :- |
| 2B | Gemma 2 Baku 2B [[HF]](https://huggingface.co/rinna/gemma-2-baku-2b) | Gemma 2 Baku 2B Instruct [[HF]](https://huggingface.co/rinna/gemma-2-baku-2b-it) |

* **Library**

    The model was trained using code based on [Lightning-AI/litgpt](https://github.com/Lightning-AI/litgpt).

* **Model architecture**

    A 26-layer, 2304-hidden-size transformer-based language model. Please refer to the [Gemma 2 Model Card](https://www.kaggle.com/models/google/gemma-2/) for detailed information on the model's architecture.

* **Training**

    The model was initialized with the [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) model and continually trained on around **80B** tokens from a mixture of the following corpora:
    - [Japanese CC-100](https://huggingface.co/datasets/cc100)
    - [Japanese C4](https://huggingface.co/datasets/mc4)
    - [Japanese OSCAR](https://huggingface.co/datasets/oscar-corpus/colossal-oscar-1.0)
    - [The Pile](https://huggingface.co/datasets/EleutherAI/pile)
    - [Wikipedia](https://dumps.wikimedia.org/other/cirrussearch)
    - rinna curated Japanese dataset

* **Contributors**
    - [Toshiaki Wakatsuki](https://huggingface.co/t-w)
    - [Xinqi Chen](https://huggingface.co/Keely0419)
    - [Kei Sawada](https://huggingface.co/keisawada)

---

# Benchmarking

Please refer to [rinna's LM benchmark page](https://rinnakk.github.io/research/benchmarks/lm/index.html).

---

# How to use the model

~~~python
import transformers
import torch

model_id = "rinna/gemma-2-baku-2b"
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16, "attn_implementation": "eager"},
    device_map="auto"
)
output = pipeline(
    "西田幾多郎は、",
    max_new_tokens=256,
    do_sample=True
)
print(output[0]["generated_text"])
~~~

It is recommended to use eager attention when conducting batch inference under bfloat16 precision.
Currently, Gemma 2 yields NaN values for input sequences with padding when the default attention mechanism (torch.scaled_dot_product_attention) is employed in conjunction with bfloat16.
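
To illustrate the note above, here is a minimal sketch of batched generation with padded inputs, loading the model directly with eager attention. This sketch is not part of the original model card; the prompts, generation settings, and left-padding choice are illustrative assumptions rather than documented requirements.

~~~python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "rinna/gemma-2-baku-2b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.padding_side = "left"  # left-pad so each sequence ends where generation starts
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    attn_implementation="eager",  # avoids the NaN issue described above for padded bfloat16 batches
    device_map="auto",
)

# A batch of prompts with different lengths, so padding is required.
prompts = ["西田幾多郎は、", "夏目漱石は、"]
inputs = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True)
for text in tokenizer.batch_decode(outputs, skip_special_tokens=True):
    print(text)
~~~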

---

# Tokenization
The model uses the original [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) tokenizer.
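
As a quick check (an illustration added here, with an arbitrary example sentence), the tokenizer can be loaded from this repository and inspected as follows:

~~~python
from transformers import AutoTokenizer

# The repository ships the unmodified google/gemma-2-2b tokenizer.
tokenizer = AutoTokenizer.from_pretrained("rinna/gemma-2-baku-2b")

text = "西田幾多郎は、日本の哲学者である。"  # arbitrary example sentence
print(tokenizer.tokenize(text))
print(tokenizer(text)["input_ids"])
~~~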

---

# How to cite
```bibtex
@misc{rinna-gemma-2-baku-2b,
    title = {rinna/gemma-2-baku-2b},
    author = {Wakatsuki, Toshiaki and Chen, Xinqi and Sawada, Kei},
    url = {https://huggingface.co/rinna/gemma-2-baku-2b}
}

@inproceedings{sawada2024release,
    title = {Release of Pre-Trained Models for the {J}apanese Language},
    author = {Sawada, Kei and Zhao, Tianyu and Shing, Makoto and Mitsui, Kentaro and Kaga, Akio and Hono, Yukiya and Wakatsuki, Toshiaki and Mitsuda, Koh},
    booktitle = {Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)},
    month = {5},
    year = {2024},
    pages = {13898--13905},
    url = {https://aclanthology.org/2024.lrec-main.1213},
    note = {\url{https://arxiv.org/abs/2404.01657}}
}
```
---

# References
```bibtex
@article{gemma-2-2024,
    title = {Gemma 2},
    url = {https://www.kaggle.com/models/google/gemma-2},
    publisher = {Kaggle},
    author = {Gemma Team},
    year = {2024}
}

@misc{litgpt-2023,
    author = {Lightning AI},
    title = {LitGPT},
    howpublished = {\url{https://github.com/Lightning-AI/litgpt}},
    year = {2023}
}
```
---

# License
[Gemma Terms of Use](https://ai.google.dev/gemma/terms)
gemma-2-baku-2b.Q4_0.gguf ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:beee0845dc085c4497b7af7062da14538e1b37e7414554e7566cdc18375fd62d
size 1629508992
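
The three lines above are the Git LFS pointer for the uploaded file, a 4-bit (Q4_0) GGUF quantization of rinna/gemma-2-baku-2b. As a hedged sketch only, since GGUF usage is not documented in this repository, a locally downloaded copy can typically be run with llama-cpp-python:

~~~python
from llama_cpp import Llama

# Path assumes the GGUF file has been downloaded next to this script.
llm = Llama(model_path="gemma-2-baku-2b.Q4_0.gguf", n_ctx=2048)

out = llm("西田幾多郎は、", max_tokens=128)
print(out["choices"][0]["text"])
~~~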