aashish1904 committed on
Commit
c7a6365
1 Parent(s): fe32723

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +150 -0
README.md ADDED
@@ -0,0 +1,150 @@
---

thumbnail: https://github.com/rinnakk/japanese-pretrained-models/blob/master/rinna.png
license: gemma
datasets:
- mc4
- wikipedia
- EleutherAI/pile
- oscar-corpus/colossal-oscar-1.0
- cc100
language:
- ja
- en
tags:
- gemma2
inference: false
base_model: google/gemma-2-2b
pipeline_tag: text-generation
library_name: transformers

---

[![QuantFactory Banner](https://lh7-rt.googleusercontent.com/docsz/AD_4nXeiuCm7c8lEwEJuRey9kiVZsRn2W-b4pWlu3-X534V3YmVuVc2ZL-NXg2RkzSOOS2JXGHutDuyyNAUtdJI65jGTo8jT9Y99tMi4H4MqL44Uc5QKG77B0d6-JfIkZHFaUA71-RtjyYZWVIhqsNZcx8-OMaA?key=xt3VSDoCbmTY7o-cwwOFwQ)](https://hf.co/QuantFactory)

# QuantFactory/gemma-2-baku-2b-GGUF
This is a quantized version of [rinna/gemma-2-baku-2b](https://huggingface.co/rinna/gemma-2-baku-2b) created using llama.cpp.

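Because these are GGUF files, they can also be run directly with llama.cpp or its Python bindings instead of `transformers`. The sketch below uses `llama-cpp-python`; the quantization filename is an assumption for illustration and should be replaced with an actual `.gguf` file from this repository.

~~~python
# Minimal sketch: run a GGUF quantization with llama-cpp-python.
# The filename below is hypothetical; pick a real .gguf file from this repo.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="QuantFactory/gemma-2-baku-2b-GGUF",
    filename="gemma-2-baku-2b.Q4_K_M.gguf",  # assumed quant name
    n_ctx=4096,
)

# The base model is a plain language model (not instruction-tuned),
# so use text completion on a prompt.
out = llm("西田幾多郎は、", max_tokens=128)
print(out["choices"][0]["text"])
~~~
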
# Original Model Card


# `Gemma 2 Baku 2B (rinna/gemma-2-baku-2b)`

![rinna-icon](./rinna.png)

# Overview

We conduct continual pre-training of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on **80B** tokens from a mixture of Japanese and English datasets. The continual pre-training improves the model's performance on Japanese tasks.

The name `baku` comes from the Japanese word [`獏/ばく/Baku`](https://ja.wikipedia.org/wiki/獏), which is a kind of Japanese mythical creature ([`妖怪/ようかい/Youkai`](https://ja.wikipedia.org/wiki/%E5%A6%96%E6%80%AA)).

| Size | Continual Pre-Training | Instruction-Tuning |
| :- | :- | :- |
| 2B | Gemma 2 Baku 2B [[HF]](https://huggingface.co/rinna/gemma-2-baku-2b) | Gemma 2 Baku 2B Instruct [[HF]](https://huggingface.co/rinna/gemma-2-baku-2b-it) |

* **Library**

    The model was trained using code based on [Lightning-AI/litgpt](https://github.com/Lightning-AI/litgpt).

* **Model architecture**

    A 26-layer, 2304-hidden-size transformer-based language model. Please refer to the [Gemma 2 Model Card](https://www.kaggle.com/models/google/gemma-2/) for detailed information on the model's architecture. A quick configuration check is sketched just after this list.

* **Training**

    The model was initialized with the [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) model and continually trained on around **80B** tokens from a mixture of the following corpora:
    - [Japanese CC-100](https://huggingface.co/datasets/cc100)
    - [Japanese C4](https://huggingface.co/datasets/mc4)
    - [Japanese OSCAR](https://huggingface.co/datasets/oscar-corpus/colossal-oscar-1.0)
    - [The Pile](https://huggingface.co/datasets/EleutherAI/pile)
    - [Wikipedia](https://dumps.wikimedia.org/other/cirrussearch)
    - rinna curated Japanese dataset

* **Contributors**
    - [Toshiaki Wakatsuki](https://huggingface.co/t-w)
    - [Xinqi Chen](https://huggingface.co/Keely0419)
    - [Kei Sawada](https://huggingface.co/keisawada)

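As a quick check of the architecture figures above, the layer count and hidden size can be read from the model configuration. This is a minimal sketch using the standard `transformers` config API; the printed values are expected to match the card.

~~~python
# Minimal sketch: inspect the model configuration.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("rinna/gemma-2-baku-2b")
print(config.model_type)         # "gemma2"
print(config.num_hidden_layers)  # 26 layers, per the card above
print(config.hidden_size)        # 2304 hidden size, per the card above
~~~
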
---

# Benchmarking

Please refer to [rinna's LM benchmark page](https://rinnakk.github.io/research/benchmarks/lm/index.html).

---

# How to use the model

~~~python
import transformers
import torch

model_id = "rinna/gemma-2-baku-2b"
# Load the model in bfloat16 with eager attention (see the note below).
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16, "attn_implementation": "eager"},
    device_map="auto"
)
# Generate a continuation of the Japanese prompt 「西田幾多郎は、」 ("Kitaro Nishida is/was ...").
output = pipeline(
    "西田幾多郎は、",
    max_new_tokens=256,
    do_sample=True
)
print(output[0]["generated_text"])
~~~

It is recommended to use eager attention when conducting batch inference under bfloat16 precision.
Currently, Gemma 2 yields NaN values for input sequences with padding when the default attention mechanism (torch.scaled_dot_product_attention) is employed in conjunction with bfloat16.

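For batched generation with padded inputs, the same recommendation applies. Below is a minimal sketch under the setup described above (bfloat16 plus eager attention); the second prompt is arbitrary and added only for illustration.

~~~python
# Minimal sketch: batched generation with padding, bfloat16, and eager attention.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "rinna/gemma-2-baku-2b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.padding_side = "left"  # left-pad for decoder-only generation
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    attn_implementation="eager",  # avoids NaN with padded bfloat16 batches
    device_map="auto",
)

prompts = ["西田幾多郎は、", "夏目漱石は、"]  # second prompt is an arbitrary example
inputs = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)
with torch.no_grad():
    generated = model.generate(**inputs, max_new_tokens=64, do_sample=True)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
~~~
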
---

# Tokenization
The model uses the original [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) tokenizer.

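Because the tokenizer is unchanged from the base model, loading it from this repository should behave identically to the google/gemma-2-2b tokenizer. A minimal sketch (the example sentence is arbitrary):

~~~python
# Minimal sketch: the tokenizer is the stock Gemma 2 tokenizer.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("rinna/gemma-2-baku-2b")
print(tokenizer.tokenize("西田幾多郎は、日本の哲学者です。"))  # subword split of a Japanese sentence
print(tokenizer.vocab_size)  # same vocabulary size as google/gemma-2-2b
~~~
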
---

# How to cite
```bibtex
@misc{rinna-gemma-2-baku-2b,
    title = {rinna/gemma-2-baku-2b},
    author = {Wakatsuki, Toshiaki and Chen, Xinqi and Sawada, Kei},
    url = {https://huggingface.co/rinna/gemma-2-baku-2b}
}

@inproceedings{sawada2024release,
    title = {Release of Pre-Trained Models for the {J}apanese Language},
    author = {Sawada, Kei and Zhao, Tianyu and Shing, Makoto and Mitsui, Kentaro and Kaga, Akio and Hono, Yukiya and Wakatsuki, Toshiaki and Mitsuda, Koh},
    booktitle = {Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)},
    month = {5},
    year = {2024},
    pages = {13898--13905},
    url = {https://aclanthology.org/2024.lrec-main.1213},
    note = {\url{https://arxiv.org/abs/2404.01657}}
}
```
---

# References
```bibtex
@article{gemma-2-2024,
    title = {Gemma 2},
    url = {https://www.kaggle.com/models/google/gemma-2},
    publisher = {Kaggle},
    author = {Gemma Team},
    year = {2024}
}

@misc{litgpt-2023,
    author = {Lightning AI},
    title = {LitGPT},
    howpublished = {\url{https://github.com/Lightning-AI/litgpt}},
    year = {2023}
}
```
---

# License
[Gemma Terms of Use](https://ai.google.dev/gemma/terms)