|
|
|
--- |
|
|
|
license: gemma |
|
library_name: transformers |
|
pipeline_tag: text-generation |
|
extra_gated_heading: Access Gemma on Hugging Face |
|
extra_gated_prompt: >- |
|
To access Gemma on Hugging Face, you’re required to review and agree to |
|
Google’s usage license. To do this, please ensure you’re logged in to Hugging |
|
Face and click below. Requests are processed immediately. |
|
extra_gated_button_content: Acknowledge license |
|
tags: |
|
- conversational |
|
base_model: google/gemma-2-2b-it |
|
language: |
|
- ja |
|
|
|
--- |
|
|
|
[![QuantFactory Banner](https://lh7-rt.googleusercontent.com/docsz/AD_4nXeiuCm7c8lEwEJuRey9kiVZsRn2W-b4pWlu3-X534V3YmVuVc2ZL-NXg2RkzSOOS2JXGHutDuyyNAUtdJI65jGTo8jT9Y99tMi4H4MqL44Uc5QKG77B0d6-JfIkZHFaUA71-RtjyYZWVIhqsNZcx8-OMaA?key=xt3VSDoCbmTY7o-cwwOFwQ)](https://hf.co/QuantFactory) |
|
|
|
|
|
# QuantFactory/gemma-2-2b-jpn-it-GGUF |
|
This is a quantized version of [google/gemma-2-2b-jpn-it](https://huggingface.co/google/gemma-2-2b-jpn-it), created using llama.cpp.
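
The GGUF files in this repository can be run with llama.cpp or any of its bindings. Below is a minimal sketch using the `llama-cpp-python` bindings; it is not part of the original model card, and the quantization filename is a placeholder to be replaced with one of the files from this repo.

```python
# pip install llama-cpp-python
from llama_cpp import Llama

# Load one of the GGUF files from this repository (the filename below is illustrative).
llm = Llama(model_path="gemma-2-2b-jpn-it.Q4_K_M.gguf", n_ctx=4096)

# Chat-style generation (llama-cpp-python picks up the chat format from the GGUF metadata when available).
output = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "マシーンラーニングについての詩を書いてください。"},
    ],
    max_tokens=256,
)
print(output["choices"][0]["message"]["content"])
```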
|
|
|
# Original Model Card |
|
|
|
|
|
# Gemma 2 JPN model card |
|
|
|
### Resources and Technical Documentation: |
|
|
|
- [Responsible Generative AI Toolkit](https://ai.google.dev/responsible) |
|
- [Gemma 2 JPN on Kaggle](https://www.kaggle.com/models/google/gemma-2-2b-jpn-it) |
|
- [Gemma 2 JPN on Hugging Face](https://huggingface.co/google/gemma-2-2b-jpn-it) |
|
|
|
**Terms of Use**: [Terms](https://ai.google.dev/gemma/terms)
|
**Authors**: Google |
|
|
|
## Model Information |
|
|
|
Summary description and brief definition of inputs and outputs. |
|
|
|
### Description |
|
|
|
Gemma is a series of best-in-class open models that draws inspiration and technological lineage from the Gemini family of models. They are text-to-text, decoder-only large language models with open weights. Gemma models are well-suited for a variety of text generation tasks, including question answering, summarization, and reasoning.
|
|
|
Gemma-2-JPN is a Gemma 2 2B model fine-tuned on Japanese text. It supports the Japanese language at the same level of performance that Gemma 2 achieves on English-only queries.
|
|
|
### Usage |
|
|
|
Below we share some code snippets to help you quickly get started with running the model. First, install the Transformers library with:
|
```sh |
|
pip install -U transformers |
|
``` |
|
|
|
Then, copy the snippet from the section that is relevant to your use case.
|
|
|
#### Running with the `pipeline` API |
|
|
|
```python |
|
import torch |
|
from transformers import pipeline |
|
|
|
pipe = pipeline( |
|
"text-generation", |
|
model="google/gemma-2-2b-jpn-it", |
|
model_kwargs={"torch_dtype": torch.bfloat16}, |
|
device="cuda", # replace with "mps" to run on a Mac device |
|
) |
|
|
|
messages = [ |
|
{"role": "user", "content": "マシーンラーニングについての詩を書いてください。"}, |
|
] |
|
|
|
outputs = pipe(messages, return_full_text=False, max_new_tokens=256) |
|
assistant_response = outputs[0]["generated_text"].strip() |
|
print(assistant_response) |
|
``` |
|
|
|
<details> |
|
<summary>Example output</summary> |
|
|
|
``` |
|
## マシーンラーニングの詩 |
|
|
|
**1.** |
|
データの海、深淵の広がり、 |
|
複雑なパターン、隠された知識。 |
|
機械学習、その力強さ、 |
|
未来を予測、その道を開く。 |
|
|
|
**2.** |
|
ニューラルネットワーク、複雑な枝、 |
|
学習の旅、その過程は静か。 |
|
データから学び、進化する姿、 |
|
予測の精度、その力強さ。 |
|
|
|
**3.** |
|
教師あり学習、正解を導く、 |
|
教師なし学習、未知の世界へ。 |
|
機械学習、その進化は止まらない、 |
|
未来の扉を開く、新たな時代へ。 |
|
|
|
**4.** |
|
画像認識、音声認識、 |
|
複雑なタスク、その答えを見つける。 |
|
機械学習、その力強さ、 |
|
未来の技術、その可能性を語る。 |
|
``` |
|
|
|
</details> |
|
|
|
It can also be used for translation, as follows: |
|
|
|
```python |
|
translation_input_text = f"Translate the following poem from Japanese to English:\n\n{assistant_response}" |
|
messages = [ |
|
{"role": "user", "content": translation_input_text}, |
|
] |
|
|
|
outputs = pipe(messages, return_full_text=False, max_new_tokens=1024) |
|
translated_response = outputs[0]["generated_text"].strip() |
|
print(translated_response) |
|
``` |
|
|
|
<details> |
|
|
|
<summary>Example output</summary> |
|
|
|
``` |
|
## A Poem About Machine Learning |
|
|
|
**1.** |
|
A vast ocean of data, a deep expanse, |
|
Complex patterns, hidden knowledge. |
|
Machine learning, its strength so vast, |
|
Predicting the future, opening the way. |
|
|
|
**2.** |
|
A neural network, with branches intricate, |
|
A journey of learning, its process serene. |
|
Learning from data, evolving in its form, |
|
The precision of prediction, its strength. |
|
|
|
**3.** |
|
Supervised learning, guiding the correct answer, |
|
Unsupervised learning, venturing into the unknown. |
|
Machine learning, its evolution never ends, |
|
Opening the doors to the future, a new era. |
|
|
|
**4.** |
|
Image recognition, speech recognition, |
|
Complex tasks, finding the answer. |
|
Machine learning, its strength so vast, |
|
The possibilities of future technology, a story to be told. |
|
|
|
|
|
|
|
|
|
**Explanation:** |
|
|
|
The poem uses vivid imagery and metaphors to describe the power and potential of machine learning. |
|
|
|
* **Data as an ocean:** Represents the vast amount of information available for learning. |
|
* **Complex patterns:** Highlights the intricate nature of data and the challenges of extracting meaningful insights. |
|
* **Future prediction:** Emphasizes the ability of machine learning to analyze data and make predictions about the future. |
|
* **Neural network as a tree:** Represents the interconnectedness and complexity of the learning process. |
|
* **Learning from data:** Focuses on the core principle of machine learning, where algorithms learn from data to improve their performance. |
|
|
|
|
|
|
|
The poem concludes by highlighting the diverse applications of machine learning, such as image and speech recognition, and emphasizes its potential to shape the future of technology. |
|
``` |
|
|
|
</details> |
|
|
|
|
|
#### Running the model on a single / multi GPU |
|
|
|
```python |
|
# pip install accelerate |
|
from transformers import AutoTokenizer, AutoModelForCausalLM |
|
import torch |
|
|
|
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b-jpn-it") |
|
model = AutoModelForCausalLM.from_pretrained( |
|
"google/gemma-2-2b-jpn-it", |
|
device_map="auto", |
|
torch_dtype=torch.bfloat16, |
|
) |
|
|
|
messages = [ |
|
{"role": "user", "content": "マシーンラーニングについての詩を書いてください。"}, |
|
] |
|
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True, return_dict=True).to(model.device) |
|
|
|
outputs = model.generate(**inputs, max_new_tokens=256) |
|
generated_text = tokenizer.batch_decode(outputs[:, inputs['input_ids'].shape[1]:], skip_special_tokens=True)[0] |
|
print(generated_text.strip()) |
|
``` |
|
|
|
|
|
<a name="precisions"></a> |
|
#### Running the model on a GPU using different precisions |
|
|
|
The native weights of this model were exported in `bfloat16` precision. |
|
|
|
You can also load the model in `float32` by omitting the dtype, but this does not increase precision: the `bfloat16` weights are simply upcast to `float32`. See the examples below.
|
|
|
* _Upcasting to `torch.float32`_ |
|
|
|
```python |
|
# pip install accelerate |
|
from transformers import AutoTokenizer, AutoModelForCausalLM |
|
|
|
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b-jpn-it") |
|
model = AutoModelForCausalLM.from_pretrained( |
|
"google/gemma-2-2b-jpn-it", |
|
device_map="auto", |
|
) |
|
|
|
messages = [ |
|
{"role": "user", "content": "マシーンラーニングについての詩を書いてください。"}, |
|
] |
|
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True, return_dict=True).to(model.device) |
|
|
|
outputs = model.generate(**inputs, max_new_tokens=256) |
|
generated_text = tokenizer.batch_decode(outputs[:, inputs['input_ids'].shape[1]:], skip_special_tokens=True)[0] |
|
print(generated_text.strip()) |
|
``` |
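
Beyond `bfloat16` and `float32`, the checkpoint can also be loaded with on-the-fly weight quantization through the same `transformers` API. The snippet below is not part of the original card; it is a minimal sketch assuming the `bitsandbytes` package is installed.

* _Quantizing to 8-bit via `bitsandbytes`_

```python
# pip install accelerate bitsandbytes
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b-jpn-it")
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-2b-jpn-it",
    device_map="auto",
    # Quantize the weights to 8-bit at load time to reduce GPU memory usage.
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
)

messages = [
    {"role": "user", "content": "マシーンラーニングについての詩を書いてください。"},
]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True, return_dict=True).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=256)
generated_text = tokenizer.batch_decode(outputs[:, inputs['input_ids'].shape[1]:], skip_special_tokens=True)[0]
print(generated_text.strip())
```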
|
|
|
|
|
### Inputs and outputs |
|
|
|
- **Input:** Text string, such as a question, a prompt, or a document to |
|
be summarized. |
|
- **Output:** Generated Japanese-language text in response to the input, |
|
such as an answer to a question, or a summary of a document. |
|
|
|
## Model Data |
|
|
|
Data used for model training and how the data was processed. |
|
|
|
### Training Dataset |
|
|
|
These models were trained on a dataset of text data that includes a wide |
|
variety of sources, totaling 8 trillion tokens. Here are the key components: |
|
|
|
- Web Documents: A diverse collection of web text ensures the model is |
|
exposed to a broad range of linguistic styles, topics, and vocabulary. |
|
Primarily English-language content. |
|
- Code: Exposing the model to code helps it to learn the syntax and |
|
patterns of programming languages, which improves its ability to generate |
|
code or understand code-related questions. |
|
- Mathematics: Training on mathematical text helps the model learn logical |
|
reasoning, symbolic representation, and to address mathematical queries. |
|
- Instruction data set: large-scale and high-quality Japanese and |
|
multilingual instruction data. |
|
|
|
The combination of these diverse data sources is crucial for training a |
|
powerful language model that can handle a wide variety of different tasks and |
|
text formats. |
|
|
|
### Data Preprocessing |
|
|
|
Here are the key data cleaning and filtering methods applied to the training |
|
data: |
|
|
|
- CSAM Filtering: Rigorous CSAM (Child Sexual Abuse Material) filtering |
|
was applied at multiple stages in the data preparation process to ensure |
|
the exclusion of harmful and illegal content. |
|
- Sensitive Data Filtering: As part of making Gemma pre-trained models |
|
safe and reliable, we used automated techniques to filter out certain |
|
personal information and other sensitive data from training sets. |
|
- Additional methods: Filtering based on content quality and |
|
safety in line with [our policies](https://storage.googleapis.com/gweb-uniblog-publish-prod/documents/2023_Google_AI_Principles_Progress_Update.pdf#page=11). |
|
|
|
## Implementation Information |
|
|
|
Details about the model internals. |
|
|
|
### Hardware |
|
|
|
Gemma was trained using the latest generation of [Tensor Processing Unit |
|
(TPU)](https://cloud.google.com/tpu/docs/intro-to-tpu) hardware (TPUv5p). |
|
|
|
Training large language models requires significant computational power. TPUs, |
|
designed specifically for matrix operations common in machine learning, offer |
|
several advantages in this domain: |
|
|
|
- Performance: TPUs are specifically designed to handle the massive |
|
computations involved in training LLMs. They can speed up training |
|
considerably compared to CPUs. |
|
- Memory: TPUs often come with large amounts of high-bandwidth memory, |
|
allowing for the handling of large models and batch sizes during training. |
|
This can lead to better model quality. |
|
- Scalability: TPU Pods (large clusters of TPUs) provide a scalable |
|
solution for handling the growing complexity of large foundation models. |
|
You can distribute training across multiple TPU devices for faster and more |
|
efficient processing. |
|
- Cost-effectiveness: In many scenarios, TPUs can provide a more |
|
cost-effective solution for training large models compared to CPU-based |
|
infrastructure, especially when considering the time and resources saved |
|
due to faster training. |
|
|
|
These advantages are aligned with |
|
[Google's commitments to operate sustainably](https://sustainability.google/operating-sustainably/). |
|
|
|
### Software |
|
|
|
Training was done using [JAX](https://github.com/google/jax) and |
|
[ML Pathways](https://blog.google/technology/ai/introducing-pathways-next-generation-ai-architecture/). |
|
|
|
JAX allows researchers to take advantage of the latest generation of hardware, |
|
including TPUs, for faster and more efficient training of large models. |
|
|
|
ML Pathways is Google's latest effort to build artificially intelligent systems |
|
capable of generalizing across multiple tasks. This is especially suitable for [foundation models](https://ai.google/discover/foundation-models/), including large language models like this one.
|
|
|
Together, JAX and ML Pathways are used as described in the [paper about the |
|
Gemini family of models](https://goo.gle/gemma2report); "the 'single controller' |
|
programming model of Jax and Pathways allows a single Python process to |
|
orchestrate the entire training run, dramatically simplifying the development |
|
workflow." |
|
|
|
## Evaluation |
|
|
|
To assess the quality of this model, we collected a diverse set of Japanese prompts and evaluated performance using an LLM-as-a-judge approach against GPT-3.5. The rating system uses a 7-point scale: MuchBetterThan, BetterThan, SlightlyBetterThan, AboutTheSame, SlightlyWorse, WorseThan, and MuchWorseThan, mapped to the numerical scores 1.5, 1.0, 0.5, 0, -0.5, -1.0, and -1.5, respectively. We also tracked the ability of the model to answer in the correct language: for a Japanese prompt, the model should typically answer in Japanese rather than defaulting to English.
|
|
|
| Benchmark | Gemma-2-IT | Gemma-2-IT-JPN |
| --- | --- | --- |
| Preference vs GPT-3.5 | -0.25 ± 0.05 | 0.03 ± 0.04 |
| Language correctness | 86.47% | 98.24% |
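
For illustration, the mapping from judge ratings to numerical scores described above can be written as a small lookup table. The sketch below is not from the original card; the example ratings and the averaging step are assumptions for demonstration only.

```python
# Score mapping taken from the evaluation description above.
RATING_SCORES = {
    "MuchBetterThan": 1.5,
    "BetterThan": 1.0,
    "SlightlyBetterThan": 0.5,
    "AboutTheSame": 0.0,
    "SlightlyWorse": -0.5,
    "WorseThan": -1.0,
    "MuchWorseThan": -1.5,
}

def preference_score(ratings):
    """Average per-prompt judge ratings into a single preference score vs. the baseline."""
    return sum(RATING_SCORES[r] for r in ratings) / len(ratings)

# Hypothetical ratings from an LLM judge over a handful of Japanese prompts.
print(preference_score(["AboutTheSame", "SlightlyBetterThan", "BetterThan", "AboutTheSame"]))
```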
|
|
|
## Ethics and Safety |
|
|
|
Ethics and safety evaluation approach and results. |
|
|
|
### Evaluation Approach |
|
|
|
Our evaluation methods include structured evaluations and internal red-teaming |
|
testing of relevant content policies. Red-teaming was conducted by a number of |
|
different teams, each with different goals and human evaluation metrics. These |
|
models were evaluated against a number of different categories relevant to |
|
ethics and safety, including: |
|
|
|
- Text-to-Text Content Safety: Human evaluation on prompts covering |
|
safety policies including child sexual abuse and exploitation, harassment, |
|
violence and gore, and hate speech. |
|
- Text-to-Text Representational Harms: Benchmark against relevant academic |
|
datasets. |
|
- Memorization: Automated evaluation of memorization of training data, |
|
including the risk of personally identifiable information exposure. |
|
- Large-scale harm: Tests for "dangerous capabilities," such as chemical, |
|
biological, radiological, and nuclear (CBRN) risks. |
|
|
|
## Usage and Limitations |
|
|
|
These models have certain limitations that users should be aware of. |
|
|
|
### Intended Usage |
|
|
|
Open Large Language Models (LLMs) have a wide range of applications across |
|
various industries and domains. The following list of potential uses is not |
|
comprehensive. The purpose of this list is to provide contextual information |
|
about the possible use-cases that the model creators considered as part of model |
|
training and development. |
|
|
|
- Content Creation and Communication |
|
- Text Generation: These models can be used to generate creative |
|
text formats such as poems, scripts, code, marketing copy, and email drafts. |
|
- Chatbots and Conversational AI: Power conversational interfaces |
|
for customer service, virtual assistants, or interactive applications. |
|
- Text Summarization: Generate concise summaries of a text corpus, |
|
research papers, or reports. |
|
- Research and Education |
|
- Natural Language Processing (NLP) Research: These models can |
|
serve as a foundation for researchers to experiment with NLP |
|
techniques, develop algorithms, and contribute to the advancement of the field. |
|
- Language Learning Tools: Support interactive language learning |
|
experiences, aiding in grammar correction or providing writing practice. |
|
- Knowledge Exploration: Assist researchers in exploring large |
|
bodies of text by generating summaries or answering questions about |
|
specific topics. |
|
|
|
### Limitations |
|
|
|
- Training Data |
|
- The quality and diversity of the training data significantly |
|
influence the model's capabilities. Biases or gaps in the training data |
|
can lead to limitations in the model's responses. |
|
- The scope of the training dataset determines the subject areas |
|
the model can handle effectively. |
|
- Context and Task Complexity |
|
- LLMs are better at tasks that can be framed with clear prompts |
|
and instructions. Open-ended or highly complex tasks might be challenging. |
|
- A model's performance can be influenced by the amount of context |
|
provided (longer context generally leads to better outputs, up to a |
|
certain point). |
|
- Language Ambiguity and Nuance |
|
- Natural language is inherently complex. LLMs might struggle to |
|
grasp subtle nuances, sarcasm, or figurative language. |
|
- Factual Accuracy |
|
- LLMs generate responses based on information they learned from |
|
their training datasets, but they are not knowledge bases. They may |
|
generate incorrect or outdated factual statements. |
|
- Common Sense |
|
- LLMs rely on statistical patterns in language. They might lack |
|
the ability to apply common sense reasoning in certain situations. |
|
|
|
### Ethical Considerations and Risks |
|
|
|
The development of large language models (LLMs) raises several ethical |
|
concerns. In creating an open model, we have carefully considered the |
|
following: |
|
|
|
- Bias and Fairness |
|
- LLMs trained on large-scale, real-world text data can reflect |
|
socio-cultural biases embedded in the training material. These models |
|
underwent careful scrutiny, with input data pre-processing described above and posterior evaluations reported in this card.
|
- Misinformation and Misuse |
|
- LLMs can be misused to generate text that is false, misleading, |
|
or harmful. |
|
- Guidelines are provided for responsible use of the model; see
|
the [Responsible Generative AI Toolkit](https://ai.google.dev/responsible). |
|
- Transparency and Accountability: |
|
- This model card summarizes details on the models' architecture, |
|
capabilities, limitations, and evaluation processes. |
|
- A responsibly developed open model offers the opportunity to |
|
share innovation by making LLM technology accessible to developers and |
|
researchers across the AI ecosystem. |
|
|
|
Risks identified and mitigations: |
|
|
|
- Perpetuation of biases: Continuous monitoring (using evaluation metrics and human review) and the exploration of de-biasing techniques during model training, fine-tuning, and other use cases are encouraged.
|
- Generation of harmful content: Mechanisms and guidelines for content |
|
safety are essential. Developers are encouraged to exercise caution and |
|
implement appropriate content safety safeguards based on their specific |
|
product policies and application use cases. |
|
- Misuse for malicious purposes: Technical limitations and developer and |
|
end-user education can help mitigate against malicious applications of |
|
LLMs. Educational resources and reporting mechanisms for users to flag |
|
misuse are provided. Prohibited uses of Gemma models are outlined in the |
|
[Gemma Prohibited Use Policy](https://ai.google.dev/gemma/prohibited_use_policy). |
|
- Privacy violations: Models were trained on data filtered for removal of |
|
PII (Personally Identifiable Information). Developers are encouraged to |
|
adhere to privacy regulations with privacy-preserving techniques. |
|
|
|
### Benefits |
|
|
|
At the time of release, this family of models provides high-performance open large language model implementations designed from the ground up for Responsible AI development, compared to similarly sized models.
|
|
|
|