Mozilla/gemma-2-27b-it-llamafile

jartine committed on Jul 2

Commit 7c55c5b • 1 Parent(s): c6d6aba

Update README.md

Files changed (1)

README.md +12 -6

@@ -24,7 +24,7 @@ Gemma v2 is a large language model released by Google on Jun 27th 2024.
 - Original model: [google/gemma-2-27b-it](https://huggingface.co/google/gemma-2-27b-it)
 The model is packaged into executable weights, which we call
-[llamafiles](https://github.com/Mozilla-Ocho/llamafile)). This makes it
 easy to use the model on Linux, MacOS, Windows, FreeBSD, OpenBSD, and
 NetBSD for AMD64 and ARM64.
@@ -75,11 +75,9 @@ of the README.
 When using the browser GUI, you need to fill out the following fields.
-Prompt template:
 ```
-<start_of_turn>system
-{{prompt}}<end_of_turn>
 {{history}}
 <start_of_turn>{{char}}
 ```
@@ -100,6 +98,12 @@ The Belobog Academy has discovered a new, invasive species of algae that can dou
 '
 ```
 ## About llamafile
 llamafile is a new format introduced by Mozilla Ocho on Nov 20th 2023.
@@ -110,8 +114,10 @@ AMD64.
 ## About Quantization Formats
 This model works should work well with any quantization format. Q6\_K is
-the best choice overall here. But since this is a Google model, the
-Google Brain floating point format (BF16) provides maximum quality.
 ---

 - Original model: [google/gemma-2-27b-it](https://huggingface.co/google/gemma-2-27b-it)
 The model is packaged into executable weights, which we call
+[llamafiles](https://github.com/Mozilla-Ocho/llamafile). This makes it
 easy to use the model on Linux, MacOS, Windows, FreeBSD, OpenBSD, and
 NetBSD for AMD64 and ARM64.
 When using the browser GUI, you need to fill out the following fields.
+Prompt template (note: this is for chat; Gemma doesn't have a system role):
 ```
 {{history}}
 <start_of_turn>{{char}}
 ```
 '
 ```
+## About Upload Limits
+Files which exceed the Hugging Face 50GB upload limit have a .cat𝑋
+extension. You need to use the `cat` command locally to turn them back
+into a single file, using the same order.
 ## About llamafile
 llamafile is a new format introduced by Mozilla Ocho on Nov 20th 2023.
 ## About Quantization Formats
 This model works should work well with any quantization format. Q6\_K is
+the best choice overall. We tested that it's able to produce identical
+responses to the Gemma2 27B model that's hosted by Google themselves on
+aistudio.google.com. If you encounter any divergences, then try using
+the BF16 weights, which have the original fidelity.
 ---
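
The "About Upload Limits" note added in this commit says split files must be rejoined in the same order. A minimal Python sketch of that step follows, for readers without a `cat` command at hand. It assumes the 𝑋 in `.cat𝑋` is a number (`.cat0`, `.cat1`, ...), which the README does not spell out; `part_index` and `join_parts` are hypothetical helper names, not part of llamafile.

```python
import re
import shutil
from pathlib import Path


def part_index(path):
    """Return the integer X from a name ending in .catX (assumed to be numeric)."""
    match = re.search(r"\.cat(\d+)$", Path(path).name)
    if match is None:
        raise ValueError(f"not a .catX part: {path}")
    return int(match.group(1))


def join_parts(parts, output):
    """Concatenate the parts into output in ascending .catX order, like `cat`."""
    # Sort numerically so that .cat10 comes after .cat9, not after .cat1.
    ordered = sorted(parts, key=part_index)
    with open(output, "wb") as out:
        for part in ordered:
            with open(part, "rb") as src:
                # Copies in chunks, so a multi-gigabyte part is never held in memory.
                shutil.copyfileobj(src, out)
    return ordered
```

With placeholder names, `join_parts(["model.llamafile.cat1", "model.llamafile.cat0"], "model.llamafile")` does the same job as `cat model.llamafile.cat0 model.llamafile.cat1 > model.llamafile` in a Unix shell.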