Update README.md (#19)
- Update README.md (15f395c688792d3140dbf5ced0857afaf65aff03)
- Update README.md (00e1acdb665ceabffba3830e4f54a4bd99fd4014)
Co-authored-by: Vaibhav Srivastav <reach-vb@users.noreply.huggingface.co>
README.md CHANGED
@@ -73,51 +73,10 @@ print(tokenizer.decode(outputs[0]))
 <a name="precisions"></a>
 #### Running the model on a GPU using different precisions
 
-The native weights of this model were exported in `bfloat16` precision.
+The native weights of this model were exported in `bfloat16` precision.
 
 You can also use `float32` if you skip the dtype, but no precision increase will occur (model weights will just be upcasted to `float32`). See examples below.
 
-* _Using `torch.float16`_
-
-```python
-# pip install accelerate
-from transformers import AutoTokenizer, AutoModelForCausalLM
-import torch
-
-tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-9b-it")
-model = AutoModelForCausalLM.from_pretrained(
-    "google/gemma-2-9b-it",
-    device_map="auto",
-    torch_dtype=torch.float16,
-    revision="float16",
-)
-
-input_text = "Write me a poem about Machine Learning."
-input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
-
-outputs = model.generate(**input_ids)
-print(tokenizer.decode(outputs[0]))
-```
-
-* _Using `torch.bfloat16`_
-
-```python
-# pip install accelerate
-from transformers import AutoTokenizer, AutoModelForCausalLM
-
-tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-9b-it")
-model = AutoModelForCausalLM.from_pretrained(
-    "google/gemma-2-9b-it",
-    device_map="auto",
-    torch_dtype=torch.bfloat16)
-
-input_text = "Write me a poem about Machine Learning."
-input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
-
-outputs = model.generate(**input_ids)
-print(tokenizer.decode(outputs[0]))
-```
-
 * _Upcasting to `torch.float32`_
 
 ```python
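The hunk cuts off at the opening fence of the retained `float32` example, so that block's body is not visible in this diff. A minimal sketch of what it plausibly contains, mirroring the removed snippets but omitting `torch_dtype` so the native `bfloat16` weights are upcast to `float32` at load time; the exact lines are an assumption, not part of this diff:

```python
# pip install accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-9b-it")
# No torch_dtype argument: the checkpoint is loaded in the float32 default,
# upcasting the native bfloat16 weights without any precision gain.
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-9b-it",
    device_map="auto",
)

input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
```

Note that the removed `float16` variant also passed `revision="float16"`, which loads the dedicated float16 branch of the repo rather than casting the `bfloat16` weights at load time.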