Update README.md
README.md CHANGED
@@ -98,16 +98,20 @@ print(generated_text)
 
 To make inference more efficient, run with autocast:
 
+```python
 with torch.autocast(device_type="cuda", enabled=True, dtype=torch.bfloat16):
     output = model.generate_from_batch(
         inputs,
         GenerationConfig(max_new_tokens=200, stop_strings="<|endoftext|>"),
         tokenizer=processor.tokenizer
     )
+```
+
 We did most of our evaluations in this setting (autocast on, but float32 weights).
 
 To further reduce the memory requirements, the model can be run with bfloat16 weights:
 
+```python
 model.to(dtype=torch.bfloat16)
 inputs["images"] = inputs["images"].to(torch.bfloat16)
 output = model.generate_from_batch(
@@ -115,6 +119,8 @@ output = model.generate_from_batch(
     GenerationConfig(max_new_tokens=200, stop_strings="<|endoftext|>"),
     tokenizer=processor.tokenizer
 )
+```
+
 Note that this can sometimes change the output of the model compared to running with float32 weights.
 
 
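For readers landing directly on these snippets: the diff assumes `model`, `processor`, and `inputs` were built earlier in the README (the first hunk's context shows `print(generated_text)` from that section). Below is a minimal setup sketch, assuming the Hugging Face `transformers` Auto classes with `trust_remote_code=True`. The checkpoint id, image URL, and prompt are illustrative placeholders, and `processor.process(...)` is assumed to be the repo-specific batching helper implied by the `processor.tokenizer` and `generate_from_batch` calls above, not a guaranteed API:

```python
# Sketch only: the checkpoint id, URL, and prompt are placeholders, and
# processor.process(...) is assumed from this repo's custom processor.
import requests
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

repo_id = "allenai/Molmo-7B-D-0924"  # placeholder checkpoint id

processor = AutoProcessor.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    trust_remote_code=True,
    torch_dtype=torch.float32,  # float32 weights; autocast handles the rest
    device_map="auto",
)

# Build a single-example batch: the model-specific processor turns an
# image + prompt into the tensors that generate_from_batch() expects.
image = Image.open(requests.get("https://example.com/cat.jpg", stream=True).raw)
inputs = processor.process(images=[image], text="Describe this image.")
inputs = {k: v.to(model.device).unsqueeze(0) for k, v in inputs.items()}
```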
125 |
|
126 |
|
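On the closing caveat (bfloat16 weights can change outputs relative to float32): a quick way to gauge the drift on your own prompts is to run both settings on the same batch and compare the decoded strings. This is a sketch under the assumptions above, plus the assumption that `generate_from_batch` returns full sequences from which the prompt tokens can be sliced off:

```python
import torch
from transformers import GenerationConfig

cfg = GenerationConfig(max_new_tokens=200, stop_strings="<|endoftext|>")

def decode(output):
    # Assumption: generated tokens follow the prompt tokens in the output tensor.
    tokens = output[0, inputs["input_ids"].size(1):]
    return processor.tokenizer.decode(tokens, skip_special_tokens=True)

# Setting 1: float32 weights under bfloat16 autocast (the evaluated setting).
with torch.autocast(device_type="cuda", enabled=True, dtype=torch.bfloat16):
    text_fp32 = decode(
        model.generate_from_batch(inputs, cfg, tokenizer=processor.tokenizer)
    )

# Setting 2: bfloat16 weights (converts the model in place; reload to undo).
model.to(dtype=torch.bfloat16)
inputs["images"] = inputs["images"].to(torch.bfloat16)
text_bf16 = decode(
    model.generate_from_batch(inputs, cfg, tokenizer=processor.tokenizer)
)

print("outputs match:", text_fp32 == text_bf16)
```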