Update README.md
README.md CHANGED
@@ -98,16 +98,20 @@ print(generated_text)
 
 To make inference more efficient, run with autocast:
 
+```python
 with torch.autocast(device_type="cuda", enabled=True, dtype=torch.bfloat16):
     output = model.generate_from_batch(
         inputs,
         GenerationConfig(max_new_tokens=200, stop_strings="<|endoftext|>"),
         tokenizer=processor.tokenizer
     )
+```
+
 We did most of our evaluations in this setting (autocast on, but float32 weights).
 
 To further reduce the memory requirements, the model can be run with bfloat16 weights:
 
+```python
 model.to(dtype=torch.bfloat16)
 inputs["images"] = inputs["images"].to(torch.bfloat16)
 output = model.generate_from_batch(
@@ -115,6 +119,8 @@ output = model.generate_from_batch(
     GenerationConfig(max_new_tokens=200, stop_strings="<|endoftext|>"),
     tokenizer=processor.tokenizer
 )
+```
+
 Note that this can sometimes change the output of the model compared to running with float32 weights.
 
 
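For readers landing directly on these snippets: the diff assumes `model`, `processor`, and `inputs` were built earlier in the README (the first hunk's context shows `print(generated_text)` from that section). Below is a minimal setup sketch, assuming the Hugging Face `transformers` Auto classes with `trust_remote_code=True`. The checkpoint id, image URL, and prompt are illustrative placeholders, and `processor.process(...)` is assumed to be the repo-specific batching helper implied by the `processor.tokenizer` and `generate_from_batch` calls above, not a guaranteed API:

```python
# Sketch only: the checkpoint id, URL, and prompt are placeholders, and
# processor.process(...) is assumed from this repo's custom processor.
import requests
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

repo_id = "allenai/Molmo-7B-D-0924"  # placeholder checkpoint id

processor = AutoProcessor.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    trust_remote_code=True,
    torch_dtype=torch.float32,  # float32 weights; autocast handles the rest
    device_map="auto",
)

# Build a single-example batch: the model-specific processor turns an
# image + prompt into the tensors that generate_from_batch() expects.
image = Image.open(requests.get("https://example.com/cat.jpg", stream=True).raw)
inputs = processor.process(images=[image], text="Describe this image.")
inputs = {k: v.to(model.device).unsqueeze(0) for k, v in inputs.items()}
```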
125 |
|
126 |
|
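On the closing caveat (bfloat16 weights can change outputs relative to float32): a quick way to gauge the drift on your own prompts is to run both settings on the same batch and compare the decoded strings. This is a sketch under the assumptions above, plus the assumption that `generate_from_batch` returns full sequences from which the prompt tokens can be sliced off:

```python
import torch
from transformers import GenerationConfig

cfg = GenerationConfig(max_new_tokens=200, stop_strings="<|endoftext|>")

def decode(output):
    # Assumption: generated tokens follow the prompt tokens in the output tensor.
    tokens = output[0, inputs["input_ids"].size(1):]
    return processor.tokenizer.decode(tokens, skip_special_tokens=True)

# Setting 1: float32 weights under bfloat16 autocast (the evaluated setting).
with torch.autocast(device_type="cuda", enabled=True, dtype=torch.bfloat16):
    text_fp32 = decode(
        model.generate_from_batch(inputs, cfg, tokenizer=processor.tokenizer)
    )

# Setting 2: bfloat16 weights (converts the model in place; reload to undo).
model.to(dtype=torch.bfloat16)
inputs["images"] = inputs["images"].to(torch.bfloat16)
text_bf16 = decode(
    model.generate_from_batch(inputs, cfg, tokenizer=processor.tokenizer)
)

print("outputs match:", text_fp32 == text_bf16)
```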