lamm-mit/x-lora-gemma-7b

mjbuehler committed on Apr 11, 2024

Commit 245e2c8 (verified) · 1 parent: c1dea12

Update README.md

README.md +61 -3

@@ -3,12 +3,68 @@ library_name: transformers
 tags: []
 ---
-# Model Card for Model ID
 <!-- Provide a quick summary of what the model is/does. -->
 ## Model Details
 ### Model Description
@@ -37,6 +93,8 @@ This is the model card of a 🤗 transformers model that has been pushed on the
 <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
 ### Direct Use
 <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

 tags: []
 ---
+# Model Card for X-LoRA-Gemma-7b
 <!-- Provide a quick summary of what the model is/does. -->
+```
+import torch
+from xlora.xlora_utils import load_model
+XLoRa_model_name = 'lamm-mit/x-lora-gemma-7b'
+device = 'cuda:0'  # also read as a global by generate_XLoRA_Gemma
+model, tokenizer = load_model(model_name=XLoRa_model_name,
+                              device=device,
+                              use_flash_attention_2=True,
+                              dtype=torch.bfloat16,
+                              )
+```
+```
+def generate_XLoRA_Gemma (system_prompt='You are a helpful assistant. You are familiar with materials science. ',
+                     prompt='What is spider silk in the context of bioinspired materials?',
+                     repetition_penalty=1.,num_beams=1,num_return_sequences=1,
+                     top_p=0.9, top_k=256, temperature=.5,max_new_tokens=512, verbatim=False, eos_token=None,
+                     add_special_tokens=True, prepend_response='',
+                         ):
+    if eos_token is None:
+        eos_token= tokenizer.eos_token_id
+    if system_prompt is None:
+        messages=[ {"role": "user", "content": prompt},  ]
+    else:
+        messages=[ {"role": "user", "content": system_prompt+prompt},  ]
+    txt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True, )
+    txt=txt+prepend_response
+    inputs = tokenizer(txt, add_special_tokens  =add_special_tokens, return_tensors ='pt').to(device)
+    with torch.no_grad():
+          outputs = model.generate(input_ids = inputs["input_ids"],
+                                   attention_mask = inputs["attention_mask"] , # This is usually done automatically by the tokenizer
+                                   max_new_tokens=max_new_tokens,
+                                   temperature=temperature, #value used to modulate the next token probabilities.
+                                   num_beams=num_beams,
+                                   top_k = top_k,
+                                   top_p = top_p,
+                                   num_return_sequences = num_return_sequences,
+                                   eos_token_id=eos_token,
+                                   pad_token_id = eos_token,
+                                   do_sample =True,#skip_prompt=True,
+                                   repetition_penalty=repetition_penalty,
+                                   )
+    return tokenizer.batch_decode(outputs[:,inputs["input_ids"].shape[1]:].detach().cpu().numpy(), skip_special_tokens=True)
+```
+Then, use as follows:
+```
+from IPython.display import display, Markdown
+q='''What is graphene?'''
+res=generate_XLoRA_Gemma( system_prompt='You design materials.',
+         prompt=q, max_new_tokens=1024, temperature=0.3,  )
+display(Markdown(res[0]))  # batch_decode returns a list of strings; show the first
+```
 ## Model Details
 ### Model Description
 <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
 ### Direct Use
 <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
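
The `generate_XLoRA_Gemma` function added in this commit does two things that can be checked without loading the model: it folds the system prompt into a single user turn, and it slices the prompt tokens off the generated sequences before decoding. The following is a minimal sketch of just those two steps. The helper names `build_messages` and `strip_prompt` are hypothetical and not part of the README or of the `xlora` package, and plain Python lists stand in for the tensors.

```python
from typing import Optional


def build_messages(prompt: str, system_prompt: Optional[str] = None) -> list[dict]:
    """Hypothetical helper mirroring the message construction in generate_XLoRA_Gemma.

    The system prompt is concatenated directly in front of the user prompt,
    producing a single "user" turn.
    """
    content = prompt if system_prompt is None else system_prompt + prompt
    return [{"role": "user", "content": content}]


def strip_prompt(sequences: list[list[int]], prompt_length: int) -> list[list[int]]:
    """Drop the first prompt_length token ids from each generated sequence,
    analogous to outputs[:, inputs["input_ids"].shape[1]:] in the README."""
    return [seq[prompt_length:] for seq in sequences]


# Illustrative values only; the token ids are made up.
messages = build_messages("What is graphene?", system_prompt="You design materials. ")
new_tokens = strip_prompt([[1, 2, 3, 10, 11], [1, 2, 3, 12, 13]], prompt_length=3)
```

Because the concatenation adds no separator, a system prompt should end with a space or newline. The usage example in the README passes `'You design materials.'` without one, so the first word of the question is joined directly to the final period.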