README.md · sethuiyer/Medichat-Llama3-8B at 7b7aef800a8716f61436063f6a101b1e623110af

metadata

base_model:
  - Undi95/Llama-3-Unholy-8B
  - Locutusque/llama-3-neural-chat-v1-8b
  - ruslanmv/Medical-Llama3-8B-16bit
library_name: transformers
tags:
  - mergekit
  - merge
license: other
datasets:
  - mlabonne/orpo-dpo-mix-40k
  - Open-Orca/SlimOrca-Dedup
  - jondurbin/airoboros-3.2
  - microsoft/orca-math-word-problems-200k
  - m-a-p/Code-Feedback
  - MaziyarPanahi/WizardLM_evol_instruct_V2_196k
  - ruslanmv/ai-medical-chatbot

Medichat-Llama3-8B

The following YAML configuration was used to produce this model:


models:
  - model: Undi95/Llama-3-Unholy-8B
    parameters:
      weight: [0.25, 0.35, 0.45, 0.35, 0.25]
      density: [0.1, 0.25, 0.5, 0.25, 0.1]
  - model: Locutusque/llama-3-neural-chat-v1-8b
  - model: ruslanmv/Medical-Llama3-8B-16bit
    parameters:
      weight: [0.55, 0.45, 0.35, 0.45, 0.55]
      density: [0.1, 0.25, 0.5, 0.25, 0.1]
merge_method: dare_ties
base_model: Locutusque/llama-3-neural-chat-v1-8b
parameters:
  int8_mask: true
dtype: bfloat16

Usage:

from transformers import AutoTokenizer, AutoModelForCausalLM

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("sethuiyer/Medichat-Llama3-8B")
model = AutoModelForCausalLM.from_pretrained("sethuiyer/Medichat-Llama3-8B").to("cuda")

# Function to format and generate response with prompt engineering using a chat template
def askme(question):
    sys_message = ''' 
    You are an AI Medical Assistant trained on a vast dataset of health information. Please be thorough and
    provide an informative answer. If you don't know the answer to a specific medical inquiry, advise seeking professional help.
    '''

    # Create messages structured for the chat template
    messages = [{"role": "system", "content": sys_message}, {"role": "user", "content": question}]

    # Applying chat template
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    outputs = model.generate(**inputs, max_new_tokens=512, use_cache=True)  # Adjust max_new_tokens for longer responses

    # Extract and return the generated text
    answer = tokenizer.batch_decode(outputs)[0].strip()
    return answer

# Example usage
question = '''
Symptoms:
Dizziness, headache and nausea.

What is the differnetial diagnosis?
'''
print(askme(question))