metadata
license: cc-by-nc-4.0
datasets:
- starmpcc/Asclepius-Synthetic-Clinical-Notes
language:
- en
pipeline_tag: text2text-generation
tags:
- medical
Model Card for Asclepius-7B
This is the official model checkpoint for Asclepius-7B (arXiv). This model is the first publicly shareable clinical LLM, trained with synthetic data.
Model Details
Model Description
- Model type: Clinical LLM (Large Language Model)
- Language(s) (NLP): English
- License: CC-BY-NC-SA 4.0
- Finetuned from model: LLaMA-7B
Model Sources
- Repository: https://github.com/starmpcc/Asclepius
- Paper: TODO Arxiv
- Data: https://huggingface.co/datasets/starmpcc/Asclepius-Synthetic-Clinical-Notes
Uses
This model can perform the following 8 clinical NLP tasks on clinical notes:
- Named Entity Recognition
- Abbreviation Expansion
- Relation Extraction
- Temporal Information Extraction
- Coreference Resolution
- Paraphrasing
- Summarization
- Question Answering
Direct Use
[More Information Needed]
Downstream Use
[More Information Needed]
Out-of-Scope Use
ONLY USE THIS MODEL FOR RESEARCH PURPOSES!
How to Get Started with the Model
prompt = """You are an intelligent clinical languge model.
Below is a snippet of patient's discharge summary and a following instruction from healthcare professional.
Write a response that appropriately completes the instruction.
The response should provide the accurate answer to the instruction, while being concise.
[Discharge Summary Begin]
{note}
[Discharge Summary End]
[Instruction Begin]
{question}
[Instruction End]
"""
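from transformers import AutoTokenizer, AutoModelForCausalLM

# AutoModelForCausalLM loads the language-modeling head that generate() requires;
# the bare AutoModel class has no such head.
tokenizer = AutoTokenizer.from_pretrained("starmpcc/Asclepius-7B")
model = AutoModelForCausalLM.from_pretrained("starmpcc/Asclepius-7B")

note = "This is a sample note"
question = "What is the diagnosis?"

# Fill the template with the note and the instruction, then tokenize.
model_input = prompt.format(note=note, question=question)
input_ids = tokenizer(model_input, return_tensors="pt").input_ids
# max_new_tokens=256 is an illustrative cap on the response length; adjust as needed.
output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0]))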
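A decoder-only model returns the prompt tokens followed by the newly generated ones, so the decoded text above also contains the full prompt. The following is a minimal sketch of one way to keep only the response. The helper name `response_token_ids` is hypothetical and not part of the Asclepius repository or of transformers.

```python
def response_token_ids(output_ids, prompt_length):
    """Return only the tokens generated after the prompt.

    output_ids: a 1-D sequence of token IDs (a list or a tensor row),
                for example output[0] from model.generate(...).
    prompt_length: number of tokens in the prompt,
                   for example input_ids.shape[1].
    """
    # Slicing works the same way for Python lists and for tensors.
    return output_ids[prompt_length:]


# Stand-in token IDs (made up for illustration): 3 prompt tokens, 3 new tokens.
example_output = [1, 501, 502, 901, 902, 2]
example_response = response_token_ids(example_output, prompt_length=3)
print(example_response)
```

With the variables from the snippet above, this would be used as `tokenizer.decode(response_token_ids(output[0], input_ids.shape[1]), skip_special_tokens=True)`.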
Training Details
Training Data
https://huggingface.co/datasets/starmpcc/Asclepius-Synthetic-Clinical-Notes
Training Procedure
- Initial training was conducted using causal language modeling on synthetic clinical notes.
- It was then fine-tuned with clinical instruction-response pairs.
- See our upcoming paper for a comprehensive overview of our methods.
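The two stages above differ mainly in which tokens contribute to the loss. The sketch below illustrates that difference with plain Python lists. It assumes the instruction fine-tuning stage masks prompt tokens with the label `-100`, as the Stanford Alpaca training script does; this model card does not confirm that Asclepius does the same. The function name `make_labels` and all token IDs are hypothetical.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss ignores by default


def make_labels(prompt_ids, response_ids, mask_prompt):
    """Build (input_ids, labels) for one training example.

    mask_prompt=False: plain causal language modeling, every token is a target.
    mask_prompt=True:  instruction tuning, only response tokens are targets.
    """
    input_ids = list(prompt_ids) + list(response_ids)
    if mask_prompt:
        labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    else:
        labels = list(input_ids)
    return input_ids, labels


# Stage 1 (assumed shape): a synthetic note with no prompt, all tokens are targets.
note_ids = [11, 12, 13, 14]
stage1_inputs, stage1_labels = make_labels([], note_ids, mask_prompt=False)

# Stage 2 (assumed shape): note plus instruction as the prompt, answer as the response.
stage2_inputs, stage2_labels = make_labels([11, 12, 21, 22], [31, 32], mask_prompt=True)
print(stage1_labels)
print(stage2_labels)
```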
Training Hyperparameters
- We followed the configuration used in Stanford Alpaca.
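For reference, the values below are the ones listed in the Stanford Alpaca README for fine-tuning LLaMA-7B, written as a Python dictionary. Whether Asclepius used exactly these values, and how they were split across its 8 GPUs, is an assumption; the dictionary name is hypothetical.

```python
# Values from the Stanford Alpaca README example command for LLaMA-7B.
# That example runs on 4 GPUs; this card reports 8x A100 80G, so the
# per-device settings used for Asclepius may differ (assumption).
alpaca_7b_config = {
    "num_train_epochs": 3,
    "learning_rate": 2e-5,
    "weight_decay": 0.0,
    "warmup_ratio": 0.03,
    "lr_scheduler_type": "cosine",
    "per_device_train_batch_size": 4,
    "gradient_accumulation_steps": 8,
    "bf16": True,
}

num_gpus_in_alpaca_example = 4

# Effective batch size = per-device batch * accumulation steps * number of GPUs.
effective_batch_size = (
    alpaca_7b_config["per_device_train_batch_size"]
    * alpaca_7b_config["gradient_accumulation_steps"]
    * num_gpus_in_alpaca_example
)
print(effective_batch_size)
```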
Speeds, Sizes, Times
- Pre-Training (1 epoch): 1h 33m with 8x A100 80G
- Instruction Fine-Tuning (3 epochs): 7h 26m with 8x A100 80G
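The wall-clock times above can be converted to GPU-hours to estimate the total compute. This is a small arithmetic sketch based only on the figures listed here; the helper name `gpu_hours` is hypothetical.

```python
NUM_GPUS = 8  # 8x A100 80G, as reported above


def gpu_hours(hours, minutes, num_gpus=NUM_GPUS):
    """Convert a wall-clock duration on num_gpus devices into GPU-hours."""
    return (hours * 60 + minutes) * num_gpus / 60


pretraining = gpu_hours(1, 33)   # 1h 33m on 8 GPUs, about 12.4 GPU-hours
fine_tuning = gpu_hours(7, 26)   # 7h 26m on 8 GPUs, about 59.5 GPU-hours
total = pretraining + fine_tuning

print(f"pre-training: {pretraining:.1f} GPU-hours")
print(f"fine-tuning:  {fine_tuning:.1f} GPU-hours")
print(f"total:        {total:.1f} GPU-hours")
```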
Citation
BibTeX:
[More Information Needed]
APA:
[More Information Needed]