OASISCoder-Llama-3.2-3B
Model Description
OASISCoder-Llama-3.2-3B is a fine-tuned version of Meta's Llama 3.2 3B model, trained on a medical coding dataset focused on CMS (Centers for Medicare & Medicaid Services) and OASIS (Outcome and Assessment Information Set) standards. It is designed to help healthcare professionals and organizations generate accurate medical codes and streamline documentation tasks. The model supports medical question answering, medical coding, and clinical decision support, with an emphasis on regulatory compliance and documentation quality in the US healthcare system.
Intended Use Cases
- Medical Coding (CMS, OASIS): Supports automated or semi-automated coding tasks in clinical documentation, reducing administrative burden for healthcare providers.
- Clinical Decision Support: Provides relevant, context-aware answers based on healthcare standards and medical queries.
- Healthcare QA Systems: Useful for building medical chatbots and virtual assistants that handle queries related to CMS regulations, OASIS standards, and healthcare procedures (see the sketch after this list).
- Medical Compliance: Supports accurate documentation for home healthcare assessments and helps improve regulatory compliance in clinical settings.
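For the QA and chatbot use cases above, a thin wrapper around the Transformers text-generation pipeline is usually enough. The snippet below is a minimal sketch: the prompt wording, generation settings, and helper name (answer_medical_query) are illustrative assumptions, not part of the model's documented interface.

from transformers import pipeline

# Sketch: a simple healthcare QA helper (prompt format and settings are illustrative).
generator = pipeline("text-generation", model="exafluence/OASISCoder-Llama-3.2-3B")

def answer_medical_query(question: str, max_new_tokens: int = 128) -> str:
    prompt = (
        "Answer the following question about CMS/OASIS documentation.\n"
        f"Question: {question}\nAnswer:"
    )
    result = generator(prompt, max_new_tokens=max_new_tokens, do_sample=False)
    # The pipeline returns the prompt plus the continuation; keep only the answer.
    return result[0]["generated_text"][len(prompt):].strip()

print(answer_medical_query("Which OASIS item captures a patient's ability to dress the upper body?"))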
Training Data
The model was fine-tuned on a comprehensive medical coding dataset integrating CMS and OASIS data, including real-world clinical documentation and coding tasks. It contains examples of medical diagnoses, procedures, patient assessments, and coding annotations that follow CMS and OASIS regulations.
Architecture
The model is based on Llama 3.2 3B, a 3-billion-parameter decoder-only transformer optimized for language understanding and generation. Fine-tuning on the medical domain specializes its outputs for CMS and OASIS coding, documentation, and related healthcare tasks.
Performance
- Accuracy: The model demonstrates high accuracy in generating CMS and OASIS codes from clinical text and in answering medical queries.
- Efficiency: Domain-specific fine-tuning and the compact 3B parameter size keep processing time low for medical coding and decision support tasks.
Limitations
- Not a Diagnostic Tool: This model is not intended for making medical diagnoses and should not be used as a replacement for professional medical judgment.
- Bias and Data Coverage: The model performs best on US healthcare data (CMS, OASIS) and may not generalize well to other healthcare systems or international coding standards.
License
The model is released under the Apache License 2.0, a permissive license that allows research, development, and commercial use subject to its terms.
How to Use
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the fine-tuned model and tokenizer from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("exafluence/OASISCoder-Llama-3.2-3B")
model = AutoModelForCausalLM.from_pretrained("exafluence/OASISCoder-Llama-3.2-3B")

# Ask a medical coding question and decode the generated answer.
input_text = "What is the CMS code for a patient with diabetes?"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
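Because the base checkpoint is an instruct-tuned Llama 3.2 model, wrapping queries in the tokenizer's chat template may produce cleaner answers. The snippet below is a minimal sketch under that assumption, reusing the tokenizer and model loaded above; the system prompt and generation settings are illustrative rather than recommended values.

# Sketch: chat-template prompting (assumes the tokenizer ships a Llama 3.2 chat template).
messages = [
    {"role": "system", "content": "You are an assistant for CMS and OASIS medical coding."},
    {"role": "user", "content": "What is the CMS code for a patient with diabetes?"},
]
chat_inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
chat_outputs = model.generate(chat_inputs, max_new_tokens=128)
# Strip the prompt tokens and keep only the newly generated answer.
print(tokenizer.decode(chat_outputs[0][chat_inputs.shape[-1]:], skip_special_tokens=True))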
Citation
If you use this model, please cite:
@misc{exafluence2024OASISCoder,
  title={OASISCoder-Llama-3.2-3B: A Medical Coding Language Model for CMS and OASIS},
  author={{Exafluence Inc.}},
  year={2024},
  doi={10.57967/hf/3260},
  url={https://huggingface.co/exafluence/OASISCoder-Llama-3.2-3B}
}
Uploaded model
- Developed by: exafluence
- License: apache-2.0
- Finetuned from model: unsloth/llama-3.2-3b-instruct-bnb-4bit
This Llama model was trained 2x faster with Unsloth and Hugging Face's TRL library.
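Since the base checkpoint (unsloth/llama-3.2-3b-instruct-bnb-4bit) is a 4-bit bitsandbytes variant, the fine-tuned model can also be loaded in 4-bit to cut memory use. A minimal sketch, assuming a CUDA GPU with the bitsandbytes package installed; the quantization settings shown are common defaults, not values verified for this particular model.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Sketch: 4-bit (NF4) loading to reduce GPU memory; settings are common defaults.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained("exafluence/OASISCoder-Llama-3.2-3B")
model = AutoModelForCausalLM.from_pretrained(
    "exafluence/OASISCoder-Llama-3.2-3B",
    quantization_config=bnb_config,
    device_map="auto",
)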