---
library_name: peft
base_model:
- unsloth/Llama-3.2-11B-Vision-Instruct
datasets:
- eltorio/ROCOv2-radiology
---

# Model Card for Llama-3.2 11B Vision Medical

<img src="https://i5.walmartimages.com/seo/DolliBu-Beige-Llama-Doctor-Plush-Toy-Super-Soft-Stuffed-Animal-Dress-Up-Cute-Scrub-Uniform-Cap-Outfit-Fluffy-Gift-11-Inches_e78392b2-71ef-4e26-a23f-8bb0b0e2043a.70c3b5988d390cf43d799758a826f2a5.jpeg" alt="drawing" width="400"/>

<font color="FF0000" size="5"><b>
This is a vision-language model fine-tuned for radiographic image analysis</b></font>
<br><b>Foundation Model: https://huggingface.co/unsloth/Llama-3.2-11B-Vision-Instruct<br/>
Dataset: https://huggingface.co/datasets/eltorio/ROCOv2-radiology<br/></b>

The model has been fine-tuned using CUDA-enabled GPU hardware.

## Model Details

The model is based on the foundation model unsloth/Llama-3.2-11B-Vision-Instruct.<br/>
It was fine-tuned with the TRL Supervised Fine-tuning Trainer (SFTTrainer) and PEFT LoRA adapters, retaining the base model's vision-language capabilities.

### Libraries
- unsloth
- transformers
- torch
- datasets
- trl
- peft

## Bias, Risks, and Limitations

To optimize training efficiency, the model has been trained on a subset of the ROCOv2-radiology dataset (1/7th of the total dataset).<br/>

<font color="FF0000">
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.<br/>
The model's performance is directly dependent on the quality and diversity of the training data. Medical diagnosis should always be performed by qualified healthcare professionals.<br/>
Generation of plausible yet incorrect medical interpretations could occur and should not be used as the sole basis for clinical decisions.
</font>

## Training Details

### Training Parameters
- per_device_train_batch_size = 2
- gradient_accumulation_steps = 16
- num_train_epochs = 3
- learning_rate = 5e-5
- weight_decay = 0.02
- lr_scheduler_type = "linear"
- max_seq_length = 2048
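
For reference, these hyperparameters map onto TRL's `SFTConfig` roughly as in the following sketch. This is not the published training script, and `output_dir` is an assumption; field names follow TRL's API at the time of writing.

```python
# Hedged sketch: how the hyperparameters above map onto trl's SFTConfig.
# This is not the original training script; output_dir is an assumption.
from trl import SFTConfig

training_args = SFTConfig(
    per_device_train_batch_size=2,
    gradient_accumulation_steps=16,  # effective batch size: 2 * 16 = 32
    num_train_epochs=3,
    learning_rate=5e-5,
    weight_decay=0.02,
    lr_scheduler_type="linear",
    max_seq_length=2048,
    output_dir="outputs",  # hypothetical output path
)
```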

### LoRA Configuration
- r = 32
- lora_alpha = 32
- lora_dropout = 0
- bias = "none"
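
A minimal `peft` equivalent of this configuration might look like the sketch below; `target_modules` is an assumption, since the card does not list which layers were adapted.

```python
# Hedged sketch of an equivalent peft LoRA configuration.
# target_modules is an assumption; the card does not say which
# projections were adapted.
from peft import LoraConfig

lora_config = LoraConfig(
    r=32,
    lora_alpha=32,
    lora_dropout=0.0,
    bias="none",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # hypothetical
)
```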

### Hardware Requirements
The model was trained using CUDA-enabled GPU hardware.

### Training Statistics
- Training duration: 40,989 seconds (approximately 683 minutes, i.e. about 11.4 hours)
- Peak reserved memory: 12.8 GB
- Peak reserved memory for training: 3.975 GB
- Peak reserved memory % of max memory: 32.3%
- Peak reserved memory for training % of max memory: 10.1%

### Training Data
The model was trained on the ROCOv2-radiology dataset, which contains radiographic images and their corresponding medical descriptions.

The training set was reduced to 1/7th of the original size for computational efficiency.
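
A sketch of how such a subset could be drawn with the `datasets` library follows; the exact sampling strategy and seed used for this model are not documented.

```python
# Hedged sketch: drawing a 1/7th training subset with the datasets library.
# The actual sampling strategy and seed are not documented in this card.
from datasets import load_dataset

ds = load_dataset("eltorio/ROCOv2-radiology", split="train")
subset = ds.shuffle(seed=42).select(range(len(ds) // 7))  # seed is an assumption
```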

## Usage

The model is designed to provide detailed descriptions of radiographic images. It can be prompted with:
```python
instruction = "You are an expert radiographer. Describe accurately what you see in this image."
```
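
A fuller inference sketch with `transformers` and `peft` is shown below. The model and adapter IDs come from this card; the processor calls follow the standard Llama-3.2-Vision usage pattern and may need adjustment for your library versions, and the image path is a placeholder.

```python
# Hedged end-to-end inference sketch (transformers + peft).
# Follows the standard Llama-3.2-Vision usage pattern; adjust for your
# library versions. The image path is a placeholder.
import torch
from PIL import Image
from peft import PeftModel
from transformers import AutoProcessor, MllamaForConditionalGeneration

base_id = "unsloth/Llama-3.2-11B-Vision-Instruct"
model = MllamaForConditionalGeneration.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, "bouthros/llma32_11b_vision_medical")
processor = AutoProcessor.from_pretrained(base_id)

instruction = "You are an expert radiographer. Describe accurately what you see in this image."
messages = [{
    "role": "user",
    "content": [{"type": "image"}, {"type": "text", "text": instruction}],
}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)

image = Image.open("radiograph.png")  # placeholder: your radiographic image
inputs = processor(image, prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(output[0], skip_special_tokens=True))
```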

## Model Access

The model is available on Hugging Face Hub at: bouthros/llma32_11b_vision_medical

## Citation

If you use this model, please cite the original ROCOv2-radiology dataset and the Llama-3.2-11B-Vision-Instruct base model.