File size: 5,562 Bytes
38f981b
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
53454c9
48aa444
43fb3b1
853f720
 
 
ca52788
853f720
329e77d
 
853f720
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
329e77d
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
853f720
 
329e77d
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
---
license: creativeml-openrail-m
datasets:
- avaliev/umls
language:
- en
base_model:
- Qwen/Qwen2.5-7B-Instruct
pipeline_tag: text-generation
library_name: transformers
tags:
- safetensors
- Unified Medical Language System
- Qwen2.5
- 7B
- Instruct
- Medical
- text-generation-inference
- National Library of Medicine
- umls
---


### Qwen-UMLS-7B-Instruct `[ Unified Medical Language System ]`

The **Qwen-UMLS-7B-Instruct** model is a specialized, instruction-tuned language model designed for medical and healthcare-related tasks. It is fine-tuned on the **Qwen2.5-7B-Instruct** base model using the **UMLS (Unified Medical Language System)** dataset, making it an invaluable tool for medical professionals, researchers, and developers building healthcare applications.

| **File Name**                          | **Size**       | **Description**                                  | **Upload Status**  |
|-----------------------------------------|----------------|-------------------------------------------------|--------------------|
| `.gitattributes`                        | 1.57 kB        | File to specify LFS rules for large file tracking. | Uploaded           |
| `README.md`                             | 323 Bytes      | Basic project information file.                 | Updated            |
| `added_tokens.json`                     | 657 Bytes      | Contains additional tokens for the tokenizer.   | Uploaded           |
| `config.json`                           | 860 Bytes      | Configuration file for the model.               | Uploaded           |
| `generation_config.json`                | 281 Bytes      | Configuration file for generation settings.     | Uploaded           |
| `merges.txt`                            | 1.82 MB        | Byte-pair encoding merge rules for tokenization.| Uploaded           |
| `pytorch_model-00001-of-00004.bin`      | 4.88 GB        | First part of the model's PyTorch checkpoint.   | Uploaded (LFS)     |
| `pytorch_model-00002-of-00004.bin`      | 4.93 GB        | Second part of the model's PyTorch checkpoint.  | Uploaded (LFS)     |
| `pytorch_model-00003-of-00004.bin`      | 4.33 GB        | Third part of the model's PyTorch checkpoint.   | Uploaded (LFS)     |
| `pytorch_model-00004-of-00004.bin`      | 1.09 GB        | Fourth part of the model's PyTorch checkpoint.  | Uploaded (LFS)     |
| `pytorch_model.bin.index.json`          | 28.1 kB        | Index file mapping layers to checkpoint shards. | Uploaded           |
| `special_tokens_map.json`               | 644 Bytes      | Maps special tokens like `[CLS]`, `[SEP]`, etc. | Uploaded           |
| `tokenizer.json`                        | 11.4 MB        | Tokenizer definition and configuration.         | Uploaded (LFS)     |
| `tokenizer_config.json`                 | 7.73 kB        | Configuration file for the tokenizer.           | Uploaded           |
| `vocab.json`                            | 2.78 MB        | Vocabulary file for tokenization.               | Uploaded           |

### **Key Features:**

1. **Medical Expertise:**  
   - Trained on the UMLS dataset, ensuring deep domain knowledge in medical terminology, diagnostics, and treatment plans.

2. **Instruction-Following:**  
   - Designed to handle complex queries with clarity and precision, suitable for diagnostic support, patient education, and research.

3. **High-Parameter Model:**  
   - Leverages 7 billion parameters to deliver detailed, contextually accurate responses.

---

### **Training Details:**

- **Base Model:** [Qwen2.5-7B-Instruct](#)  
- **Dataset:** [avaliev/UMLS](#)  
  - Comprehensive dataset of medical terminologies, relationships, and use cases with 99.1k samples.
---
### **Capabilities:**

1. **Clinical Text Analysis:**  
   - Interpret medical notes, prescriptions, and research articles.

2. **Question-Answering:**  
   - Answer medical queries, provide explanations for symptoms, and suggest treatments based on user prompts.

3. **Educational Support:**  
   - Assist in learning medical terminologies and understanding complex concepts.

4. **Healthcare Applications:**  
   - Integrate into clinical decision-support systems or patient care applications.
---
### **Usage Instructions:**

1. **Setup:**
   Download all files and ensure compatibility with the Hugging Face Transformers library.

2. **Loading the Model:**
   ```python
   from transformers import AutoModelForCausalLM, AutoTokenizer
   
   model_name = "prithivMLmods/Qwen-UMLS-7B-Instruct"
   tokenizer = AutoTokenizer.from_pretrained(model_name)
   model = AutoModelForCausalLM.from_pretrained(model_name)
   ```

3. **Generate Medical Text:**
   ```python
   input_text = "What are the symptoms and treatments for diabetes?"
   inputs = tokenizer(input_text, return_tensors="pt")
   outputs = model.generate(**inputs, max_length=200, temperature=0.7)
   print(tokenizer.decode(outputs[0], skip_special_tokens=True))
   ```

4. **Customizing Outputs:**
   Modify `generation_config.json` to optimize output style:
   - `temperature` for creativity vs. determinism.
   - `max_length` for concise or extended responses.

---

### **Applications:**

1. **Clinical Support:**  
   - Assist healthcare providers with quick, accurate information retrieval.

2. **Patient Education:**  
   - Provide patients with understandable explanations of medical conditions.

3. **Medical Research:**  
   - Summarize or analyze complex medical research papers.

4. **AI-Driven Diagnostics:**  
   - Integrate with diagnostic systems for preliminary assessments.

---