	---
	license: creativeml-openrail-m
	datasets:
	- avaliev/umls
	language:
	- en
	base_model:
	- Qwen/Qwen2.5-7B-Instruct
	pipeline_tag: text-generation
	library_name: transformers
	tags:
	- safetensors
	- Unified Medical Language System
	- Qwen2.5
	- 7B
	- Instruct
	- Medical
	- text-generation-inference
	- National Library of Medicine
	---


	### Qwen-UMLS-7B-Instruct `[ Unified Medical Language System ]`

	Qwen-UMLS-7B-Instruct is an instruction-tuned language model for medical and healthcare-related tasks. It is fine-tuned from the Qwen2.5-7B-Instruct base model on the UMLS (Unified Medical Language System) dataset and is intended for medical professionals, researchers, and developers building healthcare applications.

	| File Name | Size | Description | Upload Status |
	|-----------------------------------|-----------|----------------------------------------------------------------|----------------|
	| `.gitattributes` | 1.57 kB | Specifies LFS rules for large file tracking. | Uploaded |
	| `README.md` | 323 Bytes | Basic project information file. | Updated |
	| `added_tokens.json` | 657 Bytes | Contains additional tokens for the tokenizer. | Uploaded |
	| `config.json` | 860 Bytes | Configuration file for the model. | Uploaded |
	| `generation_config.json` | 281 Bytes | Configuration file for generation settings. | Uploaded |
	| `merges.txt` | 1.82 MB | Byte-pair encoding merge rules for tokenization. | Uploaded |
	| `pytorch_model-00001-of-00004.bin` | 4.88 GB | First part of the model's PyTorch checkpoint. | Uploaded (LFS) |
	| `pytorch_model-00002-of-00004.bin` | 4.93 GB | Second part of the model's PyTorch checkpoint. | Uploaded (LFS) |
	| `pytorch_model-00003-of-00004.bin` | 4.33 GB | Third part of the model's PyTorch checkpoint. | Uploaded (LFS) |
	| `pytorch_model-00004-of-00004.bin` | 1.09 GB | Fourth part of the model's PyTorch checkpoint. | Uploaded (LFS) |
	| `pytorch_model.bin.index.json` | 28.1 kB | Index file mapping layers to checkpoint shards. | Uploaded |
	| `special_tokens_map.json` | 644 Bytes | Maps special tokens such as the end-of-sequence and padding tokens. | Uploaded |
	| `tokenizer.json` | 11.4 MB | Tokenizer definition and configuration. | Uploaded (LFS) |
	| `tokenizer_config.json` | 7.73 kB | Configuration file for the tokenizer. | Uploaded |
	| `vocab.json` | 2.78 MB | Vocabulary file for tokenization. | Uploaded |
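
The index file is what ties the four checkpoint shards together. As a rough sketch of how such an index is organized, the snippet below groups parameter names by shard using the `weight_map` dictionary that Transformers writes into sharded checkpoint indexes. The example index is a hypothetical, heavily shortened stand-in for `pytorch_model.bin.index.json`; the parameter names and the placeholder `total_size` are assumptions for illustration only.

```python
import json
from collections import defaultdict


def group_by_shard(index: dict) -> dict:
    """Group parameter names by the shard file that stores them."""
    shards = defaultdict(list)
    for param_name, shard_file in index["weight_map"].items():
        shards[shard_file].append(param_name)
    return dict(shards)


# Hypothetical, shortened index; the real file lists every tensor in the model.
example_index = json.loads("""
{
  "metadata": {"total_size": 0},
  "weight_map": {
    "model.embed_tokens.weight": "pytorch_model-00001-of-00004.bin",
    "model.layers.0.self_attn.q_proj.weight": "pytorch_model-00001-of-00004.bin",
    "lm_head.weight": "pytorch_model-00004-of-00004.bin"
  }
}
""")

for shard, params in sorted(group_by_shard(example_index).items()):
    print(f"{shard}: {len(params)} tensors")
```

When loading with `from_pretrained`, this lookup happens automatically; the sketch is only meant to show what the index contains.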

	### Key Features:

	1. Medical Expertise:
	- Fine-tuned on the UMLS dataset to strengthen domain knowledge of medical terminology, diagnostics, and treatment plans.

	2. Instruction-Following:
	- Designed to answer complex queries clearly and precisely, for uses such as diagnostic support, patient education, and research.

	3. High-Parameter Model:
	- Uses 7 billion parameters to produce detailed, context-aware responses.
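
The parameter count also determines how much memory the weights need. A back-of-the-envelope sketch, assuming the rounded figure of 7 billion parameters (the exact count will differ) and counting only the weights, not activations or the KV cache:

```python
# Bytes used per parameter for common weight dtypes.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2, "int8": 1}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Approximate weight memory in gigabytes (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# Assumption: the nominal "7 billion" figure, not the exact parameter count.
NOMINAL_PARAMS = 7e9

for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{weight_memory_gb(NOMINAL_PARAMS, dtype):.0f} GB")
```

Under these assumptions, 16-bit weights need roughly half the memory of 32-bit weights, which is worth keeping in mind when choosing a dtype at load time.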

	---

	### Training Details:

	- Base Model: [Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct)
	- Dataset: [avaliev/umls](https://huggingface.co/datasets/avaliev/umls)
	  - A dataset of medical terminologies, relationships, and use cases with 99.1k samples.

	---

	### Capabilities:

	1. Clinical Text Analysis:
	- Interpret medical notes, prescriptions, and research articles.

	2. Question-Answering:
	- Answer medical queries, provide explanations for symptoms, and suggest treatments based on user prompts.

	3. Educational Support:
	- Assist in learning medical terminologies and understanding complex concepts.

	4. Healthcare Applications:
	- Integrate into clinical decision-support systems or patient care applications.

	---

	### Usage Instructions:

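	1. Setup:
	Install PyTorch and the Hugging Face Transformers library (version 4.37 or later, which added support for the Qwen2 architecture). The model files are downloaded automatically the first time `from_pretrained` is called.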

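	2. Loading the Model:
	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer

	# Repository ID on the Hugging Face Hub
	model_name = "prithivMLmods/Qwen-UMLS-7B-Instruct"
	tokenizer = AutoTokenizer.from_pretrained(model_name)
	# torch_dtype="auto" keeps the dtype stored in the checkpoint instead of upcasting to float32
	model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
	```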

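	3. Generate Medical Text:
	```python
	input_text = "What are the symptoms and treatments for diabetes?"
	inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
	# do_sample=True is required for temperature to take effect;
	# max_new_tokens bounds the generated text without counting the prompt
	outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7)
	print(tokenizer.decode(outputs[0], skip_special_tokens=True))
	```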

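	4. Customizing Outputs:
	Modify `generation_config.json` to change the default output style:
	- `temperature` trades creativity against determinism; it only applies when `do_sample` is `true`.
	- `max_length` caps the total length (prompt plus generated tokens) for concise or extended responses; `max_new_tokens` caps only the generated tokens.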
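
Editing `generation_config.json` by hand works, but the change can also be scripted. The following is a minimal sketch using only the standard library; `update_generation_config` is a hypothetical helper, not part of Transformers, and the starting values in the throwaway file are made up for the demonstration. `temperature` and `max_new_tokens` are standard generation settings.

```python
import json
import tempfile
from pathlib import Path


def update_generation_config(path: Path, **overrides) -> dict:
    """Load a generation_config.json, apply overrides, and write it back."""
    config = json.loads(path.read_text(encoding="utf-8"))
    config.update(overrides)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Demonstrate on a temporary file; the starting values here are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "generation_config.json"
    config_path.write_text(
        json.dumps({"do_sample": True, "temperature": 0.7}), encoding="utf-8"
    )
    updated = update_generation_config(
        config_path, temperature=0.3, max_new_tokens=256
    )
    print(updated)
```

To apply this to the real model, point the helper at the `generation_config.json` in a local copy of the repository. Arguments passed directly to `model.generate` still override the file's defaults for that call.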

	---

	### Applications:

	1. Clinical Support:
	- Assist healthcare providers with quick, accurate information retrieval.

	2. Patient Education:
	- Provide patients with understandable explanations of medical conditions.

	3. Medical Research:
	- Summarize or analyze complex medical research papers.

	4. AI-Driven Diagnostics:
	- Integrate with diagnostic systems for preliminary assessments.

	---