# ClinicalGPT-Pubmed-Instruct-V1.0

## Overview
ClinicalGPT-Pubmed-Instruct-V1.0 is a specialized language model fine-tuned from the mistralai/Mistral-7B-Instruct-v0.2 base model. It was trained primarily on 10 million PubMed abstracts and titles and is designed to answer life science and medical questions, accompanying its answers with citations to relevant scientific sources.

## Key Features
- Built on the Mistral-7B-Instruct-v0.2 base model
- Primary training on 10M PubMed abstracts and titles
- Generates answers with scientific citations from multiple sources
- Specialized for medical and life science domains

## Applications
- **Life Science Research**: Generate accurate, referenced answers for biomedical and healthcare queries
- **Pharmaceutical Industry**: Support healthcare professionals with evidence-based responses
- **Medical Education**: Aid students and educators with scientifically supported content drawn from various academic sources

## System Requirements

### GPU Requirements
- **Minimum VRAM**: 16-18 GB for inference in BF16 (BFloat16) precision
- **Recommended GPUs**:
  - NVIDIA A100 (40 GB or 80 GB) - Ideal for BF16 precision
  - Any GPU with 16+ GB VRAM
  - Performance may vary based on available memory
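The 16-18 GB figure follows from simple arithmetic: in BF16 each parameter occupies 2 bytes, so the weights of a model with roughly 7.24 billion parameters take about 14.5 GB before the key/value cache, activations, and CUDA runtime overhead are counted. The sketch below makes that estimate explicit. The parameter count and the architecture defaults (32 layers, 8 key/value heads, head dimension 128) are assumptions taken from the published Mistral-7B configuration; check them against this model's `config.json` before relying on the numbers.

```python
# Rough VRAM estimate for inference. The parameter count and architecture
# defaults below are assumptions based on the Mistral-7B base configuration.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2}
GIB = 1024 ** 3  # 1 GiB = 1,073,741,824 bytes (slightly larger than 1 GB)


def weight_bytes(num_params: float, dtype: str = "bf16") -> float:
    """Memory needed to hold the model weights alone."""
    return num_params * BYTES_PER_PARAM[dtype]


def kv_cache_bytes(num_tokens: int, num_layers: int = 32, num_kv_heads: int = 8,
                   head_dim: int = 128, dtype: str = "bf16") -> int:
    """Key/value cache for one sequence: K and V, per layer, per KV head, per token."""
    return 2 * num_layers * num_kv_heads * head_dim * BYTES_PER_PARAM[dtype] * num_tokens


if __name__ == "__main__":
    params = 7.24e9  # approximate Mistral-7B parameter count (assumption)
    w = weight_bytes(params)
    kv = kv_cache_bytes(num_tokens=2048)
    print(f"weights: {w / GIB:.1f} GiB, KV cache (2048 tokens): {kv / GIB:.2f} GiB")
```

Run as a script, it prints `weights: 13.5 GiB, KV cache (2048 tokens): 0.25 GiB`. Passing `dtype="fp32"` to `weight_bytes` gives roughly 27 GiB, which is why loading the model without requesting BF16 can overflow a 16-24 GB card.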

### Software Prerequisites
- Python 3.x
- PyTorch
- Transformers library
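Before loading the model, a short preflight script can confirm that the prerequisites listed above are present. This is a minimal sketch that uses only the standard library (`importlib.util.find_spec` locates a package without importing it); the package list mirrors the prerequisites, and the helper names are illustrative.

```python
import importlib.util
import sys

# Packages named in the prerequisites above (pip distribution names match here).
REQUIRED_PACKAGES = ("torch", "transformers")


def missing_packages(names=REQUIRED_PACKAGES):
    """Return the packages that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def check_environment():
    """Collect human-readable problems; an empty list means the checks passed."""
    problems = []
    if sys.version_info.major != 3:
        problems.append(f"Python 3 required, found {sys.version.split()[0]}")
    for name in missing_packages():
        problems.append(f"missing package: {name} (install with: pip install {name})")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("\n".join(issues) if issues else "Environment looks ready.")
```

GPU availability can then be confirmed with `torch.cuda.is_available()`, as the implementation example does.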

## Basic Implementation
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Set parameters
model_dir = "rohitanurag/ClinicalGPT-Pubmed-Instruct-V1.0"
max_new_tokens = 1500
device = "cuda" if torch.cuda.is_available() else "cpu"
# BF16 keeps the 7B weights at about 14.5 GB on GPU; fall back to FP32 on CPU
dtype = torch.bfloat16 if device == "cuda" else torch.float32

# Load tokenizer and model (without torch_dtype, weights load in FP32)
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(model_dir, torch_dtype=dtype).to(device)
model.eval()

# Define your question
question = "What is the role of the tumor microenvironment in cancer progression?"
prompt = f"""Please provide the answer to the question asked.
### Question: {question}
### Answer: """

# Tokenize input (a single prompt needs no padding)
inputs = tokenizer(prompt, return_tensors="pt").to(device)

# Generate output
output_ids = model.generate(
    inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=max_new_tokens,
    repetition_penalty=1.2,
    pad_token_id=tokenizer.eos_token_id,
)

# Decode and print (the decoded text includes the prompt)
generated_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(f"Generated Answer:\n{generated_text}")
```
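Because `tokenizer.decode(output_ids[0], ...)` returns the prompt followed by the completion, the printed text repeats the question. The helpers below, a sketch with hypothetical names, reuse the prompt template from the example and strip everything up to the first `### Answer:` marker. An alternative is to decode only the newly generated tokens, e.g. `output_ids[0][inputs.input_ids.shape[-1]:]`.

```python
# Prompt template copied from the implementation example; helper names are hypothetical.
PROMPT_TEMPLATE = """Please provide the answer to the question asked.
### Question: {question}
### Answer: """

ANSWER_MARKER = "### Answer:"


def build_prompt(question: str) -> str:
    """Insert a question into the template, trimming stray whitespace."""
    return PROMPT_TEMPLATE.format(question=question.strip())


def extract_answer(generated_text: str) -> str:
    """Return the text after the first '### Answer:' marker (the echoed prompt's).

    If no marker is found, the whole text is returned unchanged apart from trimming.
    """
    _, sep, answer = generated_text.partition(ANSWER_MARKER)
    return answer.strip() if sep else generated_text.strip()


if __name__ == "__main__":
    decoded = build_prompt("What is the tumor microenvironment?") + "The TME is ..."
    print(extract_answer(decoded))
```

Splitting on the first marker rather than the last keeps the whole completion intact even if the model itself emits another `### Answer:` line later in its output.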

## Sample Output
```
### Question: What is the role of the tumor microenvironment in cancer progression, and how does it influence the response to therapy?
### Answer:
The tumor microenvironment (TME) refers to the complex network of cells, extracellular matrix components, signaling molecules, and immune cells that surround a growing tumor. It plays an essential role in regulating various aspects of cancer development and progression...

### References:
1. Hanahan D, Weinberg RA. Hallmarks of Cancer: The Next Generation. Cell. 2011;144(5):646-74. doi:10.1016/j.cell.2011.03.019
2. Coussens LM, Pollard JW. Angiogenesis and Metastasis. Nature Reviews Cancer. 2006;6(1):57-68. doi:10.1038/nrc2210
3. Mantovani A, et al. Cancer's Educated Environment: How the Tumour Microenvironment Promotes Progression. Cell. 2017;168(6):988-1001.e15. doi:10.1016/j.cell.2017.02.011
4. Cheng YH, et al. Targeting the Tumor Microenvironment for Improved Therapy Response. Journal of Clinical Oncology. 2018;34(18_suppl):LBA10001. doi:10.1200/JCO.2018.34.18_suppl.LBA10001
5. Kang YS, et al. Role of the Tumor Microenvironment in Cancer Immunotherapy. Current Opinion in Pharmacology. 2018;30:101-108. doi:10.1016/j.ycoop.20
```
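The answer ends with a numbered `### References:` list, as in the sample above. A small parser, sketched here with a hypothetical `parse_references` helper and a deliberately simple DOI pattern, can pull those entries out so each DOI can be looked up (for example at https://doi.org/) before a citation is reused. The fifth entry in the sample is cut off, so the DOI extracted from it is incomplete.

```python
import re

# One numbered entry per line, e.g. "1. Author A. Title. Journal. doi:10.xxxx/..."
REF_LINE = re.compile(r"^\s*(\d+)\.\s+(.+)$")
# Simplified DOI pattern: "10." + registrant code + "/" + suffix up to whitespace.
DOI = re.compile(r"doi:\s*(10\.\d{4,9}/\S+)", re.IGNORECASE)


def parse_references(answer: str) -> list:
    """Split the numbered list that follows '### References:' into dicts."""
    _, sep, tail = answer.partition("### References:")
    if not sep:
        return []
    refs = []
    for line in tail.splitlines():
        match = REF_LINE.match(line)
        if match:
            text = match.group(2).strip()
            doi = DOI.search(text)
            refs.append({
                "number": int(match.group(1)),
                "text": text,
                "doi": doi.group(1) if doi else None,
            })
    return refs
```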

## Model Details
- **Base Model**: Mistral-7B-Instruct-v0.2
- **Primary Training Data**: 10 million PubMed abstracts and titles
- **Specialization**: Medical question-answering with scientific citations
- **Output**: Generates detailed answers with relevant academic references

## Future Development
ClinicalGPT-Pubmed-Instruct-V2.0 is under development, featuring:
- Training on 20 million scientific articles
- Inclusion of full-text articles from various academic sources
- Enhanced performance for life science tasks
- Expanded citation capabilities across multiple scientific databases

## Contributors
- Rohit Anurag – Principal Data Scientist
- Aneesh Paul – Data Scientist

## License
Licensed under the Apache License, Version 2.0. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0

## Citation
If you use this model in your research, please cite it appropriately.

## Support
For issues and feature requests, please use the GitHub issue tracker.