MedVisionLlama: Leveraging Pre-Trained Large Language Model Layers to Enhance Medical Image Segmentation
Abstract
Large Language Models (LLMs), known for their versatility with textual data, are increasingly being explored for their potential to enhance medical image segmentation, a crucial task for accurate diagnostic imaging. This study enhances Vision Transformers (ViTs) for medical image segmentation by integrating pre-trained LLM transformer blocks. Our approach, which incorporates a frozen LLM transformer block into the encoder of a ViT-based model, yields substantial improvements in segmentation performance across various medical imaging modalities. We propose a Hybrid Attention Mechanism that combines global and local feature learning with a Multi-Scale Fusion Block that aggregates features across different scales. The enhanced model shows significant performance gains, including an average Dice score increase from 0.74 to 0.79, along with improvements in accuracy, precision, and the Jaccard Index. These results demonstrate the effectiveness of LLM-based transformer blocks in refining medical image segmentation and highlight their potential to boost model accuracy and robustness. The source code and our implementation are available at: https://bit.ly/3zf2CVs
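As a rough illustration of the core idea, the sketch below (PyTorch assumed; the class, module, and parameter names are ours, not from the released code) shows a ViT-style encoder whose visual tokens are additionally passed through a frozen transformer block standing in for the pre-trained LLM layer. The frozen layer is represented here by a plain `nn.TransformerEncoderLayer`; in the paper it would instead be a transformer block loaded from an LLM checkpoint.

```python
import torch
import torch.nn as nn

class FrozenBlockViTEncoder(nn.Module):
    """Illustrative ViT encoder with a frozen transformer block appended (assumption, not the authors' code)."""

    def __init__(self, dim=768, num_heads=12, depth=4):
        super().__init__()
        # Patchify a single-channel scan into 16x16 patch tokens.
        self.patch_embed = nn.Conv2d(1, dim, kernel_size=16, stride=16)
        self.vit_blocks = nn.ModuleList(
            nn.TransformerEncoderLayer(dim, num_heads, batch_first=True)
            for _ in range(depth)
        )
        # Stand-in for the pre-trained LLM transformer block; its weights stay frozen.
        self.frozen_llm_block = nn.TransformerEncoderLayer(dim, num_heads, batch_first=True)
        for p in self.frozen_llm_block.parameters():
            p.requires_grad = False

    def forward(self, x):
        tokens = self.patch_embed(x).flatten(2).transpose(1, 2)  # (B, N, dim)
        for blk in self.vit_blocks:
            tokens = blk(tokens)
        # The frozen LLM-style layer refines the visual tokens without being fine-tuned.
        return self.frozen_llm_block(tokens)

# Usage: a 1-channel 224x224 scan -> (1, 196, 768) token features for a segmentation decoder.
feats = FrozenBlockViTEncoder()(torch.randn(1, 1, 224, 224))
```

Only the trainable ViT blocks and the decoder would receive gradient updates during training; the LLM-derived block contributes its pre-trained representations as-is.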
Community
The paper presents a novel method of enhancing Vision Transformer (ViT)-based medical image segmentation models by integrating pre-trained frozen transformer blocks from Large Language Models (LLMs), significantly improving segmentation performance across various medical imaging modalities.
- Frozen LLM Transformer Integration: Introduces a pre-trained, frozen transformer block from LLMs into the encoder of a ViT model, resulting in substantial performance improvements in medical image segmentation.
- Hybrid Attention and Multi-Scale Fusion: Proposes a Hybrid Attention Mechanism combining global and local feature learning, alongside a Multi-Scale Fusion Block to aggregate features across scales, enhancing segmentation precision (a sketch follows this list).
- Extensive Evaluation: Demonstrates effectiveness across 10 medical imaging modalities, achieving higher accuracy, precision, and Dice scores, with thorough ablation studies confirming the advantages of the LLM-based approach.
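A minimal sketch of how a hybrid global/local attention block and a multi-scale fusion step could be wired together (PyTorch assumed; the class names, layer choices, and scales below are illustrative assumptions, not the paper's exact implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HybridAttention(nn.Module):
    """Mixes global self-attention with a local depthwise-conv branch (illustrative)."""

    def __init__(self, dim=256, num_heads=8):
        super().__init__()
        self.global_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.local_conv = nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):                        # x: (B, C, H, W)
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)    # (B, HW, C) for global attention
        g, _ = self.global_attn(tokens, tokens, tokens)
        g = g.transpose(1, 2).reshape(b, c, h, w)
        local = self.local_conv(x)               # local spatial context
        fused = g + local                        # combine global and local features
        fused = self.norm(fused.flatten(2).transpose(1, 2))
        return fused.transpose(1, 2).reshape(b, c, h, w)

class MultiScaleFusion(nn.Module):
    """Pools features at several scales and fuses them back to one map (illustrative)."""

    def __init__(self, dim=256, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        self.proj = nn.Conv2d(dim * len(scales), dim, kernel_size=1)

    def forward(self, x):                        # x: (B, C, H, W)
        h, w = x.shape[-2:]
        feats = [
            x if s == 1 else
            F.interpolate(F.avg_pool2d(x, s), size=(h, w),
                          mode="bilinear", align_corners=False)
            for s in self.scales
        ]
        return self.proj(torch.cat(feats, dim=1))

# Usage on a (1, 256, 32, 32) encoder feature map.
x = torch.randn(1, 256, 32, 32)
out = MultiScaleFusion()(HybridAttention()(x))
```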
Hi @amanchadha, congrats on this work!
Are you planning to share the pre-trained model on the hub? See here for a guide: https://huggingface.co/docs/hub/models-uploading.
Also, it would be great to link it to this paper by including https://huggingface.co/papers/2410.02458 in the model card.
Let us know if you need any help!
Cheers,
Niels
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- SAM-UNet: Enhancing Zero-Shot Segmentation of SAM for Universal Medical Images (2024)
- LSMS: Language-guided Scale-aware MedSegmentor for Medical Image Referring Segmentation (2024)
- ASSNet: Adaptive Semantic Segmentation Network for Microtumors and Multi-Organ Segmentation (2024)
- EM-Net: Efficient Channel and Frequency Learning with Mamba for 3D Medical Image Segmentation (2024)
- MambaClinix: Hierarchical Gated Convolution and Mamba-Based U-Net for Enhanced 3D Medical Image Segmentation (2024)