# Gemma-2-2B-IT Fine-Tuning with LoRA
This project fine-tunes the `Gemma-2-2B-IT` model using LoRA (Low-Rank Adaptation) for Question Answering tasks, leveraging the Wikitext-2 dataset. The fine-tuning process is optimized for efficient training on limited GPU memory by freezing most model parameters and applying LoRA to specific layers.
## Project Overview
- Model: `Gemma-2-2B-IT`, a causal language model.
- Dataset: Wikitext-2 for text generation and causal language modeling (a loading sketch follows this list).
- Training Strategy: LoRA adaptation for low-resource fine-tuning.
- Frameworks: Hugging Face `transformers`, `peft`, and `datasets`.
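As a reference for how these pieces fit together, here is a minimal data-loading sketch. The dataset configuration name (`wikitext-2-raw-v1`), the checkpoint id (`google/gemma-2-2b-it`), and the `max_length` value are assumptions rather than values confirmed by the training script.

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Load Wikitext-2; the "wikitext-2-raw-v1" config name is an assumption
# (the script may use "wikitext-2-v1" instead).
dataset = load_dataset("wikitext", "wikitext-2-raw-v1")

# Gemma tokenizer; the checkpoint id is assumed to be the instruction-tuned 2B model.
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b-it")

def tokenize(batch):
    # Truncate to a fixed length; 512 is an assumed value, not taken from the script.
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])
```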
## Key Features
- LoRA Configuration:
  - LoRA is applied to the following projection layers: `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, and `down_proj`.
  - LoRA hyperparameters:
    - Rank (`r`): 4
    - LoRA Alpha: 8
    - Dropout: 0.1
- Training Configuration:
  - Mixed precision (`fp16`) enabled for faster and more memory-efficient training.
  - Gradient accumulation with 32 steps to manage large model sizes on small GPUs.
  - Batch size of 1 due to GPU memory constraints.
  - Learning rate: `5e-5` with weight decay: `0.01` (these settings are collected in the sketch below).
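The hyperparameters listed above correspond to the following `peft` and `transformers` configuration objects. This is a minimal sketch: the base checkpoint id (`google/gemma-2-2b-it`) and the output directory are placeholders, not values taken from the training script.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, TrainingArguments

# LoRA settings mirrored from the README.
lora_config = LoraConfig(
    r=4,                      # LoRA rank
    lora_alpha=8,
    lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

# Wrap the base model: this freezes the original weights and adds LoRA adapters.
model = AutoModelForCausalLM.from_pretrained("google/gemma-2-2b-it")
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# Training settings mirrored from the README; output_dir is a placeholder.
training_args = TrainingArguments(
    output_dir="./gemma2-2b-it-lora",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=32,
    learning_rate=5e-5,
    weight_decay=0.01,
    fp16=True,
)
```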
## System Requirements
- GPU: Required for efficient training. This script was tested with CUDA-enabled GPUs.
- Python Packages: Install dependencies with `pip install -r requirements.txt`.
## Notes
- This fine-tuned model leverages LoRA to adapt the large `Gemma-2-2B-IT` model with minimal trainable parameters, allowing fine-tuning even on hardware with limited memory.
- The fine-tuned model can be further utilized for tasks like Question Answering, and it is optimized for resource-efficient deployment (a loading sketch follows this list).
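Loading the resulting adapter for inference typically follows the standard `peft` pattern sketched below. The adapter path (`./gemma2-2b-it-lora`) and the example prompt are placeholders, not values from this repository.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "google/gemma-2-2b-it"
tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.float16, device_map="auto"
)

# Attach the LoRA adapter produced by fine-tuning (path is a placeholder).
model = PeftModel.from_pretrained(base_model, "./gemma2-2b-it-lora")
model.eval()

prompt = "Question: Who wrote Pride and Prejudice?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```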
## Memory Usage
- The training script includes CUDA memory summaries before and after the training process to monitor GPU memory consumption.
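The exact calls are part of the training script rather than this README; the pattern it describes is roughly the following.

```python
import torch

# Detailed CUDA allocator report before training starts.
if torch.cuda.is_available():
    print(torch.cuda.memory_summary())

# ... trainer.train() runs here ...

# Same report again after training, to compare peak memory usage.
if torch.cuda.is_available():
    print(torch.cuda.memory_summary())
```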