LaZeAsh
/

gemma-2b-lahacks

	---
	license: gemma
	library_name: peft
	tags:
	- trl
	- sft
	- generated_from_trainer
	base_model: google/gemma-2b-it
	model-index:
	- name: gemma-2b-lahacks
	  results: []
	---

	# gemma-2b-lahacks 💻

	This model is a fine-tuned version of [google/gemma-2b-it](https://huggingface.co/google/gemma-2b-it).
	It achieves the following results on the evaluation set:
	- Loss: 2.3061

	## Model description 📝

	This model was fine-tuned during LAHacks 2024. Its goal is to diagnose a patient appropriately
	based on the information in their previous medical records, current symptoms, age, sex, and more.

	## Intended uses & limitations ⁉️

	Code inference sample:
	```py
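	from peft import PeftConfig, PeftModel
	from transformers import AutoModelForCausalLM, AutoTokenizer

	adapter_id = "LaZeAsh/gemma-2b-lahacks"

	# Read the adapter config to find the base model it was trained on (google/gemma-2b-it).
	config = PeftConfig.from_pretrained(adapter_id)
	base_model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path)

	# Attach the fine-tuned PEFT adapter weights to the base model.
	model = PeftModel.from_pretrained(base_model, adapter_id)
	model.eval()

	tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)

	prompt = "I feel cold I most likely have a "

	# Tokenize the prompt into a tensor of input IDs.
	input_ids = tokenizer.encode(prompt, return_tensors="pt")

	# max_length counts the prompt tokens plus the generated tokens.
	output = model.generate(input_ids=input_ids, max_length=50, num_return_sequences=1)

	generated_text = tokenizer.decode(output[0], skip_special_tokens=True)

	print(generated_text)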
	```

	Uses: Diagnose a patient with Artificial Intelligence based on multiple parameters, ranging from their age to their
	medical record.

	Limitation: The model is highly likely NOT to be great at diagnosing its users. The time it took to fine-tune
	this model limited how much data we could train it on. With more time, a more accurate model would be expected.
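The use described above combines several patient parameters into one text prompt. The card does not document the prompt format used during fine-tuning, so the sketch below is only an illustration: `build_prompt` and its field layout are hypothetical, not the format the model was trained on.

```python
def build_prompt(age, sex, symptoms, history=()):
    """Combine patient fields into one plain-text prompt.

    The layout is a hypothetical example, not the training format.
    """
    parts = [f"Patient: {age}-year-old {sex}."]
    if history:
        parts.append("Medical history: " + ", ".join(history) + ".")
    parts.append("Current symptoms: " + ", ".join(symptoms) + ".")
    parts.append("Most likely diagnosis:")
    return " ".join(parts)


prompt = build_prompt(45, "female", ["chills", "fatigue"], history=["asthma"])
print(prompt)
```

The resulting string can be passed as `prompt` in the inference sample. Output quality will depend on how closely it resembles the training data.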

	## Training and evaluation data 📈

	The model was trained on data from the research paper 'A New Dataset For Automatic Medical Diagnosis' by Arsène Fansi Tchango, Rishab Goel,
	Zhi Wen, Julien Martel, and Joumana Ghosn. The 'release_train_patients.csv' dataset was reduced from its original 1.3 million rows to a
	mere 500-1000 rows, because the time it took to fine-tune the model grew with the size of the dataset provided.
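The card does not say how the rows were selected. One way to cut a very large CSV down to a fixed number of rows without loading it all into memory is reservoir sampling, sketched below with the standard library only. The function name, the output file name, and the choice of uniform random sampling are assumptions, not the authors' procedure.

```python
import csv
import random


def sample_csv_rows(src_path, dst_path, k=1000, seed=42):
    """Write a uniform random sample of up to k data rows (plus header) to dst_path.

    Uses reservoir sampling (Algorithm R), so the source file is streamed once.
    Returns the number of data rows written.
    """
    rng = random.Random(seed)
    reservoir = []
    with open(src_path, newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        header = next(reader)
        for i, row in enumerate(reader):
            if i < k:
                # Fill the reservoir with the first k rows.
                reservoir.append(row)
            else:
                # Replace a kept row with probability k / (i + 1).
                j = rng.randint(0, i)  # inclusive on both ends
                if j < k:
                    reservoir[j] = row
    with open(dst_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        writer.writerow(header)
        writer.writerows(reservoir)
    return len(reservoir)
```

For example, `sample_csv_rows("release_train_patients.csv", "train_subset.csv", k=1000)` would write a 1000-row subset. `train_subset.csv` is a placeholder name.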

	## Training procedure 🏋️

	The fine-tuning took MULTIPLE, and I mean MULTIPLE, tries. Sometimes the dataset provided was so big that the kernel had to be restarted several times.
	Additionally, the model was tuned on the default data that Intel offers in its guide to fine-tuning a Gemma model.

	### Training hyperparameters 🔍

	The following hyperparameters were used during training:
	- learning_rate: 1e-05
	- train_batch_size: 2
	- eval_batch_size: 8
	- seed: 42
	- gradient_accumulation_steps: 8
	- total_train_batch_size: 16
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- lr_scheduler_warmup_ratio: 0.05
	- training_steps: 140
	- mixed_precision_training: Native AMP
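Several of the values above are linked by simple arithmetic. The sketch below reproduces the total batch size and the shape of a linear schedule with warmup from the listed numbers. It assumes a single device and assumes that the warmup step count is the ceiling of `training_steps * warmup_ratio`; `linear_lr` is an illustrative helper, not the trainer's own code.

```python
import math

# Values copied from the hyperparameter list.
LEARNING_RATE = 1e-5
TRAIN_BATCH_SIZE = 2
GRAD_ACCUM_STEPS = 8
TRAINING_STEPS = 140
WARMUP_RATIO = 0.05

# Examples consumed per optimizer step (assumes one device).
total_train_batch_size = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS

# Number of steps spent ramping the learning rate up from zero.
warmup_steps = math.ceil(TRAINING_STEPS * WARMUP_RATIO)


def linear_lr(step):
    """Learning rate at a given optimizer step: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        return LEARNING_RATE * step / warmup_steps
    remaining = TRAINING_STEPS - step
    return LEARNING_RATE * max(0.0, remaining / (TRAINING_STEPS - warmup_steps))


print(total_train_batch_size, warmup_steps)
for s in (0, warmup_steps, 100, TRAINING_STEPS):
    print(s, linear_lr(s))
```

Under these assumptions the peak rate of 1e-05 is reached at the end of warmup and falls to zero at step 140.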

	### Training results

	| Training Loss | Epoch  | Step | Validation Loss |
	|:-------------:|:------:|:----:|:---------------:|
	| 3.3089        | 3.5714 | 100  | 2.3061          |
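The numbers in the results row can be put in more familiar terms. The sketch below converts the validation loss to a perplexity and backs out the approximate number of training examples per epoch. It assumes the loss is a mean per-token cross-entropy in nats, and uses the total train batch size of 16 from the hyperparameters. The variable names are illustrative.

```python
import math

# Values copied from the results table and the hyperparameter list.
validation_loss = 2.3061
epoch_at_step = 3.5714
step = 100
total_train_batch_size = 16

# Perplexity is the exponential of the mean cross-entropy loss (roughly 10 here).
perplexity = math.exp(validation_loss)

# Optimizer steps per epoch, and the training-set size that implies.
steps_per_epoch = step / epoch_at_step
approx_train_examples = steps_per_epoch * total_train_batch_size

print(f"perplexity: {perplexity:.2f}")
print(f"steps per epoch: {steps_per_epoch:.1f}")
print(f"approx. training examples: {approx_train_examples:.0f}")
```

This points to about 28 optimizer steps per epoch, or a few hundred training examples. The estimate is approximate because the epoch value is rounded.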


	### Framework versions

	- PEFT 0.10.0
	- Transformers 4.40.0
	- Pytorch 2.0.1a0+cxx11.abi
	- Datasets 2.19.0
	- Tokenizers 0.19.1
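To compare a local environment against the versions listed above, the installed package metadata can be read with the standard library. This is a minimal sketch: `check_versions` is a hypothetical helper, and PyTorch is left out because the listed build string is specific to the original training environment.

```python
from importlib import metadata

# Versions copied from the list above (PyTorch omitted, see lead-in).
EXPECTED = {
    "peft": "0.10.0",
    "transformers": "4.40.0",
    "datasets": "2.19.0",
    "tokenizers": "0.19.1",
}


def check_versions(expected=EXPECTED):
    """Return {package: (wanted, installed_or_None, matches)} for each package."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed
        report[name] = (wanted, installed, installed == wanted)
    return report


for name, (wanted, installed, ok) in check_versions().items():
    status = "ok" if ok else "differs"
    print(f"{name}: wanted {wanted}, installed {installed} ({status})")
```

A mismatch does not necessarily mean the adapter will fail to load; it only flags a difference from the environment recorded here.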