diegobit
/

Phi-3-mini-4k-instruct-ita-orpo-v2

Text Generation

text-generation-inference

Inference Endpoints

4-bit precision

Model card Files Files and versions Community

Phi-3-mini-4k-instruct-ita-orpo-v2 / README.md

diegobit's picture

Update README.md

b97b8da verified 6 months ago

|

history blame contribute delete

2.11 kB

	---
	library_name: transformers
	tags:
	- unsloth
	license: mit
	datasets:
	- efederici/alpaca-vs-alpaca-orpo-dpo
	---

	# Model Card for Model ID

	This is phi-3-mini-4k-instruct ORPO finetuning for the italian language over the Alpaca vs. Alpaca italian dataset: [efederici/alpaca-vs-alpaca-orpo-dpo](https://huggingface.co/datasets/efederici/alpaca-vs-alpaca-orpo-dpo)

	## Model Details

	### Model Description

	- Developed by: Diego Giorgini
	- Funded by: AI Technologies SRL - www.aitechnologies.it
	- Language(s) (NLP): Italian
	- License: llama3
	- Finetuned from model: unsloth/Phi-3-mini-4k-instruct

	## Training Details

	### Environment

	unsloth: 2024.5
	torch: 2.2

	### Training Data

	[efederici/alpaca-vs-alpaca-orpo-dpo](https://huggingface.co/datasets/efederici/alpaca-vs-alpaca-orpo-dpo): The Alpaca vs. Alpaca dataset is a curated blend of the Alpaca dataset and the Alpaca GPT-4 dataset, both available on HuggingFace Datasets. It uses the standard GPT dataset as the 'rejected' answer, steering the model towards the GPT-4 answer, which is considered as the 'chosen' one.

	### Training Procedure

	#### Preprocessing [optional]

	- No preprocessing has been performed, except for formatting with the phi-3 chat_template from unsloth:

	```tokenizer = get_chat_template(tokenizer, chat_template = "phi-3")```

	#### Training Hyperparameters

	- Training regime: bf16

	- Model loading parameters:

	```
	max_seq_length = 8192
	dtype = None
	load_in_4bit = False
	```

	- PEFT parameters:

	```
	r = 64
	lora_alpha = 64
	lora_dropout = 0
	bias = "none"
	random_state = 3407
	use_rslora = False
	loftq_config = None
	```

	- ORPOConfig parameters:

	```
	max_length = 8192
	max_prompt_length = max_seq_length//2
	max_completion_length = max_seq_length//2
	warmup_ratio = 0.1
	weight_decay = 0.01
	per_device_train_batch_size = 1
	gradient_accumulation_steps = 16
	learning_rate=8e-6
	beta = 0.1
	optim = "paged_adamw_8bit"
	lr_scheduler_type = "linear"
	num_train_epochs = 1
	```

	#### Speeds, Sizes, Times

	7h on an A100-40GB

	## Model Card Contact

	diego.giorgini@icloud.com