# llama-pro-8b-tweet-summarization
This model is a fine-tuned version of TencentARC/LLaMA-Pro-8B on the dialogstudio dataset. It achieves the following results on the evaluation set:
- Loss: 3.0033
- Rouge Scores: {'rouge1': 74.5645505147687, 'rouge2': 61.793005354430264, 'rougeL': 50.4897941651719, 'rougeLsum': 74.49500409220269}
- Bleu Scores: [0.699414840916846, 0.6885479681674689, 0.6673588982582369, 0.6432031117261758]
- Gen Len: 463.0182
## Model description
More information needed
## Intended uses & limitations
More information needed
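
Since usage is not documented, here is a minimal inference sketch, assuming this repository is a PEFT adapter loaded on top of the TencentARC/LLaMA-Pro-8B base model; the prompt format, dtype, and generation settings below are illustrative assumptions, not documented behavior.

```python
# A minimal sketch of loading the adapter for inference.
# The prompt template and generation settings are assumptions.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model = AutoModelForCausalLM.from_pretrained(
    "TencentARC/LLaMA-Pro-8B",
    torch_dtype=torch.float16,  # assumed precision
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("TencentARC/LLaMA-Pro-8B")

# Attach the fine-tuned adapter on top of the base model.
model = PeftModel.from_pretrained(
    base_model, "DrishtiSharma/llama-pro-8b-tweet-summarization"
)
model.eval()

prompt = "Summarize the following dialogue:\n..."  # replace with a real dialogue
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```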
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a `TrainingArguments` sketch follows the list):
- learning_rate: 0.0001
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- num_epochs: 7
- mixed_precision_training: Native AMP
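
For reproducibility, a hedged sketch of the corresponding `transformers.TrainingArguments` is shown below; `output_dir` and the `adamw_torch` optimizer name are assumptions (the listed betas and epsilon match the library defaults).

```python
# A sketch of TrainingArguments mirroring the hyperparameters above.
# output_dir is an assumption; adamw_torch uses betas=(0.9, 0.999) and
# epsilon=1e-8 by default, matching the optimizer settings listed.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama-pro-8b-tweet-summarization",  # assumed
    learning_rate=1e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",
    lr_scheduler_type="cosine",
    num_train_epochs=7,
    fp16=True,  # "Native AMP" mixed-precision training
)
```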
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rouge Scores | Bleu Scores | Gen Len |
|---|---|---|---|---|---|---|
| 1.9079 | 1.0 | 220 | 1.8554 | {'rouge1': 92.82930837423424, 'rouge2': 78.69002098258808, 'rougeL': 67.77525397424012, 'rougeLsum': 92.83122136025193} | [0.8742055507510261, 0.85321871129453, 0.8273357083346458, 0.7999205981632427] | 463.0182 |
| 1.6535 | 2.0 | 440 | 1.8644 | {'rouge1': 93.48067160777316, 'rouge2': 78.9876103970411, 'rougeL': 67.83658288925474, 'rougeLsum': 93.48402466797468} | [0.8755971013914376, 0.8572483593980601, 0.8317815914576417, 0.8041763182250138] | 463.0182 |
| 1.282 | 3.0 | 660 | 2.0002 | {'rouge1': 87.29036568539799, 'rouge2': 73.18485374150632, 'rougeL': 62.10087123916552, 'rougeLsum': 87.26501626335327} | [0.8760501251112071, 0.8593531308703309, 0.8335667371919002, 0.8051115198870601] | 463.0182 |
| 0.8481 | 4.0 | 880 | 2.2502 | {'rouge1': 86.93286796220396, 'rouge2': 72.7995944273867, 'rougeL': 61.376242795856115, 'rougeLsum': 86.92669954280056} | [0.8749955177244617, 0.8580816714753104, 0.8319806879994025, 0.8031812686772342] | 463.0182 |
| 0.5026 | 5.0 | 1100 | 2.5319 | {'rouge1': 74.55412702158021, 'rouge2': 61.949690968753835, 'rougeL': 51.12580948921186, 'rougeLsum': 74.48696099641717} | [0.7003890180040002, 0.689128452909103, 0.6682005373110111, 0.6444964084098124] | 463.0182 |
| 0.297 | 6.0 | 1320 | 2.8374 | {'rouge1': 74.57349965516465, 'rouge2': 61.85762409604638, 'rougeL': 50.76329385869279, 'rougeLsum': 74.51078702126195} | [0.6993958025399786, 0.6884661028969841, 0.6674353154479407, 0.643507030069284] | 463.0182 |
| 0.2129 | 7.0 | 1540 | 3.0033 | {'rouge1': 74.5645505147687, 'rouge2': 61.793005354430264, 'rougeL': 50.4897941651719, 'rougeLsum': 74.49500409220269} | [0.699414840916846, 0.6885479681674689, 0.6673588982582369, 0.6432031117261758] | 463.0182 |
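
The Rouge Scores and Bleu Scores columns follow the output format of the Hugging Face `evaluate` library; a minimal sketch of computing scores in that format is below, with placeholder predictions and references. Note that `evaluate`'s ROUGE returns values in [0, 1], so the values above appear to be scaled by 100.

```python
# A minimal sketch of computing ROUGE and BLEU in the format used above,
# using the `evaluate` library; the example texts are placeholder assumptions.
import evaluate

rouge = evaluate.load("rouge")
bleu = evaluate.load("bleu")

predictions = ["a model-generated summary of the dialogue"]  # placeholder
references = ["a reference summary of the dialogue"]         # placeholder

rouge_scores = rouge.compute(predictions=predictions, references=references)
bleu_scores = bleu.compute(predictions=predictions, references=references)

print(rouge_scores)               # {'rouge1': ..., 'rouge2': ..., 'rougeL': ..., 'rougeLsum': ...}
print(bleu_scores["precisions"])  # 1- to 4-gram precisions, as in the Bleu Scores column
```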
### Framework versions
- PEFT 0.8.2.dev0
- Transformers 4.38.0.dev0
- Pytorch 2.1.0+cu121
- Datasets 2.16.2.dev0
- Tokenizers 0.15.1