
llama-pro-8b-tweet-summarization

This model is a fine-tuned version of TencentARC/LLaMA-Pro-8B on the dialogstudio dataset. It achieves the following results on the evaluation set:

  • Loss: 3.0033
  • Rouge Scores: rouge1 74.56, rouge2 61.79, rougeL 50.49, rougeLsum 74.50
  • Bleu Scores: 0.6994, 0.6885, 0.6674, 0.6432
  • Gen Len: 463.0182
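
The exact metric configuration behind these numbers is not documented in the card; the following is only a minimal sketch of how ROUGE and BLEU scores of this kind can be computed with the Hugging Face `evaluate` library (the reported values appear to be on a 0–100 scale for ROUGE, which is an assumption here):

```python
# Minimal sketch (assumption: metrics comparable to the card's were computed with `evaluate`).
# May require: pip install evaluate rouge_score nltk
import evaluate

rouge = evaluate.load("rouge")
bleu = evaluate.load("bleu")

# Toy example pairs; a real evaluation would use model summaries vs. reference summaries.
predictions = ["the customer reported a delayed order and the agent promised a refund"]
references = ["customer complained about a late order; support offered a refund"]

rouge_scores = rouge.compute(predictions=predictions, references=references)
bleu_scores = bleu.compute(predictions=predictions, references=[[r] for r in references])

print(rouge_scores)  # rouge1 / rouge2 / rougeL / rougeLsum, as fractions in [0, 1]
print(bleu_scores)   # overall BLEU plus per-n-gram precisions
```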

Model description

This repository provides a PEFT adapter for TencentARC/LLaMA-Pro-8B, fine-tuned on the dialogstudio dataset for tweet-conversation summarization. No further details about the model have been provided.
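
A minimal, hedged loading sketch is shown below. It assumes the adapter is applied on top of the base model with PEFT; the prompt format is illustrative only and is not documented in this card.

```python
# Sketch only: load the base model and apply this adapter with PEFT.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "TencentARC/LLaMA-Pro-8B"
adapter_id = "DrishtiSharma/llama-pro-8b-tweet-summarization"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto", torch_dtype="auto")
model = PeftModel.from_pretrained(base_model, adapter_id)

# Illustrative prompt; the training prompt template is not documented in this card.
prompt = "Summarize the following customer-support conversation:\n<conversation text>\nSummary:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```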

Intended uses & limitations

More information needed

Training and evaluation data

The model was fine-tuned and evaluated on the dialogstudio dataset; details of the split and preprocessing have not been documented.

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • num_epochs: 7
  • mixed_precision_training: Native AMP
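
A hedged configuration sketch mirroring the hyperparameters above is given below; the actual training script, PEFT/LoRA settings, and data collation are not documented and are assumptions here.

```python
# Sketch: transformers.TrainingArguments matching the listed hyperparameters.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama-pro-8b-tweet-summarization",
    learning_rate=1e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",        # card lists "Adam with betas=(0.9,0.999), epsilon=1e-08"; adamw_torch is an assumption
    lr_scheduler_type="cosine",
    num_train_epochs=7,
    fp16=True,                  # "Native AMP" mixed-precision training
)
```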

Training results

| Training Loss | Epoch | Step | Validation Loss | rouge1 | rouge2 | rougeL | rougeLsum | Bleu Scores | Gen Len |
|--------------:|------:|-----:|----------------:|-------:|-------:|-------:|----------:|:------------|--------:|
| 1.9079 | 1.0 | 220 | 1.8554 | 92.83 | 78.69 | 67.78 | 92.83 | 0.8742, 0.8532, 0.8273, 0.7999 | 463.0182 |
| 1.6535 | 2.0 | 440 | 1.8644 | 93.48 | 78.99 | 67.84 | 93.48 | 0.8756, 0.8572, 0.8318, 0.8042 | 463.0182 |
| 1.282 | 3.0 | 660 | 2.0002 | 87.29 | 73.18 | 62.10 | 87.27 | 0.8761, 0.8594, 0.8336, 0.8051 | 463.0182 |
| 0.8481 | 4.0 | 880 | 2.2502 | 86.93 | 72.80 | 61.38 | 86.93 | 0.8750, 0.8581, 0.8320, 0.8032 | 463.0182 |
| 0.5026 | 5.0 | 1100 | 2.5319 | 74.55 | 61.95 | 51.13 | 74.49 | 0.7004, 0.6891, 0.6682, 0.6445 | 463.0182 |
| 0.297 | 6.0 | 1320 | 2.8374 | 74.57 | 61.86 | 50.76 | 74.51 | 0.6994, 0.6885, 0.6674, 0.6435 | 463.0182 |
| 0.2129 | 7.0 | 1540 | 3.0033 | 74.56 | 61.79 | 50.49 | 74.50 | 0.6994, 0.6885, 0.6674, 0.6432 | 463.0182 |

Framework versions

  • PEFT 0.8.2.dev0
  • Transformers 4.38.0.dev0
  • Pytorch 2.1.0+cu121
  • Datasets 2.16.2.dev0
  • Tokenizers 0.15.1