08/12/2023 21:45:17 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Use FP16 precision: False
08/12/2023 21:45:17 - WARNING - __main__ - Namespace(dataset_name='s-nlp/paradetox', dataset_config_name=None, train_file=None, ignore_pad_token_for_loss=True, max_source_length=1024, source_prefix=None, preprocessing_num_workers=None, overwrite_cache=None, max_target_length=128, val_max_target_length=None, pad_to_max_length=False, model_name_or_path='s-nlp/bart-base-detox', config_name=None, tokenizer_name=None, text_column=None, summary_column=None, use_slow_tokenizer=False, per_device_train_batch_size=8, per_device_eval_batch_size=4, learning_rate=3e-05, weight_decay=0.0, num_train_epochs=10, max_train_steps=None, gradient_accumulation_steps=2, lr_scheduler_type=<SchedulerType.LINEAR: 'linear'>, warmup_ratio=0.05, output_dir='./output_s-nlp/paradetox_bart_base_detox/2_8_3_1_10_3e-05_fp16', seed=28, model_type=None, teacher_model='s-nlp/bart-base-detox', student_model='s-nlp/bart-base-detox', pred_distill=True, intermediate_distill=True, weight_bits=2, input_bits=8, clip_val=2.5, length_penalty=150, max_length=62, min_length=11, num_beams=6, do_train=True, do_test=True, test_teacher=False, distill_encoder=3, distill_decoder=1, log_steps=20, local_rank=0, weighted=False, new_distill_map=False, task_weight=1, logits_weight=1, hid_weight=1)
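
The flags weight_bits=2, input_bits=8, and clip_val=2.5 point at a TernaryBERT-style quantized distillation: ternary weights plus 8-bit activations clipped at ±2.5. The script's actual quantizer is not shown in this log, so the following is a minimal sketch under those assumptions (function names hypothetical):

import torch

def quantize_activations(x, bits=8, clip_val=2.5):
    # Clip to [-clip_val, clip_val], then round onto a uniform symmetric grid.
    x = torch.clamp(x, -clip_val, clip_val)
    scale = clip_val / (2 ** (bits - 1) - 1)    # step size for input_bits=8
    return torch.round(x / scale) * scale       # fake-quantize: stays float for backprop

def ternarize_weights(w):
    # weight_bits=2 typically means ternary {-a, 0, +a}; threshold and scale
    # follow the ternary-weight-networks heuristic.
    delta = 0.7 * w.abs().mean()
    mask = (w.abs() > delta).float()
    alpha = (w.abs() * mask).sum() / mask.sum().clamp(min=1.0)
    return alpha * torch.sign(w) * mask
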
08/12/2023 21:45:34 - INFO - __main__ - ***** Running training *****
08/12/2023 21:45:34 - INFO - __main__ - Num examples = 19546
08/12/2023 21:45:34 - INFO - __main__ - Num Epochs = 10
08/12/2023 21:45:34 - INFO - __main__ - Instantaneous batch size per device = 8
08/12/2023 21:45:34 - INFO - __main__ - Total train batch size (w. parallel, distributed & accumulation) = 16
08/12/2023 21:45:34 - INFO - __main__ - Gradient Accumulation steps = 2
08/12/2023 21:45:34 - INFO - __main__ - Total optimization steps = 24440
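
The logged totals are internally consistent: ceil(19546 examples / 8 per device) = 2444 batches per epoch, and 2444 × 10 epochs = 24440, so the step count here reflects dataloader batches; with gradient accumulation of 2, the number of actual optimizer updates would be half that. Likewise, the total train batch size 16 = 8 per device × 2 accumulation steps × 1 process. A quick check:

import math
num_examples, per_device_bs, grad_accum, epochs = 19546, 8, 2, 10
batches_per_epoch = math.ceil(num_examples / per_device_bs)    # 2444
print(batches_per_epoch * epochs)                              # 24440, as logged
print(math.ceil(batches_per_epoch / grad_accum) * epochs)      # 12220 optimizer updates
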
08/12/2023 21:45:34 - INFO - __main__ - student encoder layers = 3
08/12/2023 21:45:34 - INFO - __main__ - student decoder layers = 1
08/12/2023 21:45:34 - INFO - __main__ - student encoder layers [0, 1, 2] are mapped to teacher encoder layers [0, 2, 5]
08/12/2023 21:45:34 - INFO - __main__ - student decoder layer [0] is mapped to teacher decoder layer [5]
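
The mapping [0, 2, 5] is what even spacing over a 6-layer teacher encoder gives a 3-layer student (bart-base has 6 encoder and 6 decoder layers), and a 1-layer student decoder tracks the teacher's top layer. The script's actual rule is not printed, but this sketch reproduces both logged mappings:

def layer_map(num_student, num_teacher):
    # Spread student layers evenly across teacher depth, ending at the top layer.
    if num_student == 1:
        return [num_teacher - 1]
    return [round(i * (num_teacher - 1) / (num_student - 1)) for i in range(num_student)]

print(layer_map(3, 6))   # [0, 2, 5] (Python's round() sends 2.5 down to 2)
print(layer_map(1, 6))   # [5]
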
08/12/2023 21:55:05 - INFO - __main__ - evaluation result: {'accuracy': 0.9501243829727173, 'similarity': 0.5612009167671204, 'fluency': 0.8357802033424377, 'joint': 0.4501223564147949, 'chrF': 14.496180717393134}
08/12/2023 22:04:11 - INFO - __main__ - evaluation result: {'accuracy': 0.9501243829727173, 'similarity': 0.5612009167671204, 'fluency': 0.8357802033424377, 'joint': 0.4501223564147949, 'chrF': 16.592977930625022}
08/12/2023 22:13:29 - INFO - __main__ - evaluation result: {'accuracy': 0.9501243829727173, 'similarity': 0.5612009167671204, 'fluency': 0.8357802033424377, 'joint': 0.4501223564147949, 'chrF': 36.010831870597066}
08/12/2023 22:23:10 - INFO - __main__ - evaluation result: {'accuracy': 0.9501243829727173, 'similarity': 0.5612009167671204, 'fluency': 0.8357802033424377, 'joint': 0.4501223564147949, 'chrF': 54.44255064308531}
08/12/2023 22:31:57 - INFO - __main__ - evaluation result: {'accuracy': 0.9501243829727173, 'similarity': 0.5612009167671204, 'fluency': 0.8357802033424377, 'joint': 0.4501223564147949, 'chrF': 58.68707525601931}
08/12/2023 22:41:13 - INFO - __main__ - evaluation result: {'accuracy': 0.9501243829727173, 'similarity': 0.5612009167671204, 'fluency': 0.8357802033424377, 'joint': 0.4501223564147949, 'chrF': 60.221787275905996}
08/12/2023 22:50:27 - INFO - __main__ - evaluation result: {'accuracy': 0.9501243829727173, 'similarity': 0.5612009167671204, 'fluency': 0.8357802033424377, 'joint': 0.4501223564147949, 'chrF': 61.3183124480374}
08/12/2023 22:59:29 - INFO - __main__ - evaluation result: {'accuracy': 0.9501243829727173, 'similarity': 0.5612009167671204, 'fluency': 0.8357802033424377, 'joint': 0.4501223564147949, 'chrF': 61.857692934508634}
08/12/2023 23:08:52 - INFO - __main__ - evaluation result: {'accuracy': 0.9501243829727173, 'similarity': 0.5612009167671204, 'fluency': 0.8357802033424377, 'joint': 0.4501223564147949, 'chrF': 61.83799266394504}
08/12/2023 23:18:03 - INFO - __main__ - evaluation result: {'accuracy': 0.9501243829727173, 'similarity': 0.5612009167671204, 'fluency': 0.8357802033424377, 'joint': 0.4501223564147949, 'chrF': 62.023869928231626}
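
Note that accuracy, similarity, fluency, and joint are identical across all ten evaluations; only chrF moves. The joint score matches the ParaDetox J metric, the average of the per-sentence product STA_i × SIM_i × FL_i, which is why 0.4501 sits close to, but not exactly at, the product of the three corpus means (≈ 0.446). A hedged sketch of both computations (list names are placeholders):

import sacrebleu

def joint_score(sta, sim, fl):
    # ParaDetox J: mean of per-sentence style-accuracy * similarity * fluency
    return sum(s * m * f for s, m, f in zip(sta, sim, fl)) / len(sta)

hyps = ["no matter how hard it is"]     # detoxified model outputs (placeholder)
refs = [["no matter how hard it is"]]   # one reference stream, parallel to hyps
print(sacrebleu.corpus_chrf(hyps, refs).score)
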
08/13/2023 13:04:48 - WARNING - __main__ - You're running a t5 model but didn't provide a source prefix, which is expected, e.g. with `--source_prefix 'summarize: '`
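
This warning comes from the stock Hugging Face summarization example code: T5 checkpoints were pretrained with a task prefix on every input, and the run below proceeded without one. It could be silenced by passing --source_prefix 'summarize: ' on the launch command; in preprocessing terms the prefix is simply prepended to each source string, roughly as follows (ParaDetox column name assumed):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("t5-large")
prefix = "summarize: "                                           # the warning's example prefix

def preprocess(batch):
    inputs = [prefix + t for t in batch["en_toxic_comment"]]     # source column, assumed
    return tok(inputs, max_length=1024, truncation=True)
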
08/13/2023 13:04:48 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Use FP16 precision: False
08/13/2023 13:04:48 - WARNING - __main__ - Namespace(dataset_name='s-nlp/paradetox', dataset_config_name=None, train_file=None, ignore_pad_token_for_loss=True, max_source_length=1024, source_prefix=None, preprocessing_num_workers=None, overwrite_cache=None, max_target_length=128, val_max_target_length=None, pad_to_max_length=False, model_name_or_path='t5-large', config_name=None, tokenizer_name=None, text_column=None, summary_column=None, use_slow_tokenizer=False, per_device_train_batch_size=8, per_device_eval_batch_size=4, learning_rate=3e-05, weight_decay=0.0, num_train_epochs=10, max_train_steps=None, gradient_accumulation_steps=2, lr_scheduler_type=<SchedulerType.LINEAR: 'linear'>, warmup_ratio=0.05, output_dir='./output_s-nlp/paradetox_bart_base_detox/2_8_3_1_10_3e-05_fp16', seed=28, model_type=None, teacher_model='t5-large', student_model='t5-large', pred_distill=True, intermediate_distill=True, weight_bits=2, input_bits=8, clip_val=2.5, length_penalty=150, max_length=62, min_length=11, num_beams=6, do_train=True, do_test=True, test_teacher=False, distill_encoder=3, distill_decoder=1, log_steps=20, local_rank=0, weighted=False, new_distill_map=False, task_weight=1, logits_weight=1, hid_weight=1)
08/13/2023 13:05:14 - INFO - __main__ - ***** Running training *****
08/13/2023 13:05:14 - INFO - __main__ - Num examples = 19546
08/13/2023 13:05:14 - INFO - __main__ - Num Epochs = 10
08/13/2023 13:05:14 - INFO - __main__ - Instantaneous batch size per device = 8
08/13/2023 13:05:14 - INFO - __main__ - Total train batch size (w. parallel, distributed & accumulation) = 16
08/13/2023 13:05:14 - INFO - __main__ - Gradient Accumulation steps = 2
08/13/2023 13:05:14 - INFO - __main__ - Total optimization steps = 24440
08/13/2023 13:05:14 - INFO - __main__ - student encoder layers = 3
08/13/2023 13:05:14 - INFO - __main__ - student decoder layers = 1
08/13/2023 13:05:14 - INFO - __main__ - student encoder layers [0, 1, 2] are mapped to teacher encoder layers [0, 2, 5]
08/13/2023 13:05:14 - INFO - __main__ - student decoder layer [0] is mapped to teacher decoder layer [5]
08/13/2023 14:29:47 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Use FP16 precision: False
08/13/2023 14:29:47 - WARNING - __main__ - Namespace(dataset_name='s-nlp/paradetox', dataset_config_name=None, train_file=None, ignore_pad_token_for_loss=True, max_source_length=1024, source_prefix=None, preprocessing_num_workers=None, overwrite_cache=None, max_target_length=128, val_max_target_length=None, pad_to_max_length=False, model_name_or_path='facebook/bart-large', config_name=None, tokenizer_name=None, text_column=None, summary_column=None, use_slow_tokenizer=False, per_device_train_batch_size=8, per_device_eval_batch_size=4, learning_rate=3e-05, weight_decay=0.0, num_train_epochs=10, max_train_steps=None, gradient_accumulation_steps=2, lr_scheduler_type=<SchedulerType.LINEAR: 'linear'>, warmup_ratio=0.05, output_dir='./output_s-nlp/paradetox_bart_base_detox/2_8_3_1_10_3e-05_fp16', seed=28, model_type=None, teacher_model='facebook/bart-large', student_model='facebook/bart-large', pred_distill=True, intermediate_distill=True, weight_bits=2, input_bits=8, clip_val=2.5, length_penalty=150, max_length=62, min_length=11, num_beams=6, do_train=True, do_test=True, test_teacher=False, distill_encoder=3, distill_decoder=1, log_steps=20, local_rank=0, weighted=False, new_distill_map=False, task_weight=1, logits_weight=1, hid_weight=1)
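
The decoding settings shared by all three runs (num_beams=6, length_penalty=150, max_length=62, min_length=11) translate directly into a generate() call. Note that length_penalty=150 is far outside the usual 0.5-2.0 range and pushes beam search hard toward longer outputs; the value below is copied from the log, not a recommendation:

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("facebook/bart-large")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-large")
batch = tok(["an example input sentence"], return_tensors="pt")
out = model.generate(
    **batch,
    num_beams=6,
    length_penalty=150,   # as logged; typical values are ~0.5-2.0
    max_length=62,
    min_length=11,
)
print(tok.batch_decode(out, skip_special_tokens=True))
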
08/13/2023 14:30:08 - INFO - __main__ - ***** Running training *****
08/13/2023 14:30:08 - INFO - __main__ - Num examples = 19546
08/13/2023 14:30:08 - INFO - __main__ - Num Epochs = 10
08/13/2023 14:30:08 - INFO - __main__ - Instantaneous batch size per device = 8
08/13/2023 14:30:08 - INFO - __main__ - Total train batch size (w. parallel, distributed & accumulation) = 16
08/13/2023 14:30:08 - INFO - __main__ - Gradient Accumulation steps = 2
08/13/2023 14:30:08 - INFO - __main__ - Total optimization steps = 24440
08/13/2023 14:30:08 - INFO - __main__ - student encoder layers = 3
08/13/2023 14:30:08 - INFO - __main__ - student decoder layers = 1
08/13/2023 14:30:08 - INFO - __main__ - student encoder layers [0, 1, 2] are mapped to teacher encoder layers [0, 2, 5]
08/13/2023 14:30:08 - INFO - __main__ - student decoder layer [0] is mapped to teacher decoder layer [5]
08/13/2023 14:42:18 - INFO - __main__ - evaluation result: {'accuracy': 0.9501243829727173, 'similarity': 0.5612009167671204, 'fluency': 0.8357802033424377, 'joint': 0.4501223564147949, 'chrF': 14.422994547204512}
08/13/2023 14:54:26 - INFO - __main__ - evaluation result: {'accuracy': 0.9501243829727173, 'similarity': 0.5612009167671204, 'fluency': 0.8357802033424377, 'joint': 0.4501223564147949, 'chrF': 14.279083299816413}
08/13/2023 15:06:45 - INFO - __main__ - evaluation result: {'accuracy': 0.9501243829727173, 'similarity': 0.5612009167671204, 'fluency': 0.8357802033424377, 'joint': 0.4501223564147949, 'chrF': 15.765409705989143}
08/13/2023 15:19:14 - INFO - __main__ - evaluation result: {'accuracy': 0.9501243829727173, 'similarity': 0.5612009167671204, 'fluency': 0.8357802033424377, 'joint': 0.4501223564147949, 'chrF': 16.45637930896424}
08/13/2023 15:31:18 - INFO - __main__ - evaluation result: {'accuracy': 0.9501243829727173, 'similarity': 0.5612009167671204, 'fluency': 0.8357802033424377, 'joint': 0.4501223564147949, 'chrF': 17.18326706061782}
08/13/2023 15:43:20 - INFO - __main__ - evaluation result: {'accuracy': 0.9501243829727173, 'similarity': 0.5612009167671204, 'fluency': 0.8357802033424377, 'joint': 0.4501223564147949, 'chrF': 17.632538284927946}
08/13/2023 15:55:30 - INFO - __main__ - evaluation result: {'accuracy': 0.9501243829727173, 'similarity': 0.5612009167671204, 'fluency': 0.8357802033424377, 'joint': 0.4501223564147949, 'chrF': 18.300109901381298}
08/13/2023 16:07:26 - INFO - __main__ - evaluation result: {'accuracy': 0.9501243829727173, 'similarity': 0.5612009167671204, 'fluency': 0.8357802033424377, 'joint': 0.4501223564147949, 'chrF': 18.861494907049746}
08/13/2023 16:19:31 - INFO - __main__ - evaluation result: {'accuracy': 0.9501243829727173, 'similarity': 0.5612009167671204, 'fluency': 0.8357802033424377, 'joint': 0.4501223564147949, 'chrF': 19.044193470683002}
08/13/2023 16:31:31 - INFO - __main__ - evaluation result: {'accuracy': 0.9501243829727173, 'similarity': 0.5612009167671204, 'fluency': 0.8357802033424377, 'joint': 0.4501223564147949, 'chrF': 19.34313727953871}
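
For readability, the chrF trajectories logged above, rounded to two decimals (all other metrics were constant at accuracy 0.9501, similarity 0.5612, fluency 0.8358, joint 0.4501 in both completed runs; the t5-large run logged no evaluations):

eval   s-nlp/bart-base-detox   facebook/bart-large
  1            14.50                 14.42
  2            16.59                 14.28
  3            36.01                 15.77
  4            54.44                 16.46
  5            58.69                 17.18
  6            60.22                 17.63
  7            61.32                 18.30
  8            61.86                 18.86
  9            61.84                 19.04
 10            62.02                 19.34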