Training in progress, epoch 48
[]
[]
dados_tokenizados:
DatasetDict({
    train: Dataset({
        features: ['rotulo', 'rotulo_simples', 'text', 'label', 'input_ids', 'attention_mask'],
        num_rows: 4000
    })
    validation: Dataset({
        features: ['rotulo', 'rotulo_simples', 'text', 'label', 'input_ids', 'attention_mask'],
        num_rows: 1000
    })
    test: Dataset({
        features: ['rotulo', 'rotulo_simples', 'text', 'label', 'input_ids', 'attention_mask'],
        num_rows: 1000
    })
})
/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/transformers/tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be depracted in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
  warnings.warn(
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert/distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
{'eval_loss': 0.21834564208984375, 'eval_accuracy': 0.938, 'eval_runtime': 30.3683, 'eval_samples_per_second': 32.929, 'eval_steps_per_second': 2.075, 'epoch': 1.0}
{'loss': 0.2031, 'grad_norm': 1.1480563879013062, 'learning_rate': 1.2e-05, 'epoch': 2.0}
{'eval_loss': 0.19427122175693512, 'eval_accuracy': 0.938, 'eval_runtime': 42.2287, 'eval_samples_per_second': 23.681, 'eval_steps_per_second': 1.492, 'epoch': 2.0}
{'eval_loss': 0.3195326626300812, 'eval_accuracy': 0.921, 'eval_runtime': 26.5577, 'eval_samples_per_second': 37.654, 'eval_steps_per_second': 2.372, 'epoch': 3.0}
{'loss': 0.0672, 'grad_norm': 1.1029362678527832, 'learning_rate': 4.000000000000001e-06, 'epoch': 4.0}
{'eval_loss': 0.36123067140579224, 'eval_accuracy': 0.925, 'eval_runtime': 26.675, 'eval_samples_per_second': 37.488, 'eval_steps_per_second': 2.362, 'epoch': 4.0}
{'eval_loss': 0.3963741362094879, 'eval_accuracy': 0.926, 'eval_runtime': 25.9784, 'eval_samples_per_second': 38.493, 'eval_steps_per_second': 2.425, 'epoch': 5.0}
{'train_runtime': 8026.8642, 'train_samples_per_second': 2.492, 'train_steps_per_second': 0.156, 'train_loss': 0.11480112991333008, 'epoch': 5.0}
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert/distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert/distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
{'eval_loss': 0.17208045721054077, 'eval_accuracy': 0.939, 'eval_runtime': 40.1609, 'eval_samples_per_second': 24.9, 'eval_steps_per_second': 0.797, 'epoch': 1.0}
{'eval_loss': 0.24476991593837738, 'eval_accuracy': 0.926, 'eval_runtime': 38.171, 'eval_samples_per_second': 26.198, 'eval_steps_per_second': 0.838, 'epoch': 2.0}
{'eval_loss': 0.6838799715042114, 'eval_accuracy': 0.656, 'eval_runtime': 214.6826, 'eval_samples_per_second': 4.658, 'eval_steps_per_second': 0.149, 'epoch': 3.0}
{'loss': 0.2956, 'grad_norm': 2.695140838623047, 'learning_rate': 9.200000000000002e-06, 'epoch': 4.0}
{'eval_loss': 0.31772053241729736, 'eval_accuracy': 0.87, 'eval_runtime': 37.1806, 'eval_samples_per_second': 26.896, 'eval_steps_per_second': 0.861, 'epoch': 4.0}
{'eval_loss': 0.2808445990085602, 'eval_accuracy': 0.932, 'eval_runtime': 37.3397, 'eval_samples_per_second': 26.781, 'eval_steps_per_second': 0.857, 'epoch': 5.0}
{'eval_loss': 0.3926897644996643, 'eval_accuracy': 0.905, 'eval_runtime': 37.6368, 'eval_samples_per_second': 26.57, 'eval_steps_per_second': 0.85, 'epoch': 6.0}
{'eval_loss': 0.37185582518577576, 'eval_accuracy': 0.922, 'eval_runtime': 37.484, 'eval_samples_per_second': 26.678, 'eval_steps_per_second': 0.854, 'epoch': 7.0}
{'loss': 0.1013, 'grad_norm': 0.6478258371353149, 'learning_rate': 8.400000000000001e-06, 'epoch': 8.0}
{'eval_loss': 0.4580109715461731, 'eval_accuracy': 0.91, 'eval_runtime': 38.2702, 'eval_samples_per_second': 26.13, 'eval_steps_per_second': 0.836, 'epoch': 8.0}
{'eval_loss': 0.4977562129497528, 'eval_accuracy': 0.913, 'eval_runtime': 37.7002, 'eval_samples_per_second': 26.525, 'eval_steps_per_second': 0.849, 'epoch': 9.0}
{'eval_loss': 0.4662289023399353, 'eval_accuracy': 0.92, 'eval_runtime': 38.4422, 'eval_samples_per_second': 26.013, 'eval_steps_per_second': 0.832, 'epoch': 10.0}
{'eval_loss': 0.5506279468536377, 'eval_accuracy': 0.901, 'eval_runtime': 37.4907, 'eval_samples_per_second': 26.673, 'eval_steps_per_second': 0.854, 'epoch': 11.0}
{'loss': 0.0442, 'grad_norm': 0.6364777684211731, 'learning_rate': 7.600000000000001e-06, 'epoch': 12.0}
{'eval_loss': 0.578902006149292, 'eval_accuracy': 0.903, 'eval_runtime': 38.2969, 'eval_samples_per_second': 26.112, 'eval_steps_per_second': 0.836, 'epoch': 12.0}
{'eval_loss': 0.47741687297821045, 'eval_accuracy': 0.92, 'eval_runtime': 37.7268, 'eval_samples_per_second': 26.506, 'eval_steps_per_second': 0.848, 'epoch': 13.0}
{'eval_loss': 0.5484298467636108, 'eval_accuracy': 0.894, 'eval_runtime': 38.013, 'eval_samples_per_second': 26.307, 'eval_steps_per_second': 0.842, 'epoch': 14.0}
{'eval_loss': 0.538878321647644, 'eval_accuracy': 0.909, 'eval_runtime': 38.0368, 'eval_samples_per_second': 26.29, 'eval_steps_per_second': 0.841, 'epoch': 15.0}
{'loss': 0.0268, 'grad_norm': 20.58578109741211, 'learning_rate': 6.800000000000001e-06, 'epoch': 16.0}
{'eval_loss': 0.49775975942611694, 'eval_accuracy': 0.921, 'eval_runtime': 37.3442, 'eval_samples_per_second': 26.778, 'eval_steps_per_second': 0.857, 'epoch': 16.0}
{'eval_loss': 0.5782524347305298, 'eval_accuracy': 0.909, 'eval_runtime': 38.4136, 'eval_samples_per_second': 26.032, 'eval_steps_per_second': 0.833, 'epoch': 17.0}
{'eval_loss': 0.5907241106033325, 'eval_accuracy': 0.901, 'eval_runtime': 37.9133, 'eval_samples_per_second': 26.376, 'eval_steps_per_second': 0.844, 'epoch': 18.0}
{'eval_loss': 0.517770528793335, 'eval_accuracy': 0.917, 'eval_runtime': 37.8369, 'eval_samples_per_second': 26.429, 'eval_steps_per_second': 0.846, 'epoch': 19.0}
{'loss': 0.0187, 'grad_norm': 0.01947682909667492, 'learning_rate': 6e-06, 'epoch': 20.0}
{'eval_loss': 0.5195603966712952, 'eval_accuracy': 0.92, 'eval_runtime': 37.9909, 'eval_samples_per_second': 26.322, 'eval_steps_per_second': 0.842, 'epoch': 20.0}
{'eval_loss': 0.8739770650863647, 'eval_accuracy': 0.617, 'eval_runtime': 213.6265, 'eval_samples_per_second': 4.681, 'eval_steps_per_second': 0.15, 'epoch': 21.0}
{'eval_loss': 0.633865237236023, 'eval_accuracy': 0.901, 'eval_runtime': 38.1393, 'eval_samples_per_second': 26.22, 'eval_steps_per_second': 0.839, 'epoch': 22.0}
{'eval_loss': 0.5776236653327942, 'eval_accuracy': 0.92, 'eval_runtime': 38.5846, 'eval_samples_per_second': 25.917, 'eval_steps_per_second': 0.829, 'epoch': 23.0}
wandb: Network error (SSLError), entering retry loop.
{'loss': 0.0549, 'grad_norm': 0.032662052661180496, 'learning_rate': 5.2e-06, 'epoch': 24.0}
{'eval_loss': 0.6649676561355591, 'eval_accuracy': 0.907, 'eval_runtime': 38.4215, 'eval_samples_per_second': 26.027, 'eval_steps_per_second': 0.833, 'epoch': 24.0}
{'eval_loss': 0.6898632645606995, 'eval_accuracy': 0.9, 'eval_runtime': 38.2884, 'eval_samples_per_second': 26.118, 'eval_steps_per_second': 0.836, 'epoch': 25.0}
{'eval_loss': 0.7331468462944031, 'eval_accuracy': 0.9, 'eval_runtime': 37.8445, 'eval_samples_per_second': 26.424, 'eval_steps_per_second': 0.846, 'epoch': 26.0}
{'eval_loss': 0.8004008531570435, 'eval_accuracy': 0.891, 'eval_runtime': 37.8178, 'eval_samples_per_second': 26.443, 'eval_steps_per_second': 0.846, 'epoch': 27.0}
{'loss': 0.0101, 'grad_norm': 0.0740148276090622, 'learning_rate': 4.4e-06, 'epoch': 28.0}
{'eval_loss': 0.7997801303863525, 'eval_accuracy': 0.897, 'eval_runtime': 38.1717, 'eval_samples_per_second': 26.197, 'eval_steps_per_second': 0.838, 'epoch': 28.0}
{'eval_loss': 0.7122868895530701, 'eval_accuracy': 0.903, 'eval_runtime': 38.0461, 'eval_samples_per_second': 26.284, 'eval_steps_per_second': 0.841, 'epoch': 29.0}
{'eval_loss': 0.7891318798065186, 'eval_accuracy': 0.9, 'eval_runtime': 37.564, 'eval_samples_per_second': 26.621, 'eval_steps_per_second': 0.852, 'epoch': 30.0}
{'eval_loss': 0.6890597343444824, 'eval_accuracy': 0.903, 'eval_runtime': 51.1931, 'eval_samples_per_second': 19.534, 'eval_steps_per_second': 0.625, 'epoch': 31.0}
{'loss': 0.0089, 'grad_norm': 0.007422878406941891, 'learning_rate': 3.6000000000000003e-06, 'epoch': 32.0}
{'eval_loss': 0.6430081129074097, 'eval_accuracy': 0.912, 'eval_runtime': 38.3463, 'eval_samples_per_second': 26.078, 'eval_steps_per_second': 0.835, 'epoch': 32.0}
{'eval_loss': 0.6644126176834106, 'eval_accuracy': 0.912, 'eval_runtime': 37.0865, 'eval_samples_per_second': 26.964, 'eval_steps_per_second': 0.863, 'epoch': 33.0}
{'eval_loss': 0.6276940703392029, 'eval_accuracy': 0.914, 'eval_runtime': 37.8123, 'eval_samples_per_second': 26.446, 'eval_steps_per_second': 0.846, 'epoch': 34.0}
{'eval_loss': 0.6321740746498108, 'eval_accuracy': 0.917, 'eval_runtime': 50.725, 'eval_samples_per_second': 19.714, 'eval_steps_per_second': 0.631, 'epoch': 35.0}
{'loss': 0.0078, 'grad_norm': 0.00712945219129324, 'learning_rate': 2.8000000000000003e-06, 'epoch': 36.0}
{'eval_loss': 0.7095584869384766, 'eval_accuracy': 0.908, 'eval_runtime': 37.1239, 'eval_samples_per_second': 26.937, 'eval_steps_per_second': 0.862, 'epoch': 36.0}
{'eval_loss': 0.649186909198761, 'eval_accuracy': 0.911, 'eval_runtime': 37.5593, 'eval_samples_per_second': 26.625, 'eval_steps_per_second': 0.852, 'epoch': 37.0}
{'eval_loss': 0.6124615669250488, 'eval_accuracy': 0.915, 'eval_runtime': 41.4101, 'eval_samples_per_second': 24.149, 'eval_steps_per_second': 0.773, 'epoch': 38.0}
{'eval_loss': 0.7363823056221008, 'eval_accuracy': 0.904, 'eval_runtime': 47.3084, 'eval_samples_per_second': 21.138, 'eval_steps_per_second': 0.676, 'epoch': 39.0}
{'loss': 0.0054, 'grad_norm': 0.013953677378594875, 'learning_rate': 2.0000000000000003e-06, 'epoch': 40.0}
{'eval_loss': 0.6578059196472168, 'eval_accuracy': 0.913, 'eval_runtime': 37.731, 'eval_samples_per_second': 26.503, 'eval_steps_per_second': 0.848, 'epoch': 40.0}
{'eval_loss': 0.7589854598045349, 'eval_accuracy': 0.906, 'eval_runtime': 37.3152, 'eval_samples_per_second': 26.799, 'eval_steps_per_second': 0.858, 'epoch': 41.0}
{'eval_loss': 0.7142490744590759, 'eval_accuracy': 0.906, 'eval_runtime': 37.6936, 'eval_samples_per_second': 26.53, 'eval_steps_per_second': 0.849, 'epoch': 42.0}
{'eval_loss': 0.759125292301178, 'eval_accuracy': 0.903, 'eval_runtime': 37.463, 'eval_samples_per_second': 26.693, 'eval_steps_per_second': 0.854, 'epoch': 43.0}
{'loss': 0.0049, 'grad_norm': 0.007690785452723503, 'learning_rate': 1.2000000000000002e-06, 'epoch': 44.0}
{'eval_loss': 0.6526206731796265, 'eval_accuracy': 0.917, 'eval_runtime': 37.8543, 'eval_samples_per_second': 26.417, 'eval_steps_per_second': 0.845, 'epoch': 44.0}
{'eval_loss': 0.6948218941688538, 'eval_accuracy': 0.909, 'eval_runtime': 37.9107, 'eval_samples_per_second': 26.378, 'eval_steps_per_second': 0.844, 'epoch': 45.0}
{'eval_loss': 0.7213398218154907, 'eval_accuracy': 0.907, 'eval_runtime': 38.4455, 'eval_samples_per_second': 26.011, 'eval_steps_per_second': 0.832, 'epoch': 46.0}
{'eval_loss': 0.6751002669334412, 'eval_accuracy': 0.913, 'eval_runtime': 38.6152, 'eval_samples_per_second': 25.897, 'eval_steps_per_second': 0.829, 'epoch': 47.0}
{'loss': 0.0058, 'grad_norm': 0.0415799506008625, 'learning_rate': 4.0000000000000003e-07, 'epoch': 48.0}