+ deepspeed --num_nodes=1 --num_gpus=8 --master_port 35109 --module safe_rlhf.finetune --train_datasets bt --model_name_or_path cerebras/btlm-3b-8k-base --max_length 8092 --trust_remote_code True --epochs 16 --per_device_train_batch_size 8 --per_device_eval_batch_size 2 --gradient_accumulation_steps 1 --gradient_checkpointing --learning_rate 4.7e-6 --lr_scheduler_type cosine --num_warmup_steps 20 --weight_decay 0.0 --seed 42 --output_dir /home/paperspace/safe-rlhf/output/sft --log_type wandb --log_project BT-Training --zero_stage 2 --bf16 True --tf32 True
Using pad_token, but it is not set yet.  [message repeated on all 8 ranks]
WARNING:datasets.builder:Using custom data configuration robertmyers--sakura-541a529765142ab6  [message repeated on all 8 ranks]
WARNING:datasets.builder:Reusing dataset parquet (/home/paperspace/.cache/huggingface/datasets/robertmyers___parquet/robertmyers--sakura-541a529765142ab6/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)  [message repeated on all 8 ranks]
Using /home/paperspace/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...  [message repeated on all 8 ranks]
Detected CUDA files, patching ldflags
Emitting ninja build file /home/paperspace/.cache/torch_extensions/py39_cu117/fused_adam/build.ninja...
Building extension module fused_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
Loading extension module fused_adam...  [message repeated on all 8 ranks]
WARNING:transformers_modules.cerebras.btlm-3b-8k-base.099ed6b507c686ba96229c0ab34201fee7415cae.modeling_btlm:`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...  [message repeated across ranks]
wandb: Tracking run with wandb version 0.13.4
wandb: W&B syncing is set to `offline` in this directory.
wandb: Run `wandb online` or set WANDB_MODE=online to enable cloud syncing.
Training 1/16 epoch: 0%| | 0/880 [00:00
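The `Using pad_token, but it is not set yet.` lines appear because the BTLM tokenizer ships without a padding token. Below is a minimal sketch of the usual workaround with the standard Hugging Face `transformers` tokenizer API; reusing the EOS token as padding is a common convention for causal LMs, not something this log confirms the training script does.

```python
from transformers import AutoTokenizer

# Load the BTLM tokenizer; trust_remote_code mirrors the training command above.
tokenizer = AutoTokenizer.from_pretrained(
    "cerebras/btlm-3b-8k-base", trust_remote_code=True
)

# The tokenizer has no pad_token, which triggers the warning seen in the log.
# A common workaround is to reuse the EOS token for padding.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.pad_token_id = tokenizer.eos_token_id
```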
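Once the run finishes, the checkpoint written to `--output_dir` can be sanity-checked with a short generation. This is a hedged sketch, assuming `safe_rlhf.finetune` saves a Hugging Face-format checkpoint under `/home/paperspace/safe-rlhf/output/sft` (the path from the command above); the prompt string is illustrative only.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# --output_dir from the launch command above (assumed to contain a HF-format checkpoint).
ckpt_dir = "/home/paperspace/safe-rlhf/output/sft"

tokenizer = AutoTokenizer.from_pretrained(ckpt_dir, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    ckpt_dir,
    torch_dtype=torch.bfloat16,  # matches the --bf16 True training setting
    trust_remote_code=True,
).to("cuda")

# Illustrative prompt; replace with whatever format the fine-tuning data expects.
inputs = tokenizer("Hello, how are you?", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```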