The following values were not passed to `accelerate launch` and had defaults used instead:
  `--num_processes` was set to a value of `4`
    More than one GPU was found, enabling multi-GPU training.
    If this was unintended please pass in `--num_processes=1`.
  `--num_machines` was set to a value of `1`
  `--mixed_precision` was set to a value of `'no'`
  `--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
Params using prompt template alpaca:
base_model: baichuan-inc/Baichuan2-7B-Base
data_path: ../../data/belle_dolphine/p12.jsonl
output_dir: ../out/lora/p12
batch_size: 32
micro_batch_size: 2
num_epochs: 1
learning_rate: 0.0004
cutoff_len: 4096
val_set_size: 0
lr_scheduler: cosine
warmup_steps: 100
lora_r: 16
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules: ['gate_proj', 'down_proj', 'up_proj']
train_on_inputs: False
add_eos_token: False
group_by_length: False
wandb_project: lora-moe
wandb_run_name: belle_dolphine-p12
wandb_watch:
wandb_log_model:
resume_from_checkpoint: False
gradient_accumulation_steps: 4
gradient_accumulation_steps: 4
gradient_accumulation_steps: 4
gradient_accumulation_steps: 4
Loading checkpoint shards: 0%| | 0/2 [00:00 It should be 1 2 None
pre-trained model's BOS EOS and PAD token id: 1 2 0 => It should be 1 2 None
Loading checkpoint shards: 100%|██████████| 2/2 [00:17<00:00, 8.40s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:17<00:00, 8.91s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:18<00:00, 8.56s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:18<00:00, 9.11s/it]
pre-trained model's BOS EOS and PAD token id: 1 2 0 => It should be 1 2 None
pre-trained model's BOS EOS and PAD token id: 1 2 0 => It should be 1 2 None
trainable params: 23,199,744 || all params: 7,529,172,992 || trainable%: 0.30813137146205183
trainable params: 23,199,744 || all params: 7,529,172,992 || trainable%: 0.30813137146205183
Map: 0%| | 0/216925 [00:00
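
The figures in this log can be cross-checked by hand. The sketch below is not taken from the training script; it reproduces `gradient_accumulation_steps` and the `trainable params` count under two assumptions. First, that the script divides `batch_size` by `micro_batch_size` and then by the number of processes, as alpaca-lora-style scripts commonly do. Second, that Baichuan2-7B-Base has 32 decoder layers with hidden size 4096 and MLP intermediate size 11008; these dimensions do not appear in the log and should be checked against the model's `config.json`.

```python
# Values copied from the log.
batch_size = 32
micro_batch_size = 2
num_processes = 4            # default chosen by `accelerate launch`
lora_r = 16
reported_trainable = 23_199_744
reported_total = 7_529_172_992

# Assumption: per-process accumulation steps are derived this way.
grad_accum = batch_size // micro_batch_size // num_processes
effective_batch = micro_batch_size * grad_accum * num_processes

# Assumption: model dimensions, not shown in the log.
hidden_size = 4096
intermediate_size = 11008
num_layers = 32

# (in_features, out_features) of each targeted linear layer.
target_modules = {
    "gate_proj": (hidden_size, intermediate_size),
    "up_proj": (hidden_size, intermediate_size),
    "down_proj": (intermediate_size, hidden_size),
}


def lora_param_count(in_features: int, out_features: int, r: int) -> int:
    """LoRA adds A (r x in_features) and B (out_features x r) per layer."""
    return r * (in_features + out_features)


per_layer = sum(lora_param_count(i, o, lora_r) for i, o in target_modules.values())
trainable = num_layers * per_layer
trainable_pct = 100 * reported_trainable / reported_total

print("gradient_accumulation_steps:", grad_accum)   # 4
print("effective batch size:", effective_batch)     # 32
print("LoRA trainable params:", trainable)          # 23199744
print("trainable %:", trainable_pct)
```

Under these assumptions the computed values agree with the log: 4 accumulation steps per process and 23,199,744 trainable parameters, roughly 0.308% of the total.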