Finetune 35B Model

#6 opened by amitbcp

The notebook is for finetuning the 8B model, not the 35B one.
On changing the model name to the 35B checkpoint, it throws an error because the device map is not set up correctly.

Kindly provide the correct device mapping, or please update the finetuning notebook.

```
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select)
```

This error is caused by tensors living on different GPUs: some are on device cuda:0 (the first GPU) while others are on cuda:1 (the second GPU). PyTorch requires all tensors involved in a single operation to be on the same device. With a checkpoint as large as 35B, this typically happens when the model gets sharded across several GPUs while other tensors sit on just one of them.
To solve this, make sure the model and all of its input tensors are placed on the same GPU.

Here is the loading code made self-contained (the quantization and attention settings are assumed from earlier cells in the notebook):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

MODEL_NAME = "CohereForAI/aya-23-35B"

# Determine the GPU to use
gpu_id = 0  # use the first GPU (cuda:0); change to 1, 2, ... as needed
device = torch.device(f"cuda:{gpu_id}" if torch.cuda.is_available() else "cpu")
if device.type == "cuda":
    torch.cuda.set_device(device)  # set the active GPU

# Assumed from earlier in the notebook: 4-bit quantization for QLoRA-style finetuning
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
attn_implementation = "flash_attention_2"  # or "sdpa" if flash-attn isn't installed

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    quantization_config=quantization_config,
    attn_implementation=attn_implementation,
    torch_dtype=torch.bfloat16,
    device_map=device,  # <<== load the whole model onto the single active GPU
)
```

These changes resolve the cross-device error by ensuring that all operations occur on the same GPU.
If you have more than one GPU in your system and are still experiencing problems, you can check the available GPUs and their utilization with the `nvidia-smi` command. You can also control which GPUs are visible to the process by setting the `CUDA_VISIBLE_DEVICES` environment variable.
