Error in subprocess: concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.

#3
by BwandoWando - opened

I'm trying to run ModernBERT on an Azure Databricks A100 compute (Standard_NC24ads_A100_v4) and I'm getting the error(s) below. Note that the same code runs flawlessly on my local machine. What I've noticed is that only the subprocess errors out while the main thread stays alive, hence the output below.

Screenshot from 2024-12-20 20-30-08.png

I've installed the recommended transformers version from git as of 20-Dec-2024: pip install git+https://github.com/huggingface/transformers.git

Here are the versions of the libraries installed on the compute, in case anyone is interested:

  • flash-attn==2.7.2.post1
  • peft==0.14.0
  • tokenizers==0.21.0
  • torch==2.5.0
  • torchaudio==2.5.0
  • torchvision==0.20.0
  • transformers @ file:///xxxxxxxx/20241220_modernBERT_transformers/transformers.zip#sha256=07162208eb951e2019e3c3abd116e8e672deff5b72fa1aa7ccfff94da62ccd4f
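
In case anyone wants to compare against their own environment, here is a minimal sketch for printing the same version details. It only uses the standard-library importlib.metadata module; the helper name collect_versions and the PACKAGES list are my own choices for illustration, not part of any library.

```python
from importlib.metadata import PackageNotFoundError, version

# Package names taken from the list above; adjust for your own setup.
PACKAGES = [
    "flash-attn",
    "peft",
    "tokenizers",
    "torch",
    "torchaudio",
    "torchvision",
    "transformers",
]


def collect_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # Package is not installed in this environment.
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in collect_versions(PACKAGES).items():
        print(f"{name}=={ver}" if ver else f"{name}: not installed")
```

Running this both locally and on the cluster should make any version mismatch between the two environments easy to spot.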

Here's the nvidia-smi output from within the compute

  • CUDA Version: 12.2
  • Driver Version: 535.161.07
  • NVIDIA-SMI 535.161.07

Can anyone help point me in the right direction on how to fix this error?