
Can we have some official training / finetuning recipes for this model?

#11
by StephennFernandes

Hi, on the latest version of transformers I tried to finetune mmBERT on the text classification tasks:
https://github.com/huggingface/transformers/tree/main/examples/pytorch/text-classification

When I tried to use mmBERT as a drop-in replacement for the original uncased BERT, even after several epochs the accuracy is stuck at 0.3 and the F1 score is always 0.
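
For context, this is roughly how I'm swapping it in (a minimal sketch, not my exact run; the `jhu-clsp/mmBERT-base` checkpoint id and `num_labels=3` are my assumptions for illustration):

```python
# Minimal sketch of the drop-in swap; the checkpoint id "jhu-clsp/mmBERT-base"
# and num_labels=3 are assumptions, not values from the example script.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "jhu-clsp/mmBERT-base"  # previously "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=3)
```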

It seems like the mmBERT models are not directly compatible with all BERT finetuning techniques at the moment.

Would really appreciate it if we could get some training / finetuning guidelines and examples so we could use mmBERT in all the ways we used mBERT or BERT before.

Center for Language and Speech Processing @ JHU org

The evaluations were done with (a slightly older version of) this script, and others have already fine-tuned it with the example scripts, so it does work with the right environment. Perhaps it is an issue with the attention function, as I had flash attention installed? I know some ModernBERT models had issues with the backup attention function (SDPA attention) in the past, though I thought it was resolved. Try something like `pip install "flash_attn==2.6.3" --no-build-isolation` or similar and see if it changes anything.
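
For example, something along these lines (a rough sketch, with the checkpoint id and label count as placeholders) lets you force a specific attention implementation and compare runs:

```python
# Rough sketch: explicitly pick the attention implementation when loading, so you
# can compare flash attention against the SDPA fallback. The checkpoint id and
# num_labels are placeholders.
import torch
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "jhu-clsp/mmBERT-base",
    num_labels=3,
    torch_dtype=torch.bfloat16,               # flash attention needs fp16/bf16
    attn_implementation="flash_attention_2",  # try "sdpa" or "eager" to compare
)
```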

@orionweller thanks for responding, it really means a lot.

I wanted to know how I could continue pretraining the mmBERT model on more custom pretraining data. Are there any resources for this? What do you recommend as the most stable and performant way to further continually pretrain mmBERT?

Center for Language and Speech Processing @ JHU org

It really depends on what you want to pre-train on. You might want to check out RexBERT or BioClinicalBERT for examples. You will need to gather the pre-training data you want and then continue pre-training from one of the checkpoints in mmbert-checkpoints, likely the decay one.

More details can be found in ModernBERT's repo for the training code or Ettin's repo for the data preparation side.
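
The continued pre-training step itself is just standard masked-language-modeling on your own corpus with the Trainer; a rough sketch (the checkpoint id, data file, sequence length, and hyperparameters below are placeholders, not the settings we used) looks like:

```python
# Rough sketch of continued MLM pre-training on custom data.
# The checkpoint id, data file, and hyperparameters are placeholders.
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Placeholder: point this at whichever decay-phase checkpoint you pull
# from the mmbert-checkpoints repo.
checkpoint = "jhu-clsp/mmBERT-base"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForMaskedLM.from_pretrained(checkpoint)

# Your own corpus, one text example per line (placeholder filename).
dataset = load_dataset("text", data_files={"train": "my_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

# Dynamic masking for MLM; 0.15 is a placeholder masking probability.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="mmbert-continued",
    per_device_train_batch_size=8,
    learning_rate=1e-4,
    num_train_epochs=1,
    bf16=True,
)

Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
).train()
```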
