How can I fine-tune this for domain-specific applications?
Let's say I want to fine-tune it so it performs better on medical vocabulary/language or something like that. How would I do it? Also, how should the data be formatted? I'm a little bit ignorant, so please be kind with me. Regards.
I haven't fine-tuned this model, but AFAIK you have to fine-tune the original model (the link is in this model card), and if you want the long context you have to apply the LSG conversion after the fine-tune. With the Transformers library, fine-tuning is a few lines of code, and the file format is well documented in the Hugging Face Transformers docs; there are also a lot of Jupyter notebooks of the process on GitHub. This is a BERT transformer model, so you should search for how to fine-tune a BERT embeddings model.
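For the domain-adaptation step, a minimal sketch with the Transformers Trainer could look like the following. It continues masked-language-model pretraining of the original (non-LSG) checkpoint on a medical corpus; the `bert-base-uncased` name and `medical_corpus.txt` file are placeholders for the base model linked in this card and your own data.

```python
from datasets import load_dataset
from transformers import (
    AutoTokenizer, AutoModelForMaskedLM,
    DataCollatorForLanguageModeling, Trainer, TrainingArguments,
)

# Placeholder checkpoint -- swap in the original model linked in this card.
base_model = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForMaskedLM.from_pretrained(base_model)

# Plain-text medical corpus, one passage per line (hypothetical file name).
dataset = load_dataset("text", data_files={"train": "medical_corpus.txt"})

def tokenize(batch):
    # Truncate to the original 512-token limit; the long context comes later
    # from the LSG conversion step.
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# Randomly masks 15% of tokens so the model keeps learning MLM on domain text.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="bert-medical-mlm",
    per_device_train_batch_size=8,
    num_train_epochs=1,
    fp16=True,  # mixed precision helps with GPU memory
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=collator,
)
trainer.train()
trainer.save_model("bert-medical-mlm")
```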
Also, there are some packages for fine-tuning embeddings; LlamaIndex supports fine-tuning for this type of transformer with a few lines of code. You will need a high-grade GPU if you don't want to wait two years for the fine-tune.
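As a rough idea of the embeddings fine-tuning route, here is a sketch using sentence-transformers directly (which packages like LlamaIndex build on under the hood). The model path, the example pairs, and the loss choice are all assumptions to adapt to your own data.

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# Hypothetical path/ID of the fine-tuned base embeddings model.
model = SentenceTransformer("bert-medical-mlm")

# Toy positive pairs; in practice you would build these from your medical data,
# e.g. question/answer or title/abstract pairs.
train_examples = [
    InputExample(texts=["chest pain on exertion", "symptoms suggestive of angina"]),
    InputExample(texts=["elevated HbA1c levels", "poorly controlled diabetes"]),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)

# Contrastive-style loss commonly used for embedding fine-tuning with pairs.
train_loss = losses.MultipleNegativesRankingLoss(model)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=1,
    warmup_steps=100,
)
model.save("medical-embeddings-finetuned")
```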
I was trying to run code to fine-tune it just for research and ran into both issues: lack of GPU memory, and the model ignoring the long context and behaving like normal BERT. How do you add the LSG to the model? I mean, the first issue is just better hardware or software optimization, but on the architecture side I'm really lost.
In this model card there is a link to the repo for that; it's pretty simple and straightforward.
I think you have to train the original model first (with the 512-token max length) and then apply the LSG conversion with the repo linked in my model card.
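As a sketch of that second step: if I remember correctly, the conversion repo ships a pip package (`lsg-converter`) that turns a fine-tuned 512-token checkpoint into an LSG model with a longer max sequence length. The class name, arguments, and checkpoint path below are assumptions, so check the repo's README for the exact usage.

```python
# Assumed usage of the lsg-converter package from the conversion repo
# linked in the model card -- verify names/arguments against its README.
from lsg_converter import LSGConverter

# Extend the attention to 4096 tokens after the 512-token fine-tune.
converter = LSGConverter(max_sequence_length=4096)

# "bert-medical-mlm" is the hypothetical fine-tuned checkpoint from the
# earlier step; num_global_tokens is an LSG-specific setting.
model, tokenizer = converter.convert_from_pretrained(
    "bert-medical-mlm",
    num_global_tokens=7,
)

model.save_pretrained("lsg-bert-medical-4096")
tokenizer.save_pretrained("lsg-bert-medical-4096")
```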
Thanks, that's what I suspected. I'll try it. Thanks!