
Stephen Oates PRO

soates

AI & ML interests

None yet

Organizations

None yet

soates's activity

upvoted an article 2 months ago
Article: Fine-tuning LLMs to 1.58bit: extreme quantization made easy • 203 upvotes
upvoted 2 articles 3 months ago
Article: Llama-3.1-Storm-8B: Improved SLM with Self-Curation + Model Merging • by akjindal53244 • 73 upvotes
Article: A failed experiment: Infini-Attention, and why we should keep trying? • 50 upvotes
upvoted an article 6 months ago
updated a collection 7 months ago
Reacted to BramVanroy's post with 👍 8 months ago
Does anyone have experience with finetuning Gemma? Even the 2B variant feels more memory-heavy than Mistral 7B. I know its vocabulary is much larger (250k), but I'm a bit surprised that the max batch size I can fit on an A100 80GB is only 2, whereas I could fit 4 with Mistral 7B, even though Gemma is much smaller everywhere except the embedding layer. Both runs used FA (Flash Attention), the same sequence length, and the same DeepSpeed ZeRO-3 settings. And yes, I'm using the most recent hotfix of transformers, which solves a memory issue with Gemma and other models.

Any prior experience you can share, or suggestions to improve throughput?
• 4 replies
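For context, here is a minimal sketch of the kind of run the post describes: Gemma-2B finetuned with Flash Attention 2 and DeepSpeed ZeRO-3 through the transformers Trainer API. The batch size and sequence setup mirror the numbers in the post, but the output directory and the ZeRO-3 config path are illustrative placeholders, not BramVanroy's actual script.

```python
# Hypothetical sketch of the setup described in the post: Gemma-2B with
# Flash Attention 2 and DeepSpeed ZeRO-3 via the transformers Trainer.
# output_dir and the ZeRO-3 JSON path are made-up placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments

model_name = "google/gemma-2b"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # the "FA" mentioned in the post
)
model.gradient_checkpointing_enable()  # common memory-saving lever

training_args = TrainingArguments(
    output_dir="gemma-2b-sft",
    per_device_train_batch_size=2,   # the max the post reports on an A100 80GB
    gradient_accumulation_steps=8,   # recover a larger effective batch size
    bf16=True,
    deepspeed="ds_zero3.json",       # hypothetical ZeRO-3 config file
)
```

One plausible contributor to the memory gap, an inference on my part rather than something the post states: the cross-entropy logits tensor scales with vocabulary size (batch × sequence × 256k for Gemma), which can dwarf the equivalent tensor for Mistral's 32k vocabulary even when the rest of the model is smaller.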