[AUTOMATED] Model Memory Requirements · #21 opened 6 months ago by model-sizer-bot
What is the max sequence length that the model can compute if I use flash attention? · 1 reply · #20 opened 7 months ago by halfmoon039
Do I need to apply_chat_template before Supervised Fine-tuning Gemma-1.1-7b-it? · 2 replies · #19 opened 7 months ago by Syax19
Is 1.1 trained from the same SFT model as 1.0? · 1 reply · #18 opened 7 months ago by chujiezheng
Finetune error: "triu_tril_cuda_template" not implemented for 'BFloat16' · 2 replies · #17 opened 7 months ago by Saicy
Update README.md · #16 opened 7 months ago by ssalvo41
TemplateError: System role not supported · 5 replies · #15 opened 7 months ago by luogy
Consider adding <start_of_context> and <stop_of_context> or similar special tokens for context ingestion · #13 opened 8 months ago by qnixsynapse
loss padding_side · 1 reply · #12 opened 8 months ago by NickyNicky
Why is this completely broken? · 2 replies · #11 opened 8 months ago by rombodawg
Number of parameters · 7 replies · #9 opened 8 months ago by HugoLaurencon