
Jamba

  • ✅ qlora w/ deepspeed Zero-2 needs at least 2x GPUs and
    • 35GiB VRAM per GPU w/ minimal context length
    • 56GiB VRAM per GPU (w/ multipack enabled)
  • ✅ qlora w/ deepspeed Zero-3 needs at least 2x GPUs and 67GiB VRAM (wtf?)
  • ✅ qlora single-gpu, ~51GiB VRAM
  • ✅ multipack
  • ❓ FSDP
  • ❓ 8-bit LoRA
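
As a rough way to apply the numbers above, here is a minimal Python sketch that checks which of the listed setups a given machine could plausibly run. The `REQUIREMENTS` table and the `feasible_setups` helper are hypothetical names made up for illustration; they are not part of this repo. The figures are copied from the list, with two assumptions: the Zero-3 figure is treated as per GPU (the list does not say), and the approximate single-GPU figure is treated as a hard threshold.

```python
# Hypothetical helper, not part of this repo.
# Values are (minimum GPU count, GiB of VRAM needed per GPU), taken from the list above.
REQUIREMENTS = {
    "qlora + deepspeed zero-2, minimal context": (2, 35),
    "qlora + deepspeed zero-2, multipack": (2, 56),
    # Assumption: the list gives 67GiB without saying "per GPU".
    "qlora + deepspeed zero-3": (2, 67),
    # Assumption: "~51GiB" is approximate; used here as a hard cutoff.
    "qlora single-gpu": (1, 51),
}


def feasible_setups(gpu_vram_gib):
    """Return the setups whose GPU count and per-GPU VRAM needs are both met.

    gpu_vram_gib: list with one entry per GPU, each the VRAM of that GPU in GiB.
    """
    fits = []
    for name, (min_gpus, gib_per_gpu) in REQUIREMENTS.items():
        # Count only the GPUs that are individually large enough.
        large_enough = [v for v in gpu_vram_gib if v >= gib_per_gpu]
        if len(large_enough) >= min_gpus:
            fits.append(name)
    return fits


if __name__ == "__main__":
    for setup in feasible_setups([48, 48]):
        print(setup)
```

If PyTorch is installed, the per-GPU capacity can be read from `torch.cuda.get_device_properties(i).total_memory` (in bytes). That is total memory rather than free memory, so treat the result as an upper bound.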