chatglm-6b-128k only accepts about 32k of long-text input
Also, when I load the local model on a 3090 for inference, I get this error: torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 848.00 MiB. GPU 0 has a total capacity of 23.69 GiB of which 450.38 MiB is free. Process 3932901 has 570.00 MiB memory in use. Process 2034774 has 3.92 GiB memory in use. Including non-PyTorch memory, this process has 18.76 GiB memory in use. Of the allocated memory 16.86 GiB is allocated by PyTorch, and 1.59 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables). Does anyone know how to fix this?
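The error message itself suggests setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True. A minimal sketch of doing that from Python follows; the helper name `allocator_config` is made up for illustration. The variable has to be set before PyTorch makes its first CUDA allocation, so setting it before `import torch` is the safe order. This only reduces fragmentation. With just 1.59 GiB reserved but unallocated in the log above, it may not be enough on its own.

```python
import os

# Set the allocator option suggested by the error message.
# This must happen before the first CUDA allocation; doing it
# before `import torch` is the safest ordering.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"


def allocator_config() -> str:
    """Hypothetical helper: return the allocator setting currently in effect."""
    return os.environ.get("PYTORCH_CUDA_ALLOC_CONF", "")


# Import torch and load the model only after this point.
print(allocator_config())
```

The same effect can be had by exporting the variable in the shell before launching the script.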
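Upgrade your hardware and switch to a dedicated GPU. The name chatglm-6b-128k means this version can support a 128k context with good results; that is the ceiling the software can reach, provided the hardware is sufficient.
But that assumes your device has enough memory.
By the way, the 32k you mention: roughly how many characters is that in total, prompt plus reply? I'm curious how far a 3090 can go.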
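On a 3090, VRAM probably already overflows at 32K. Loading the model takes 13G, and the remaining 11G of VRAM is not enough to run inference at 128K.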
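To make the memory budget concrete, here is a rough sketch that only redoes the arithmetic on the figures quoted in the error message above. The function and key names are made up for illustration, and nothing here queries the GPU. It shows that the two other processes hold about 4.5 GiB of the card, which is worth checking before concluding that 24 GB is the hard limit.

```python
# All figures are copied from the error message in the question (GiB).
TOTAL = 23.69
FREE = 450.38 / 1024                      # 450.38 MiB free
OTHER_PROCESSES = 570.00 / 1024 + 3.92    # processes 3932901 and 2034774
THIS_PROCESS = 18.76                      # includes non-PyTorch memory
REQUESTED = 848.00 / 1024                 # the allocation that failed


def memory_budget() -> dict:
    """Hypothetical helper: summarize where the 3090's memory went."""
    return {
        # Sanity check: the parts should add up to roughly TOTAL.
        "accounted": FREE + OTHER_PROCESSES + THIS_PROCESS,
        # How much the failed request exceeded the free memory.
        "shortfall": REQUESTED - FREE,
        # Memory held by other processes on the same GPU.
        "held_by_others": OTHER_PROCESSES,
    }


for name, gib in memory_budget().items():
    print(f"{name}: {gib:.2f} GiB")
```

Stopping the other two processes would free a few GiB for longer inputs, but whether that is enough for a given context length depends on the model's per-token memory use, which the thread does not state.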