The 34 GB model is too large for my machine (16 GB of CPU RAM) to load. So I wonder whether we can set `max_shard_size` or something similar to split the checkpoint, so that the model can then be loaded onto the GPU piece by piece.
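For reference, here is a minimal sketch of how checkpoint sharding works in `transformers`: `save_pretrained` accepts a `max_shard_size` argument that splits the weights across several files plus an index. The tiny GPT-2 config below is only an illustration so the mechanics can be shown without the real 34 GB checkpoint; the sizes are placeholders.

```python
import os
import tempfile

from transformers import GPT2Config, GPT2LMHeadModel

# Tiny stand-in model (hypothetical sizes) so the sharding
# mechanics can be demonstrated without a 34 GB download.
config = GPT2Config(n_layer=2, n_embd=64, n_head=2, vocab_size=1000)
model = GPT2LMHeadModel(config)

with tempfile.TemporaryDirectory() as out_dir:
    # max_shard_size caps the size of each checkpoint file;
    # anything over the cap is split into multiple shards.
    model.save_pretrained(out_dir, max_shard_size="100KB")
    shard_files = [
        f for f in os.listdir(out_dir)
        if f.endswith((".bin", ".safetensors"))
    ]
    print(f"checkpoint written as {len(shard_files)} shard(s)")
```

When a checkpoint is sharded like this, `from_pretrained` with `low_cpu_mem_usage=True` (or `device_map="auto"`, which needs `accelerate`) loads the shards one at a time instead of materializing the whole state dict in RAM, which is essentially the piece-by-piece loading asked about above.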