optimize pipeline between device fwd and host bwd
Hi @zhaifly! Could you describe a bit what you mean?
The title of your PR makes me think of a change in modeling, which should take place in Optimum Habana.
Hi @regisss, I'm currently optimizing the Habana models ViT, Swin, GPT-2, GPT-J, and NeoX. I want to add more Habana-specific command-line args, which should also be added to the `gaudi_config` of each model.
This PR means: add a `mark_step` between `model.forward` and `loss.backward` for better training performance (pipelining the host BWD with the device FWD). There are a lot of code changes on my local side for the 5 models, so I will temporarily close this PR and reopen it once the code is ready.
@zhaifly Sounds good!
A few recommendations:
- To add a `mark_step` between the forward and backward methods, the best approach is to override the `training_step` method in the `GaudiTrainer` class (unless it is specific to the models you mentioned).
- To add more Habana-specific args, we should first modify the `GaudiConfig` class, and then we can update the `gaudi_config.json` here.
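To make the first recommendation concrete, here is a minimal sketch of what overriding `training_step` could look like. It assumes the Optimum Habana names `GaudiTrainer` and `habana_frameworks.torch.core.mark_step()`; since the real call only works on Gaudi hardware, `mark_step` is stubbed here so the control flow can run anywhere, and the loss scaling and gradient-accumulation logic of the real `training_step` is omitted:

```python
# Sketch: insert a graph break between forward and backward so the device
# can execute the FWD graph while the host builds the BWD graph.
# On Gaudi you would call habana_frameworks.torch.core.mark_step();
# here it is a stub (an assumption for illustration) that records the call.

calls = []

def mark_step():
    # Stub for htcore.mark_step(): on Gaudi this flushes the ops
    # accumulated so far, triggering device execution of the forward pass.
    calls.append("mark_step")

class Trainer:
    """Stand-in for the base Hugging Face Trainer."""
    def training_step(self, model, inputs):
        loss = model.forward(inputs)   # device FWD is enqueued (lazy mode)
        loss.backward()                # host builds the BWD graph
        return loss

class GaudiTrainer(Trainer):
    """Hypothetical override inserting the mark_step between FWD and BWD."""
    def training_step(self, model, inputs):
        loss = model.forward(inputs)   # enqueue FWD
        mark_step()                    # flush: device starts FWD now
        loss.backward()                # host BWD overlaps with device FWD
        return loss

# Tiny fakes so the sketch is runnable without torch or Habana software.
class FakeLoss:
    def backward(self):
        calls.append("backward")

class FakeModel:
    def forward(self, inputs):
        calls.append("forward")
        return FakeLoss()

GaudiTrainer().training_step(FakeModel(), {"pixel_values": None})
print(calls)  # -> ['forward', 'mark_step', 'backward']
```

The point of the override is that the break lands between the two phases for every training step, model-agnostically, rather than being patched into each model's code.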
Do not hesitate to ping me and open PRs on Github when you start working on it :)
@regisss Appreciate your suggestions; I will follow your comments to clean up my code.