optimize pipeline between device fwd and host bwd
Hi @zhaifly! Could you describe a bit what you mean?
The title of your PR makes me think of a change in modeling, which should take place in Optimum Habana.
Hi @regisss, I'm currently optimizing the Habana models ViT, Swin, GPT-2, GPT-J, and NeoX. I want to add more Habana-specific command-line args, which should also be added to the `gaudi_config` of each model.
This PR means: add a `mark_step` between `model.forward` and `loss.backward` for better training performance (pipelining the host BWD with the device FWD). There are a lot of code changes on my local side for the 5 models, so I will temporarily close this PR and reopen it once the code is ready.
@zhaifly Sounds good!
A few recommendations:
- To add a `mark_step` between the forward and backward methods, the best approach is to override the `training_step` method in the `GaudiTrainer` class (unless it is specific to the models you mentioned).
- To add more Habana-specific args, we should first modify the `GaudiConfig` class, and then we can update the `gaudi_config.json` here.
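To make the first recommendation concrete, here is a minimal sketch of what overriding `training_step` could look like. It assumes the Optimum Habana names `GaudiTrainer` and `habana_frameworks.torch.core.mark_step()`; since the real call only works on Gaudi hardware, `mark_step` is stubbed here so the control flow can run anywhere, and the loss scaling and gradient-accumulation logic of the real `training_step` is omitted:

```python
# Sketch: insert a graph break between forward and backward so the device
# can execute the FWD graph while the host builds the BWD graph.
# On Gaudi you would call habana_frameworks.torch.core.mark_step();
# here it is a stub (an assumption for illustration) that records the call.

calls = []

def mark_step():
    # Stub for htcore.mark_step(): on Gaudi this flushes the ops
    # accumulated so far, triggering device execution of the forward pass.
    calls.append("mark_step")

class Trainer:
    """Stand-in for the base Hugging Face Trainer."""
    def training_step(self, model, inputs):
        loss = model.forward(inputs)   # device FWD is enqueued (lazy mode)
        loss.backward()                # host builds the BWD graph
        return loss

class GaudiTrainer(Trainer):
    """Hypothetical override inserting the mark_step between FWD and BWD."""
    def training_step(self, model, inputs):
        loss = model.forward(inputs)   # enqueue FWD
        mark_step()                    # flush: device starts FWD now
        loss.backward()                # host BWD overlaps with device FWD
        return loss

# Tiny fakes so the sketch is runnable without torch or Habana software.
class FakeLoss:
    def backward(self):
        calls.append("backward")

class FakeModel:
    def forward(self, inputs):
        calls.append("forward")
        return FakeLoss()

GaudiTrainer().training_step(FakeModel(), {"pixel_values": None})
print(calls)  # -> ['forward', 'mark_step', 'backward']
```

The point of the override is that the break lands between the two phases for every training step, model-agnostically, rather than being patched into each model's code.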
Do not hesitate to ping me and open PRs on Github when you start working on it :)
@regisss Appreciate your suggestions; I will follow your comments to clean up my code.