Use the HuggingFace ControlNet training script, which has more optimizations built in. I wrote an article about ControlNet training based on that script here: https://civitai.com/articles/2078
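As a rough illustration of the kind of optimizations that script can turn on (a sketch based on the diffusers API; the model ID is illustrative and exact flag/method names can vary between diffusers versions):

```python
# Sketch of optimizations the diffusers ControlNet training script exposes via flags
# such as --enable_xformers_memory_efficient_attention and --gradient_checkpointing.
# The model ID below is illustrative; adjust for your base checkpoint.
import torch
from diffusers import ControlNetModel, UNet2DConditionModel

# Load the frozen SD UNet and initialize the trainable ControlNet from it.
unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet", torch_dtype=torch.float32
)
controlnet = ControlNetModel.from_unet(unet)

# Memory-efficient attention (requires a working xformers install).
unet.enable_xformers_memory_efficient_attention()
controlnet.enable_xformers_memory_efficient_attention()

# Trade a little compute for memory so larger batches fit on one GPU.
controlnet.enable_gradient_checkpointing()
```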
@lllyasviel
I saw you say that training on the circle dataset is fast: after 4000 steps (batch size 4, learning rate 1e-5, about 50 minutes on an A100 PCIE 40G) it converged.
That's around 1.33 steps/sec.
I tried running the same program on RunPod using an A40, A6000, or A100 GPU, and the speed is much lower (0.55-0.7 steps/sec).
I also installed xformers (and triton) but got an error like the one in #218. The suggestion there was to try float16, but #265 (comment) says that Stable Diffusion doesn't work that well with float16.
I tried float16 with xformers and the iteration speed became roughly 3x faster (1.5 steps/sec), but training doesn't converge; it's the same issue mentioned in #265.
In the end I had to uninstall xformers, go back to float32, and tolerate 0.55-0.7 steps/sec. My problem is that I can't replicate the training speed you reported on the same GPU (A100), and that confuses me.
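For context, this is roughly the setup I'm running, adapted from tutorial_train.py in this repo (paths and hyperparameters are illustrative; the precision argument is the only thing I toggled between runs):

```python
# Adapted from tutorial_train.py; checkpoint path and logger frequency are illustrative.
import pytorch_lightning as pl
from torch.utils.data import DataLoader
from tutorial_dataset import MyDataset
from cldm.logger import ImageLogger
from cldm.model import create_model, load_state_dict

batch_size = 4
learning_rate = 1e-5

model = create_model('./models/cldm_v15.yaml').cpu()
model.load_state_dict(load_state_dict('./models/control_sd15_ini.ckpt', location='cpu'))
model.learning_rate = learning_rate
model.sd_locked = True
model.only_mid_control = False

dataset = MyDataset()
dataloader = DataLoader(dataset, num_workers=0, batch_size=batch_size, shuffle=True)
logger = ImageLogger(batch_frequency=300)

# precision=32 converges but runs at ~0.55-0.7 steps/sec for me;
# precision=16 is ~3x faster but does not converge (same issue as #265).
trainer = pl.Trainer(gpus=1, precision=32, callbacks=[logger])
trainer.fit(model, dataloader)
```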
I wonder whether you used the xformers (and/or triton) packages to help accelerate training. Does environment.yaml fully list the packages used to train the model?