OOM when using use_usp=True for accelerated inference #1164

@INV-WZQ

Description

Problem

I am running the Wan2.2-I2V-A14B model on an 8xH100 server with use_usp=True.

Despite each H100 having 80 GB of VRAM, I hit an OutOfMemoryError almost immediately during model loading. Notably, the tracebacks from both rank 3 and rank 4 report GPU 0, and GPU 0 is shared by several processes.

Error

....
[rank4]: Traceback (most recent call last):
[rank4]:   File "/home/DiffSynth-Studio/./examples/wanvideo/model_training/validate_lora/Wan2.2-I2V-A14B.py", line 12, in <module>
[rank4]:     pipe = WanVideoPipeline.from_pretrained(
[rank4]:   File "/home/DiffSynth-Studio/diffsynth/pipelines/wan_video.py", line 130, in from_pretrained
[rank4]:     model_pool = pipe.download_and_load_models(model_configs, vram_limit)
[rank4]:   File "/home/DiffSynth-Studio/diffsynth/diffusion/base_pipeline.py", line 289, in download_and_load_models
[rank4]:     model_pool.auto_load_model(
[rank4]:   File "/home/DiffSynth-Studio/diffsynth/models/model_loader.py", line 70, in auto_load_model
[rank4]:     model = self.load_model_file(config, path, vram_config, vram_limit=vram_limit)
[rank4]:   File "/home/DiffSynth-Studio/diffsynth/models/model_loader.py", line 40, in load_model_file
[rank4]:     model = load_model(
[rank4]:   File "/home/DiffSynth-Studio/diffsynth/core/loader/model.py", line 48, in load_model
[rank4]:     state_dict = {i: state_dict[i] for i in state_dict}
[rank4]:   File "/home/DiffSynth-Studio/diffsynth/core/loader/model.py", line 48, in <dictcomp>
[rank4]:     state_dict = {i: state_dict[i] for i in state_dict}
[rank4]:   File "/home/DiffSynth-Studio/diffsynth/core/vram/disk_map.py", line 62, in __getitem__
[rank4]:     param = self.files[file_id].get_tensor(name)
[rank4]: torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 270.00 MiB. GPU 0 has a total capacity of 79.19 GiB of which 220.75 MiB is free. Process 589416 has 4.82 GiB memory in use. Process 589420 has 28.22 GiB memory in use. Including non-PyTorch memory, this process has 17.16 GiB memory in use. Process 589413 has 520.00 MiB memory in use. Process 589419 has 28.22 GiB memory in use. Of the allocated memory 16.00 GiB is allocated by PyTorch, and 580.35 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
....
[rank3]: torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 136.00 MiB. GPU 0 has a total capacity of 79.19 GiB of which 84.75 MiB is free. Including non-PyTorch memory, this process has 4.95 GiB memory in use. Process 589420 has 28.22 GiB memory in use. Process 589417 has 17.16 GiB memory in use. Process 589413 has 520.00 MiB memory in use. Process 589419 has 28.22 GiB memory in use. Of the allocated memory 4.14 GiB is allocated by PyTorch, and 220.71 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
....

Code

import os
import torch
from PIL import Image
from diffsynth.utils.data import save_video, VideoData
from diffsynth.pipelines.wan_video import WanVideoPipeline, ModelConfig
from modelscope import dataset_snapshot_download


pipe = WanVideoPipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
        ModelConfig(model_id="Wan-AI/Wan2.2-I2V-A14B", origin_file_pattern="high_noise_model/diffusion_pytorch_model*.safetensors"),
        ModelConfig(model_id="Wan-AI/Wan2.2-I2V-A14B", origin_file_pattern="low_noise_model/diffusion_pytorch_model*.safetensors"),
        ModelConfig(model_id="Wan-AI/Wan2.2-I2V-A14B", origin_file_pattern="models_t5_umt5-xxl-enc-bf16.pth"),
        ModelConfig(model_id="Wan-AI/Wan2.2-I2V-A14B", origin_file_pattern="Wan2.1_VAE.pth"),
    ],
    use_usp=True,
)
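
The tracebacks above suggest every rank is allocating on GPU 0, which is consistent with device="cuda" resolving to cuda:0 in all eight processes. Below is a minimal sketch of a possible workaround (a hypothetical helper, not verified against DiffSynth-Studio internals): derive the device from the LOCAL_RANK environment variable that torchrun sets, so each process binds to its own GPU.

```python
import os

def per_rank_device(env=os.environ):
    """Map this process to its own GPU via torchrun's LOCAL_RANK.

    If every process passes device="cuda", it resolves to cuda:0, so all
    eight ranks load their weights onto the same 80 GB card.
    """
    local_rank = int(env.get("LOCAL_RANK", "0"))
    return f"cuda:{local_rank}"

# Hypothetical usage with the pipeline call from this report:
# pipe = WanVideoPipeline.from_pretrained(
#     torch_dtype=torch.bfloat16,
#     device=per_rank_device(),
#     model_configs=[...],
#     use_usp=True,
# )
```

Calling torch.cuda.set_device(per_rank_device()) before loading may also help pin default allocations to the right card, though whether the pipeline respects that is an assumption on my part.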

Setting

Device: 8x H100
CUDA: 12.8

torch==2.9.1
diffsynth==2.0.0
