dataloader error #8

@niutransWZY

When I train with moby_main.py, memory usage on my Linux machine keeps growing until the run crashes. What causes this, and how can I fix it?

The error is:
Traceback (most recent call last):
  File "moby_main.py", line 236, in <module>
    main(config)
  File "moby_main.py", line 121, in main
    train_one_epoch(config, model, data_loader_train, optimizer, epoch, lr_scheduler)
  File "moby_main.py", line 151, in train_one_epoch
    scaled_loss.backward()
  File "/root/anaconda3/envs/transformer-ssl/lib/python3.7/site-packages/torch/tensor.py", line 221, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/root/anaconda3/envs/transformer-ssl/lib/python3.7/site-packages/torch/autograd/__init__.py", line 132, in backward
    allow_unreachable=True)  # allow_unreachable flag
  File "/root/anaconda3/envs/transformer-ssl/lib/python3.7/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 2605) is killed by signal: Killed.
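
Note on the last line: "killed by signal: Killed" means the worker process received SIGKILL. On Linux this is often the kernel's out-of-memory killer, which would match the growing memory described above, though the traceback alone does not prove it. A minimal sketch for checking that hypothesis is to log the system-wide available memory every few steps of the training loop. The helper names `parse_meminfo`, `available_mib`, and `log_memory` are hypothetical and not part of moby_main.py; the sketch assumes a Linux kernel that reports `MemAvailable` in `/proc/meminfo`.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most fields (including MemAvailable) are reported in kB.
    """
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def available_mib(path="/proc/meminfo"):
    """Return the system-wide MemAvailable value in MiB (Linux only)."""
    with open(path) as f:
        return parse_meminfo(f.read())["MemAvailable"] / 1024


def log_memory(step, every=100):
    """Print available memory every `every` steps (hypothetical helper)."""
    if step % every == 0:
        print(f"step {step}: MemAvailable = {available_mib():.0f} MiB")
```

Calling `log_memory(idx)` inside the batch loop (with `idx` being whatever step counter the loop already has) would show whether available memory falls steadily before the crash. If it does, a common first experiment is to lower `num_workers` on the `DataLoader` and see whether the growth slows.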
