
Conversation

@1374839016

I fixed the memory-access bug described in #55: I force CuPy to allocate memory on the PyTorch tensors' device.
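For illustration, a minimal sketch of the idea (the helper name run_on_tensor_device is hypothetical, not the actual diff): select the PyTorch tensor's CUDA device as CuPy's current device before allocating or launching anything, so CuPy does not fall back to the default device.

import cupy

def run_on_tensor_device(tensor, fn, *args, **kwargs):
    # Sketch: make CuPy's current device match the device of the given
    # PyTorch CUDA tensor, so CuPy allocations and kernel launches target
    # the same GPU instead of the default device (GPU 0).
    assert tensor.is_cuda
    with cupy.cuda.Device(tensor.device.index):
        return fn(*args, **kwargs)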

fix CUDA_ERROR_ILLEGAL_ADDRESS
@sniklaus
Owner

Huge thanks for bringing this up!

Could you provide some more technical details on how this makes a difference? Currently, all the involved tensors will be on the same device as the first input as per:

rbot0 = one.new_zeros([ one.shape[0], one.shape[2] + 8, one.shape[3] + 8, one.shape[1] ])
rbot1 = one.new_zeros([ one.shape[0], one.shape[2] + 8, one.shape[3] + 8, one.shape[1] ])
one = one.contiguous(); assert(one.is_cuda == True)
two = two.contiguous(); assert(two.is_cuda == True)
output = one.new_zeros([ one.shape[0], 81, one.shape[2], one.shape[3] ])

I am hence a little bit confused on what the proposed fix would change. 🤔

@1374839016
Author

Sorry, I don't know for sure, but my guess is that the code allocates the kernel's shared memory on the default device (GPU 0):

cupy_launch('kernel_Correlation_updateOutput', cupy_kernel('kernel_Correlation_updateOutput', {
    'rbot0': rbot0,
    'rbot1': rbot1,
    'top': output
}))(
    grid=tuple([ output.shape[3], output.shape[2], output.shape[0] ]),
    block=tuple([ 32, 1, 1 ]),
    shared_mem=one.shape[1] * 4,
    args=[ cupy.int32(n), rbot0.data_ptr(), rbot1.data_ptr(), output.data_ptr() ]
)
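If that guess is right, one way to express the fix is to pin the launch to the device that actually holds the tensors. A hedged sketch against the snippet above, using CuPy's device context manager (not necessarily the exact patch in this PR):

# Sketch: run the existing launch with the tensors' device selected as
# CuPy's current device, rather than whatever device happens to be
# current (by default, GPU 0).
with cupy.cuda.Device(one.device.index):
    cupy_launch('kernel_Correlation_updateOutput', cupy_kernel('kernel_Correlation_updateOutput', {
        'rbot0': rbot0,
        'rbot1': rbot1,
        'top': output
    }))(
        grid=tuple([ output.shape[3], output.shape[2], output.shape[0] ]),
        block=tuple([ 32, 1, 1 ]),
        shared_mem=one.shape[1] * 4,
        args=[ cupy.int32(n), rbot0.data_ptr(), rbot1.data_ptr(), output.data_ptr() ]
    )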
