Debug CUDA_ERROR_UNKNOWN errors
Why? Should follow up, but this is hard to debug until the P2s are addressed, and the errors seem to have stopped.
Describe the bug
GPU video decoding fails with CUDA_ERROR_UNKNOWN, and the user has to restart the node before it can process further segments. The error is sometimes accompanied by CUDA_ERROR_OUT_OF_MEMORY or CUDA_ERROR_ILLEGAL_ADDRESS.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Decrease the blast radius of these errors if possible, and figure out the root cause.
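One possible way to limit the blast radius, sketched below, is to classify the CUDA result code and stop scheduling work only on the GPU that reported a non-recoverable error, instead of requiring a restart of the whole node. This is a minimal sketch, not LPMS code: `classify_cuda_error`, `gpu_pool`, `record_result`, and `pick_gpu` are hypothetical names. The numeric codes mirror `CUresult` in `cuda.h`, redefined here so the sketch builds without the CUDA toolkit. It assumes that out-of-memory is transient and that the other errors are confined to one GPU; neither has been confirmed for this bug.

```c
#include <stdbool.h>

/* Values mirror CUresult in cuda.h; redefined so this builds without CUDA. */
enum {
    SKETCH_CUDA_SUCCESS               = 0,
    SKETCH_CUDA_ERROR_OUT_OF_MEMORY   = 2,
    SKETCH_CUDA_ERROR_ILLEGAL_ADDRESS = 700,
    SKETCH_CUDA_ERROR_UNKNOWN         = 999
};

/* Hypothetical recovery actions, from least to most disruptive. */
typedef enum {
    RECOVER_NONE,          /* no error, nothing to do */
    RECOVER_RETRY_SEGMENT, /* assumed transient: retry the segment later */
    RECOVER_RESET_GPU      /* assumed fatal for this GPU: stop using it */
} recover_action;

/* Map a CUDA result code to a recovery action (assumed policy). */
recover_action classify_cuda_error(int code)
{
    switch (code) {
    case SKETCH_CUDA_SUCCESS:
        return RECOVER_NONE;
    case SKETCH_CUDA_ERROR_OUT_OF_MEMORY:
        return RECOVER_RETRY_SEGMENT;
    default:
        /* ILLEGAL_ADDRESS, UNKNOWN, and anything unrecognized. */
        return RECOVER_RESET_GPU;
    }
}

#define MAX_GPUS 8

/* Hypothetical per-node bookkeeping of which GPUs are still usable. */
typedef struct {
    bool unhealthy[MAX_GPUS];
    int  count; /* number of GPUs in use, at most MAX_GPUS */
} gpu_pool;

/* Record the outcome of a decode attempt on one GPU. */
void record_result(gpu_pool *pool, int gpu, int code)
{
    if (gpu < 0 || gpu >= pool->count)
        return;
    if (classify_cuda_error(code) == RECOVER_RESET_GPU)
        pool->unhealthy[gpu] = true;
}

/* Return the first healthy GPU index, or -1 if none is left. */
int pick_gpu(const gpu_pool *pool)
{
    for (int i = 0; i < pool->count; i++)
        if (!pool->unhealthy[i])
            return i;
    return -1;
}
```

With a pool of two GPUs, a CUDA_ERROR_UNKNOWN on GPU 0 would make `pick_gpu` return 1 for later segments, while an out-of-memory result on GPU 1 would leave it available. If the error turns out to corrupt state for the whole process, this bookkeeping alone would not be enough and the isolation would have to happen at the process level.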
Screenshots
ERROR_UNKNOWN

ERROR_ILLEGAL_ADDRESS

ERROR_OUT_OF_MEMORY

Additional context
Stack trace for future reference:
LPMS - https://github.com/livepeer/lpms/blob/master/ffmpeg/decoder.c#L250
FFmpeg - entry point: https://github.com/FFmpeg/FFmpeg/blob/870bfe16a12bf09dca3a4ae27ef6f81a2de80c40/libavutil/hwcontext.c#L610
FFmpeg - most probable line causing the error: https://github.com/FFmpeg/FFmpeg/blob/870bfe16a12bf09dca3a4ae27ef6f81a2de80c40/libavutil/hwcontext.c#L629
FFmpeg - CUDA-specific context creation routine: https://github.com/FFmpeg/FFmpeg/blob/870bfe16a12bf09dca3a4ae27ef6f81a2de80c40/libavutil/hwcontext_cuda.c#L379
FFmpeg - cuCtxCreate call: https://github.com/FFmpeg/FFmpeg/blob/870bfe16a12bf09dca3a4ae27ef6f81a2de80c40/libavutil/hwcontext_cuda.c#L363