Conversation

@windreamer
Collaborator

Motivation

NVIDIA has deprecated versioned wheel packages since CUDA 13, causing CUDA 13+ installations to fail on deprecated package names like nvidia-cublas-cu13.

Modification

Remove the unconditional return to allow the version check to execute.
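The shape of the fix can be sketched as follows. This is illustrative only: the function and package names below (`select_nvidia_deps`, the cublas wheels) are assumptions standing in for lmdeploy's actual code, not the real diff.

```python
def select_nvidia_deps(cuda_major: int) -> list[str]:
    """Pick NVIDIA runtime wheel names for the given CUDA major version."""
    # Before the fix, an unconditional return of the versioned name sat
    # here, so the branch below never executed and CUDA 13 installs
    # requested deprecated wheels such as nvidia-cublas-cu13.
    if cuda_major >= 13:
        # CUDA 13+ drops versioned wheel names; use the unversioned package.
        return ["nvidia-cublas"]
    return [f"nvidia-cublas-cu{cuda_major}"]
```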

@windreamer windreamer force-pushed the fix_nv_dep branch 5 times, most recently from 4ccb1e2 to f6e326c Compare November 13, 2025 07:01
@windreamer windreamer marked this pull request as ready for review November 13, 2025 07:52
@lvhan028
Collaborator

In the runtime_cuda.txt file, the version of torch is restricted to torch<=2.8.0 and >=2.0.0. However, CUDA 13 requires a minimum PyTorch version of 2.9.
We need to upgrade and test it.
cc @zhulinJulia24
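One illustrative way to split the constraint is per-CUDA requirements files. The file name `runtime_cuda13.txt` and the exact bounds below are assumptions for discussion, not the PR's actual change:

```
# runtime_cuda.txt (CUDA 12 and earlier), as currently pinned
torch<=2.8.0,>=2.0.0

# hypothetical runtime_cuda13.txt for CUDA 13 builds
torch>=2.9.0
```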

@lvhan028
Collaborator

Should we also upgrade triton?

@lvhan028
Collaborator

In the runtime_cuda.txt file, the version of torch is restricted to torch<=2.8.0 and >=2.0.0. However, CUDA 13 requires a minimum PyTorch version of 2.9. We need to upgrade and test it. cc @zhulinJulia24

Since flash-attention doesn't have a CUDA 13 build yet, we need to be more careful with the lmdeploy CUDA 13 release due to potential compatibility issues.

@windreamer
Collaborator Author

In the runtime_cuda.txt file, the version of torch is restricted to torch<=2.8.0 and >=2.0.0. However, CUDA 13 requires a minimum PyTorch version of 2.9. We need to upgrade and test it. cc @zhulinJulia24

Since flash-attention doesn't have a CUDA 13 build yet, we need to be more careful with the lmdeploy CUDA 13 release due to potential compatibility issues.

I think for now we can make our code CUDA 13 ready but not ship the CUDA 13 wheels and images until testing and the relevant dependencies are ready. Anyone who wants to use LMDeploy with CUDA 13 can build from source.

@lvhan028
Collaborator

lvhan028 commented Nov 14, 2025

I've built the docker image by

docker build . -f docker/Dockerfile -t openmmlab/lmdeploy:test-cu13 --build-arg CUDA_VERSION=cu13

Then, in the container, I tried serving a model using the turbomind backend but got a failure

>>> from lmdeploy import turbomind
/opt/py3/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
  import pynvml  # type: ignore[import]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/py3/lib/python3.10/site-packages/lmdeploy/turbomind/__init__.py", line 24, in <module>
    from .turbomind import TurboMind, update_parallel_config  # noqa: E402
  File "/opt/py3/lib/python3.10/site-packages/lmdeploy/turbomind/turbomind.py", line 35, in <module>
    import _turbomind as _tm  # noqa: E402
ImportError: libcublas.so.13: cannot open shared object file: No such file or directory

There is no "libcublas.so" in /usr/local/cuda

@lvhan028
Collaborator

The pytorch engine doesn't work either

@lvhan028
Collaborator

lvhan028 commented Nov 14, 2025

In the runtime_cuda.txt file, the version of torch is restricted to torch<=2.8.0 and >=2.0.0. However, CUDA 13 requires a minimum PyTorch version of 2.9. We need to upgrade and test it. cc @zhulinJulia24

Since flash-attention doesn't have a CUDA 13 build yet, we need to be more careful with the lmdeploy CUDA 13 release due to potential compatibility issues.

I think for now we can make our code CUDA 13 ready but not ship the CUDA 13 wheels and images until testing and the relevant dependencies are ready. Anyone who wants to use LMDeploy with CUDA 13 can build from source.

But neither inference engine works in a cu13 env, even if users build lmdeploy from source.

@lvhan028
Collaborator

I've built the docker image by

docker build . -f docker/Dockerfile -t openmmlab/lmdeploy:test-cu13 --build-arg CUDA_VERSION=cu13

Then, in the container, I tried serving a model using the turbomind backend but got a failure

>>> from lmdeploy import turbomind
/opt/py3/lib/python3.10/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
  import pynvml  # type: ignore[import]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/py3/lib/python3.10/site-packages/lmdeploy/turbomind/__init__.py", line 24, in <module>
    from .turbomind import TurboMind, update_parallel_config  # noqa: E402
  File "/opt/py3/lib/python3.10/site-packages/lmdeploy/turbomind/turbomind.py", line 35, in <module>
    import _turbomind as _tm  # noqa: E402
ImportError: libcublas.so.13: cannot open shared object file: No such file or directory

There is no "libcublas.so" in /usr/local/cuda

After setting export LD_LIBRARY_PATH=/opt/py3/lib/python3.10/site-packages/nvidia/cu13/lib/:$LD_LIBRARY_PATH, the turbomind engine works
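For anyone reproducing this, the workaround can be written so it also handles an unset LD_LIBRARY_PATH (the path is the one reported in this thread; the exact site-packages prefix depends on the image's Python version):

```shell
# Prepend the nvidia cu13 lib dir; ${VAR:+:$VAR} appends the old value
# only if it was set, avoiding a trailing/leading empty path entry.
export LD_LIBRARY_PATH=/opt/py3/lib/python3.10/site-packages/nvidia/cu13/lib/${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
```

In a Dockerfile the same path could be baked in with an ENV instruction instead of an export.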

@lvhan028
Collaborator

The pytorch engine doesn't work either

After upgrading triton to its latest version, the pytorch engine works too.

I agree we should defer the release until verification is complete. In the meantime, I recommend adding the LD_LIBRARY_PATH configuration to this PR to ensure at least one engine is functional.

@windreamer windreamer marked this pull request as draft November 14, 2025 10:34
@windreamer windreamer marked this pull request as ready for review November 17, 2025 04:22
@windreamer windreamer requested a review from lvhan028 November 17, 2025 08:55
Successfully merging this pull request may close these issues.

[Bug] 'nvidia-cublas-cu13' is deprecated causes the installation of lmdeploy from source using uv to fail.

2 participants