From 2aebb407807d64d99d46e995221bfb501f6ef441 Mon Sep 17 00:00:00 2001
From: guozixu
Date: Thu, 14 Aug 2025 10:49:13 +0800
Subject: [PATCH 1/8] add ppu quick start doc

---
 docs/en/get_started/ppu/get_started.md       | 75 ++++++++++++++++++++
 docs/zh_cn/get_started/ascend/get_started.md |  2 +-
 docs/zh_cn/get_started/ppu/get_started.md    | 72 +++++++++++++++++++
 3 files changed, 148 insertions(+), 1 deletion(-)
 create mode 100644 docs/en/get_started/ppu/get_started.md
 create mode 100644 docs/zh_cn/get_started/ppu/get_started.md

diff --git a/docs/en/get_started/ppu/get_started.md b/docs/en/get_started/ppu/get_started.md
new file mode 100644
index 0000000000..415b028c77
--- /dev/null
+++ b/docs/en/get_started/ppu/get_started.md
@@ -0,0 +1,75 @@
+# Get Started with PPU
+
+The usage of LMDeploy on a PPU device is almost the same as on CUDA with LMDeploy's PytorchEngine backend.
+Please read the original [Get Started](../get_started.md) guide before reading this tutorial.
+
+## Installation
+
+Please refer to the [dlinfer installation guide](https://github.com/DeepLink-org/dlinfer#%E5%AE%89%E8%A3%85%E6%96%B9%E6%B3%95).
+
+
+## Offline batch inference
+
+> \[!TIP\]
+> Graph mode is supported on PPU.
+> Users can set `eager_mode=False` to enable graph mode, or set `eager_mode=True` to disable it.
+
+### LLM inference
+
+Set `device_type="ppu"` in the `PytorchEngineConfig`:
+
+```python
+from lmdeploy import pipeline
+from lmdeploy import PytorchEngineConfig
+if __name__ == "__main__":
+    pipe = pipeline("internlm/internlm2_5-7b-chat",
+                    backend_config=PytorchEngineConfig(tp=1, device_type="ppu", eager_mode=True))
+    question = ['Hi, pls intro yourself', 'Shanghai is']
+    response = pipe(question)
+    print(response)
+```
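+
+Generation options are passed the same way as on CUDA devices. The snippet below is a minimal sketch using LMDeploy's `GenerationConfig`; the sampling values are illustrative, not PPU-specific tuning:
+
+```python
+from lmdeploy import GenerationConfig, PytorchEngineConfig, pipeline
+
+if __name__ == "__main__":
+    pipe = pipeline("internlm/internlm2_5-7b-chat",
+                    backend_config=PytorchEngineConfig(tp=1, device_type="ppu", eager_mode=True))
+    # Illustrative sampling settings; tune them for your workload.
+    gen_config = GenerationConfig(max_new_tokens=256, top_p=0.8, temperature=0.7)
+    response = pipe(['Hi, pls intro yourself'], gen_config=gen_config)
+    print(response)
+```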
+
+### VLM inference
+
+Set `device_type="ppu"` in the `PytorchEngineConfig`:
+
+```python
+from lmdeploy import pipeline, PytorchEngineConfig
+from lmdeploy.vl import load_image
+if __name__ == "__main__":
+    pipe = pipeline('OpenGVLab/InternVL2-2B',
+                    backend_config=PytorchEngineConfig(tp=1, device_type='ppu', eager_mode=True))
+    image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
+    response = pipe(('describe this image', image))
+    print(response)
+```
+
+## Online serving
+
+> \[!TIP\]
+> Graph mode is supported on PPU.
+> Graph mode is enabled by default in online serving. Users can add `--eager-mode` to disable it.
+
+### Serve an LLM model
+
+Add `--device ppu` to the serve command:
+
+```bash
+lmdeploy serve api_server --backend pytorch --device ppu --eager-mode internlm/internlm2_5-7b-chat
+```
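+
+Once the server is up, it exposes an OpenAI-compatible API. The snippet below is a minimal client sketch: it assumes the `openai` Python package is installed and that the server is reachable at the api_server default address `0.0.0.0:23333`; adjust `base_url`, and the placeholder API key, to your deployment.
+
+```python
+from openai import OpenAI
+
+# base_url assumes the default --server-port 23333; the api_key value is a
+# placeholder unless the server was started with --api-keys.
+client = OpenAI(api_key='YOUR_API_KEY', base_url='http://0.0.0.0:23333/v1')
+model_name = client.models.list().data[0].id  # ask the server which model it serves
+response = client.chat.completions.create(
+    model=model_name,
+    messages=[{'role': 'user', 'content': 'Hi, pls intro yourself'}],
+    temperature=0.8)
+print(response.choices[0].message.content)
+```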
+
+### Serve a VLM model
+
+Add `--device ppu` to the serve command:
+
+```bash
+lmdeploy serve api_server --backend pytorch --device ppu --eager-mode OpenGVLab/InternVL2-2B
+```
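+
+The VLM server speaks the same OpenAI-compatible protocol, with images passed as `image_url` content parts. A minimal client sketch under the same assumptions as above (default address, placeholder key):
+
+```python
+from openai import OpenAI
+
+# Host, port and api_key are assumptions; match them to your deployment.
+client = OpenAI(api_key='YOUR_API_KEY', base_url='http://0.0.0.0:23333/v1')
+model_name = client.models.list().data[0].id
+response = client.chat.completions.create(
+    model=model_name,
+    messages=[{
+        'role': 'user',
+        'content': [
+            {'type': 'text', 'text': 'describe this image'},
+            {'type': 'image_url',
+             'image_url': {'url': 'https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg'}},
+        ],
+    }])
+print(response.choices[0].message.content)
+```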
+
+## Inference with Command line Interface
+
+Add `--device ppu` to the chat command:
+
+```bash
+lmdeploy chat internlm/internlm2_5-7b-chat --backend pytorch --device ppu --eager-mode
+```
diff --git a/docs/zh_cn/get_started/ascend/get_started.md b/docs/zh_cn/get_started/ascend/get_started.md
index e076e09fe5..6f1d9fc824 100644
--- a/docs/zh_cn/get_started/ascend/get_started.md
+++ b/docs/zh_cn/get_started/ascend/get_started.md
@@ -1,6 +1,6 @@
 # Huawei Ascend (Atlas 800T A2 & Atlas 300I Duo)

-Building on LMDeploy's PytorchEngine, we have added support for Huawei Ascend devices. Using LDMeploy on Huawei Ascend is therefore almost the same as using the PytorchEngine backend on NVIDIA GPUs. Please read the original [Get Started](../get_started.md) guide before reading this tutorial.
+Building on LMDeploy's PytorchEngine, we have added support for Huawei Ascend devices. Using LMDeploy on Huawei Ascend is therefore almost the same as using the PytorchEngine backend on NVIDIA GPUs. Please read the original [Get Started](../get_started.md) guide before reading this tutorial.

 The list of supported models is [here](../../supported_models/supported_models.md#PyTorchEngine-华为昇腾平台).

diff --git a/docs/zh_cn/get_started/ppu/get_started.md b/docs/zh_cn/get_started/ppu/get_started.md
new file mode 100644
index 0000000000..ccd81991bf
--- /dev/null
+++ b/docs/zh_cn/get_started/ppu/get_started.md
@@ -0,0 +1,72 @@
+# Get Started on Alibaba T-Head (PPU)
+
+Building on LMDeploy's PytorchEngine, we have added support for Alibaba T-Head (PPU) devices. Using LMDeploy on a T-Head device is therefore almost the same as using the PytorchEngine backend on NVIDIA GPUs. Please read the original [Get Started](../get_started.md) guide before reading this tutorial.
+
+## Installation
+
+For installation, please refer to the [dlinfer installation guide](https://github.com/DeepLink-org/dlinfer#%E5%AE%89%E8%A3%85%E6%96%B9%E6%B3%95).
+
+## Offline batch inference
+
+> \[!TIP\]
+> Graph mode is supported. Users can set `eager_mode=False` to enable graph mode, or set `eager_mode=True` to disable it.
+
+### LLM inference
+
+Set `device_type="ppu"` in the `PytorchEngineConfig`:
+
+```python
+from lmdeploy import pipeline
+from lmdeploy import PytorchEngineConfig
+if __name__ == "__main__":
+    pipe = pipeline("internlm/internlm2_5-7b-chat",
+                    backend_config=PytorchEngineConfig(tp=1, device_type="ppu", eager_mode=True))
+    question = ["Shanghai is", "Please introduce China", "How are you?"]
+    response = pipe(question)
+    print(response)
+```
+
+### VLM inference
+
+Set `device_type="ppu"` in the `PytorchEngineConfig`:
+
+```python
+from lmdeploy import pipeline, PytorchEngineConfig
+from lmdeploy.vl import load_image
+if __name__ == "__main__":
+    pipe = pipeline('OpenGVLab/InternVL2-2B',
+                    backend_config=PytorchEngineConfig(tp=1, device_type='ppu', eager_mode=True))
+    image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
+    response = pipe(('describe this image', image))
+    print(response)
+```
+
+## Online serving
+
+> \[!TIP\]
+> Graph mode is supported.
+> Graph mode is enabled by default in online serving. Users can add `--eager-mode` to disable it.
+
+### Serve an LLM model
+
+Add `--device ppu` to the serve command:
+
+```bash
+lmdeploy serve api_server --backend pytorch --device ppu --eager-mode internlm/internlm2_5-7b-chat
+```
+
+### Serve a VLM model
+
+Add `--device ppu` to the serve command:
+
+```bash
+lmdeploy serve api_server --backend pytorch --device ppu --eager-mode OpenGVLab/InternVL2-2B
+```
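+
+To quickly verify that the server on the PPU machine is up, you can probe the OpenAI-compatible `/v1/models` route. A minimal sketch, assuming the api_server default address `0.0.0.0:23333` and the `requests` package:
+
+```python
+import requests
+
+# The address is the api_server default (an assumption); adjust it, and add
+# an Authorization header, to match how the server was launched.
+resp = requests.get('http://0.0.0.0:23333/v1/models', timeout=10)
+resp.raise_for_status()
+print([m['id'] for m in resp.json()['data']])  # ids of the served models
+```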
+
+## Chat with an LLM model via the command line
+
+Add `--device ppu` to the chat command:
+
+```bash
+lmdeploy chat internlm/internlm2_5-7b-chat --backend pytorch --device ppu --eager-mode
+```

From be362598fa62c9f317018abb1e3101caa7fa8d9e Mon Sep 17 00:00:00 2001
From: guozixu2001
Date: Thu, 14 Aug 2025 11:07:08 +0800
Subject: [PATCH 2/8] fix lint error

---
 docs/en/get_started/ppu/get_started.md | 1 -
 1 file changed, 1 deletion(-)

diff --git a/docs/en/get_started/ppu/get_started.md b/docs/en/get_started/ppu/get_started.md
index 415b028c77..6d9179105e 100644
--- a/docs/en/get_started/ppu/get_started.md
+++ b/docs/en/get_started/ppu/get_started.md
@@ -7,7 +7,6 @@ Please read the original [Get Started](../get_started.md) guide before reading this tutorial.

 Please refer to the [dlinfer installation guide](https://github.com/DeepLink-org/dlinfer#%E5%AE%89%E8%A3%85%E6%96%B9%E6%B3%95).

-
 ## Offline batch inference

 > \[!TIP\]

From 3190235350f6636b3027eaca4d837938c73bb000 Mon Sep 17 00:00:00 2001
From: guozixu2001
Date: Fri, 15 Aug 2025 14:33:33 +0800
Subject: [PATCH 4/8] add supported_models doc

---
 docs/en/supported_models/supported_models.md    | 25 +++++++++++++++++++
 docs/zh_cn/supported_models/supported_models.md | 25 +++++++++++++++++++
 2 files changed, 50 insertions(+)

diff --git a/docs/en/supported_models/supported_models.md b/docs/en/supported_models/supported_models.md
index 1724615573..16c3ebd415 100644
--- a/docs/en/supported_models/supported_models.md
+++ b/docs/en/supported_models/supported_models.md
@@ -141,3 +141,28 @@ The following tables detail the models supported by LMDeploy's TurboMind engine
 | InternVL3    | 1B-78B | MLLM | Yes | Yes | Yes | Yes | Yes |
 | CogVLM2-chat | 19B    | MLLM | Yes | No  | -   | -   | -   |
 | GLM4V        | 9B     | MLLM | Yes | No  | -   | -   | -   |
+
+## PyTorchEngine on PPU
+
+| Model          | Size      | Type | FP16/BF16(eager) | FP16/BF16(graph) |
+| :------------: | :-------: | :--: | :--------------: | :--------------: |
+| Llama2         | 7B - 70B  | LLM  | Yes              | Yes              |
+| Llama3         | 8B        | LLM  | Yes              | Yes              |
+| Llama3.1       | 8B        | LLM  | Yes              | Yes              |
+| InternLM2      | 7B - 20B  | LLM  | Yes              | Yes              |
+| InternLM2.5    | 7B - 20B  | LLM  | Yes              | Yes              |
+| InternLM3      | 8B        | LLM  | Yes              | Yes              |
+| Mixtral        | 8x7B      | LLM  | Yes              | Yes              |
+| QWen1.5-MoE    | A2.7B     | LLM  | Yes              | Yes              |
+| QWen2(.5)      | 7B        | LLM  | Yes              | Yes              |
+| QWen2-VL       | 2B, 7B    | MLLM | No               | No               |
+| QWen2.5-VL     | 3B - 72B  | MLLM | No               | No               |
+| QWen2-MoE      | A14.57B   | LLM  | Yes              | Yes              |
+| QWen3          | 0.6B-235B | LLM  | Yes              | Yes              |
+| DeepSeek-V2    | 16B       | LLM  | No               | No               |
+| InternVL(v1.5) | 2B-26B    | MLLM | Yes              | Yes              |
+| InternVL2      | 1B-40B    | MLLM | Yes              | Yes              |
+| InternVL2.5    | 1B-78B    | MLLM | Yes              | Yes              |
+| InternVL3      | 1B-78B    | MLLM | Yes              | Yes              |
+| CogVLM2-chat   | 19B       | MLLM | -                | -                |
+| GLM4V          | 9B        | MLLM | -                | -                |
diff --git a/docs/zh_cn/supported_models/supported_models.md b/docs/zh_cn/supported_models/supported_models.md
index 76e0cfb38f..643ba18b01 100644
--- a/docs/zh_cn/supported_models/supported_models.md
+++ b/docs/zh_cn/supported_models/supported_models.md
@@ -141,3 +141,28 @@
 | InternVL3    | 1B-78B | MLLM | Yes | Yes | Yes | Yes | Yes |
 | CogVLM2-chat | 19B    | MLLM | Yes | No  | -   | -   | -   |
 | GLM4V        | 9B     | MLLM | Yes | No  | -   | -   | -   |
+
+## PyTorchEngine on Alibaba T-Head (PPU)
+
+| Model          | Size      | Type | FP16/BF16(eager) | FP16/BF16(graph) |
+| :------------: | :-------: | :--: | :--------------: | :--------------: |
+| Llama2         | 7B - 70B  | LLM  | Yes              | Yes              |
+| Llama3         | 8B        | LLM  | Yes              | Yes              |
+| Llama3.1       | 8B        | LLM  | Yes              | Yes              |
+| InternLM2      | 7B - 20B  | LLM  | Yes              | Yes              |
+| InternLM2.5    | 7B - 20B  | LLM  | Yes              | Yes              |
+| InternLM3      | 8B        | LLM  | Yes              | Yes              |
+| Mixtral        | 8x7B      | LLM  | Yes              | Yes              |
+| QWen1.5-MoE    | A2.7B     | LLM  | Yes              | Yes              |
+| QWen2(.5)      | 7B        | LLM  | Yes              | Yes              |
+| QWen2-VL       | 2B, 7B    | MLLM | No               | No               |
+| QWen2.5-VL     | 3B - 72B  | MLLM | No               | No               |
+| QWen2-MoE      | A14.57B   | LLM  | Yes              | Yes              |
+| QWen3          | 0.6B-235B | LLM  | Yes              | Yes              |
+| DeepSeek-V2    | 16B       | LLM  | No               | No               |
+| InternVL(v1.5) | 2B-26B    | MLLM | Yes              | Yes              |
+| InternVL2      | 1B-40B    | MLLM | Yes              | Yes              |
+| InternVL2.5    | 1B-78B    | MLLM | Yes              | Yes              |
+| InternVL3      | 1B-78B    | MLLM | Yes              | Yes              |
+| CogVLM2-chat   | 19B       | MLLM | -                | -                |
+| GLM4V          | 9B        | MLLM | -                | -                |

From e019976772c5d7c2e605f994649b2105de859414 Mon Sep 17 00:00:00 2001
From: guozixu
Date: Tue, 19 Aug 2025 17:23:02 +0800
Subject: [PATCH 5/8] fix: remove main in example code, remove unsupported model

---
 docs/en/get_started/ppu/get_started.md       | 24 +++++++++----------
 docs/en/supported_models/supported_models.md |  5 ----
 .../supported_models/supported_models.md     |  5 ----
 3 files changed, 12 insertions(+), 22 deletions(-)

diff --git a/docs/en/get_started/ppu/get_started.md b/docs/en/get_started/ppu/get_started.md
index 6d9179105e..ab636990f8 100644
--- a/docs/en/get_started/ppu/get_started.md
+++ b/docs/en/get_started/ppu/get_started.md
@@ -20,12 +20,12 @@ Set `device_type="ppu"` in the `PytorchEngineConfig`:
 ```python
 from lmdeploy import pipeline
 from lmdeploy import PytorchEngineConfig
-if __name__ == "__main__":
-    pipe = pipeline("internlm/internlm2_5-7b-chat",
-                    backend_config=PytorchEngineConfig(tp=1, device_type="ppu", eager_mode=True))
-    question = ['Hi, pls intro yourself', 'Shanghai is']
-    response = pipe(question)
-    print(response)
+
+pipe = pipeline("internlm/internlm2_5-7b-chat",
+                backend_config=PytorchEngineConfig(tp=1, device_type="ppu", eager_mode=False))
+question = ['Hi, pls intro yourself', 'Shanghai is']
+response = pipe(question)
+print(response)
 ```

 ### VLM inference
@@ -35,12 +35,12 @@ Set `device_type="ppu"` in the `PytorchEngineConfig`:
 ```python
 from lmdeploy import pipeline, PytorchEngineConfig
 from lmdeploy.vl import load_image
-if __name__ == "__main__":
-    pipe = pipeline('OpenGVLab/InternVL2-2B',
-                    backend_config=PytorchEngineConfig(tp=1, device_type='ppu', eager_mode=True))
-    image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
-    response = pipe(('describe this image', image))
-    print(response)
+
+pipe = pipeline('OpenGVLab/InternVL2-2B',
+                backend_config=PytorchEngineConfig(tp=1, device_type='ppu', eager_mode=False))
+image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
+response = pipe(('describe this image', image))
+print(response)
 ```

 ## Online serving
diff --git a/docs/en/supported_models/supported_models.md b/docs/en/supported_models/supported_models.md
index 16c3ebd415..2a57c3d3e8 100644
--- a/docs/en/supported_models/supported_models.md
+++ b/docs/en/supported_models/supported_models.md
@@ -155,14 +155,9 @@ The following tables detail the models supported by LMDeploy's TurboMind engine
 | Mixtral        | 8x7B      | LLM  | Yes              | Yes              |
 | QWen1.5-MoE    | A2.7B     | LLM  | Yes              | Yes              |
 | QWen2(.5)      | 7B        | LLM  | Yes              | Yes              |
-| QWen2-VL       | 2B, 7B    | MLLM | No               | No               |
-| QWen2.5-VL     | 3B - 72B  | MLLM | No               | No               |
 | QWen2-MoE      | A14.57B   | LLM  | Yes              | Yes              |
 | QWen3          | 0.6B-235B | LLM  | Yes              | Yes              |
-| DeepSeek-V2    | 16B       | LLM  | No               | No               |
 | InternVL(v1.5) | 2B-26B    | MLLM | Yes              | Yes              |
 | InternVL2      | 1B-40B    | MLLM | Yes              | Yes              |
 | InternVL2.5    | 1B-78B    | MLLM | Yes              | Yes              |
 | InternVL3      | 1B-78B    | MLLM | Yes              | Yes              |
-| CogVLM2-chat   | 19B       | MLLM | -                | -                |
-| GLM4V          | 9B        | MLLM | -                | -                |
diff --git a/docs/zh_cn/supported_models/supported_models.md b/docs/zh_cn/supported_models/supported_models.md
index 643ba18b01..903f4bfb23 100644
--- a/docs/zh_cn/supported_models/supported_models.md
+++ b/docs/zh_cn/supported_models/supported_models.md
@@ -155,14 +155,9 @@
 | Mixtral        | 8x7B      | LLM  | Yes              | Yes              |
 | QWen1.5-MoE    | A2.7B     | LLM  | Yes              | Yes              |
 | QWen2(.5)      | 7B        | LLM  | Yes              | Yes              |
-| QWen2-VL       | 2B, 7B    | MLLM | No               | No               |
-| QWen2.5-VL     | 3B - 72B  | MLLM | No               | No               |
 | QWen2-MoE      | A14.57B   | LLM  | Yes              | Yes              |
 | QWen3          | 0.6B-235B | LLM  | Yes              | Yes              |
-| DeepSeek-V2    | 16B       | LLM  | No               | No               |
 | InternVL(v1.5) | 2B-26B    | MLLM | Yes              | Yes              |
 | InternVL2      | 1B-40B    | MLLM | Yes              | Yes              |
 | InternVL2.5    | 1B-78B    | MLLM | Yes              | Yes              |
 | InternVL3      | 1B-78B    | MLLM | Yes              | Yes              |
-| CogVLM2-chat   | 19B       | MLLM | -                | -                |
-| GLM4V          | 9B        | MLLM | -                | -                |

From ef051d0e50b88b30a480f9ee95b7662873354a01 Mon Sep 17 00:00:00 2001
From: guozixu
Date: Tue, 19 Aug 2025 17:30:47 +0800
Subject: [PATCH 6/8] remove main in example code

---
 docs/zh_cn/get_started/ppu/get_started.md | 24 +++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/docs/zh_cn/get_started/ppu/get_started.md b/docs/zh_cn/get_started/ppu/get_started.md
index ccd81991bf..256f89e2a6 100644
--- a/docs/zh_cn/get_started/ppu/get_started.md
+++ b/docs/zh_cn/get_started/ppu/get_started.md
@@ -18,12 +18,12 @@
 ```python
 from lmdeploy import pipeline
 from lmdeploy import PytorchEngineConfig
-if __name__ == "__main__":
-    pipe = pipeline("internlm/internlm2_5-7b-chat",
-                    backend_config=PytorchEngineConfig(tp=1, device_type="ppu", eager_mode=True))
-    question = ["Shanghai is", "Please introduce China", "How are you?"]
-    response = pipe(question)
-    print(response)
+
+pipe = pipeline("internlm/internlm2_5-7b-chat",
+                backend_config=PytorchEngineConfig(tp=1, device_type="ppu", eager_mode=True))
+question = ["Shanghai is", "Please introduce China", "How are you?"]
+response = pipe(question)
+print(response)
 ```

 ### VLM inference
@@ -33,12 +33,12 @@
 ```python
 from lmdeploy import pipeline, PytorchEngineConfig
 from lmdeploy.vl import load_image
-if __name__ == "__main__":
-    pipe = pipeline('OpenGVLab/InternVL2-2B',
-                    backend_config=PytorchEngineConfig(tp=1, device_type='ppu', eager_mode=True))
-    image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
-    response = pipe(('describe this image', image))
-    print(response)
+
+pipe = pipeline('OpenGVLab/InternVL2-2B',
+                backend_config=PytorchEngineConfig(tp=1, device_type='ppu', eager_mode=True))
+image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
+response = pipe(('describe this image', image))
+print(response)
 ```

 ## Online serving
From c4d03f334b0459005429cd70dbaa7349f7ab9944 Mon Sep 17 00:00:00 2001
From: guozixu
Date: Tue, 19 Aug 2025 18:55:31 +0800
Subject: [PATCH 7/8] add ppu to index.rst

---
 docs/en/get_started/index.rst    | 1 +
 docs/zh_cn/get_started/index.rst | 1 +
 2 files changed, 2 insertions(+)

diff --git a/docs/en/get_started/index.rst b/docs/en/get_started/index.rst
index 4343ee9ab1..c6e3f222ed 100644
--- a/docs/en/get_started/index.rst
+++ b/docs/en/get_started/index.rst
@@ -6,3 +6,4 @@ On Other Platforms
    :caption: NPU(Huawei)

    ascend/get_started.md
+   ppu/get_started.md
diff --git a/docs/zh_cn/get_started/index.rst b/docs/zh_cn/get_started/index.rst
index 35affc13ce..e8999b90f4 100644
--- a/docs/zh_cn/get_started/index.rst
+++ b/docs/zh_cn/get_started/index.rst
@@ -6,3 +6,4 @@
    :caption: NPU(Huawei)

    ascend/get_started.md
+   ppu/get_started.md

From 03dda48f24616482253bb56ea382a20b25af5edf Mon Sep 17 00:00:00 2001
From: guozixu
Date: Tue, 19 Aug 2025 19:10:20 +0800
Subject: [PATCH 8/8] fix caption

---
 docs/en/get_started/index.rst    | 5 +++++
 docs/zh_cn/get_started/index.rst | 5 +++++
 2 files changed, 10 insertions(+)

diff --git a/docs/en/get_started/index.rst b/docs/en/get_started/index.rst
index c6e3f222ed..a611f93cae 100644
--- a/docs/en/get_started/index.rst
+++ b/docs/en/get_started/index.rst
@@ -6,4 +6,9 @@ On Other Platforms
    :caption: NPU(Huawei)

    ascend/get_started.md
+
+.. toctree::
+   :maxdepth: 1
+   :caption: PPU
+
    ppu/get_started.md
diff --git a/docs/zh_cn/get_started/index.rst b/docs/zh_cn/get_started/index.rst
index e8999b90f4..e1e91f8408 100644
--- a/docs/zh_cn/get_started/index.rst
+++ b/docs/zh_cn/get_started/index.rst
@@ -6,4 +6,9 @@
    :caption: NPU(Huawei)

    ascend/get_started.md
+
+.. toctree::
+   :maxdepth: 1
+   :caption: PPU
+
    ppu/get_started.md