[NPU] ERNIE 4.5 support #3399
Thanks for your contribution!
Hello, could I ask for some help? Starting the ernie4.5-21b-a3b model fails with: nd-toolkit/latest/opp/vendors/aie_ascendc/op_api/lib/libcust_opapi.so: undefined symbol: aclnnSub.dlsym aclnnSubGetWorkspaceSize from libcust_opapi.so failed, error:/usr/local/Ascend/ascend-toolkit/latest/opp/vendors/aie_ascendc/op_api/lib/libcust_opapi.so: undefined symbol: aclnnSubGetWorkspaceSize.dlsym aclnnSub from libcust_opapi.so failed, error:/usr/local/Ascend/ascend-toolkit/latest/opp/vendors/aie_ascendc/op_api/lib/libcust_opapi.so: undefined symbol: aclnnSub.

Have you installed everything for PaddleCustomDevice?

Yes, I built it following the steps, but it still reports that sparse_moe is missing. Is there a specific PaddleNLP branch I should be using?

This has been adapted for and works with the NPU630 release, but it is not merged yet. To be followed up.
    def clip_and_round(x):
        return np.clip(np.around(x), -127, 127).astype("int8")


    def npu_quant_weight(weight_np):
Weight loading with dynamic quantization performs poorly; to be optimized.
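For orientation, here is a minimal sketch of how a helper like `clip_and_round` is commonly used in symmetric int8 weight quantization. The body of `npu_quant_weight` is not shown in this excerpt, so `quant_weight_sketch`, the per-column scale, and the example weights are assumptions for illustration only, not the PR's implementation.

```python
import numpy as np


def clip_and_round(x):
    # Round to the nearest integer, clamp to the symmetric int8 range, cast to int8.
    return np.clip(np.around(x), -127, 127).astype("int8")


def quant_weight_sketch(weight_np):
    """Hypothetical per-column symmetric quantization (an assumption, not the PR code)."""
    # One scale per column: the largest absolute value in a column maps to 127.
    max_abs = np.abs(weight_np).max(axis=0, keepdims=True)
    # Guard against all-zero columns to avoid dividing by zero.
    scale = np.where(max_abs == 0, 1.0, max_abs / 127.0)
    quantized = clip_and_round(weight_np / scale)
    return quantized, scale.squeeze(0)


weight = np.array([[0.5, -2.0], [-1.0, 4.0]], dtype="float32")
q, s = quant_weight_sketch(weight)
# Multiplying back by the scale approximately recovers the original weights.
restored = q.astype("float32") * s
```

Doing this in NumPy at load time is simple but runs on the CPU for every weight matrix, which is one plausible reason the loading path is slow.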
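Environment setup
Base environment setup
Starting the image
Installing from the image is recommended, although you can also install on bare metal.
First, pull the image that matches your system architecture:
Start the image:
Install a newer CANN
The CANN suite inside the image is outdated, so you need to reinstall CANN Toolkit, CANN Kernels, and NNAL at version >= 8.1.RC1. The versions of the three packages must match; 8.2.RC1 is recommended. Make sure to select the correct CPU architecture, and note that CANN Kernels is hardware-specific, so pick the package for your hardware. After downloading, install them in the following order: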
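The version constraints above (all three packages matching, and at least 8.1.RC1) can be checked before installing. The sketch below is a hypothetical helper: `parse_cann_version` and `check_cann_versions` are names invented here, the version strings are supplied by hand rather than read from the system, and it assumes "matching" means identical version strings in the `<major>.<minor>.RC<n>` form.

```python
import re


def parse_cann_version(version):
    """Parse strings such as '8.1.RC1' into a sortable tuple.

    Assumption: versions look like '<major>.<minor>.RC<n>'; anything else
    is rejected rather than guessed at.
    """
    match = re.fullmatch(r"(\d+)\.(\d+)\.RC(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized CANN version format: {version!r}")
    return tuple(int(part) for part in match.groups())


def check_cann_versions(toolkit, kernels, nnal, minimum="8.1.RC1"):
    """Return a list of problems; an empty list means the check passed."""
    problems = []
    if not (toolkit == kernels == nnal):
        problems.append("Toolkit, Kernels and NNAL versions do not match")
    packages = (("toolkit", toolkit), ("kernels", kernels), ("nnal", nnal))
    for name, version in packages:
        if parse_cann_version(version) < parse_cann_version(minimum):
            problems.append(f"{name} {version} is older than {minimum}")
    return problems


ok = check_cann_versions("8.2.RC1", "8.2.RC1", "8.2.RC1")
mixed = check_cann_versions("8.2.RC1", "8.1.RC1", "8.2.RC1")
too_old = check_cann_versions("8.0.RC3", "8.0.RC3", "8.0.RC3")
```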
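Configure environment variables
Before running, set the following environment variables:
In addition, the default memory allocation strategy is `naive_best_fit`. You can optionally set Paddle's allocation strategy to `auto_growth`, which claims host and device memory only as the actual data requires it, although this may fragment host and device memory.
Currently, for reasons not yet understood, running without `auto_growth` runs out of device memory, so please also set the following environment variable:

    export FLAGS_allocator_strategy=auto_growth

Python environment setup
Install Paddle
Paddle can be installed with the following command (newer versions of `paddlepaddle` conflict with `paddleformers`, so version 3.1 is recommended here):
See the Ascend NPU installation guide for details.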
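The allocator setting from the environment-variable step can also be applied from a Python launcher script. This is a minimal sketch: `configure_allocator` is a hypothetical helper, and it assumes the flag is picked up from the process environment when Paddle initializes, so it has to run before `import paddle`.

```python
import os


def configure_allocator(env=None):
    """Force FLAGS_allocator_strategy to auto_growth in the given environment."""
    env = os.environ if env is None else env
    # Overwrite any existing value, since the default naive_best_fit
    # is reported above to run out of device memory.
    env["FLAGS_allocator_strategy"] = "auto_growth"
    return env["FLAGS_allocator_strategy"]


# Call this before importing paddle so the flag is visible at initialization.
strategy = configure_allocator()
```

Setting the variable in the shell with `export` remains the simpler route; the Python form is mainly useful when the launch environment cannot be edited.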
Install third-party libraries
Before building PaddleCustomDevice, install the third-party libraries spdlog and json:
Install PaddleCustomDevice

    git clone https://github.com/PaddlePaddle/PaddleCustomDevice.git
    cd PaddleCustomDevice/backends/npu
    bash tools/compile.sh

After the build completes, install the wheel with:

    pip install build/dist/paddle_custom_npu-*.whl --force-reinstall

Manually install this PR:
If you later hit the error `please make sure you registered your op first and try again`, go back after the manual install and reinstall over it the whl built from mainline PaddlePaddle/PaddleCustomDevice.
Install PaddleNLP
Clone from source:
Go to the `csrc/npu` directory and install following `README.md`:

    python setup.py build bdist_wheel
    pip install dist/paddlenlp_ops*.whl

Build FastDeploy
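At runtime you may see the following error:
To fix it, edit line 22 of `/usr/local/lib/python3.10/dist-packages/paddleformers/utils/pdc_sdk.py` and change `from distutils.dir_util import copy_tree` to:
Before running, add the corresponding FastDeploy directory to `PYTHONPATH`:
If you hit the error `libgomp cannot allocate memory in static TLS block`, you can resolve it as follows:
If you run into a circular import problem and are not running multimodal models, you can temporarily uninstall `opencv`. Also note that `numpy` 2.0 is currently not well supported, so as a final step force-install `numpy` 1.26.4:
If you encounter:
First look up the hostname:
Then add the hostname found above to `/etc/hosts`: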
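The hostname lookup in the last step can be sketched in Python as follows. The original commands are not preserved here, so this is only an illustration: `hostname_resolves` and `hosts_entry` are hypothetical helpers, and pairing the hostname with the loopback address `127.0.0.1` is an assumption.

```python
import socket


def hostname_resolves(hostname):
    """Return True if the hostname can be resolved to an address."""
    try:
        socket.gethostbyname(hostname)
        return True
    except OSError:
        # socket.gaierror and socket.herror are subclasses of OSError.
        return False


def hosts_entry(hostname, ip="127.0.0.1"):
    """Format a line for /etc/hosts; the loopback IP is an assumption."""
    return f"{ip} {hostname}"


hostname = socket.gethostname()
resolves = hostname_resolves(hostname)
if not resolves:
    # Print the suggested line instead of editing /etc/hosts directly.
    print("Add this line to /etc/hosts:", hosts_entry(hostname))
```

The script only prints the suggested line, since writing to `/etc/hosts` needs root privileges and is safer to do by hand.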