
[2026 Autumn][Task02] zlf1201 #122

Open
zlf1201 wants to merge 5 commits into DeepLink-org:main from zlf1201:QY2026_Autumn_zlf1201_Task02_Ascend


zlf1201 commented Jun 6, 2026


Task 02: Equivariant Tensor Product - Ascend NPU

Hardware platform

  • Huawei Ascend 910B2C

Implementation notes

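  • Ports the equivariant tensor product operator used in geometric deep learning from CUDA to the Ascend NPU
  • Key fix: o3.wigner_3j() uses complex128 internally, which the NPU does not support; the coefficients are now precomputed on the CPU and then moved to the NPU
  • Uses FX Graph Codegen + opt_einsum_fx to optimize the einsum contraction path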
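The key fix amounts to building the coupling coefficients once on the host, in exact or double precision, and handing only a real, single-precision table to the device. The sketch below illustrates that pattern with the standard library only; it is not the PR's code. It computes the standard (complex-basis) Wigner 3j symbol with Racah's formula, whereas e3nn's `o3.wigner_3j()` returns coefficients in the real spherical-harmonic basis (as far as we understand, the complex128 arithmetic comes from that change of basis). The names `wigner_3j` and `coupling_table` are hypothetical, and the final copy to the NPU (in PyTorch, building a float32 tensor and calling `.to(device)`) is omitted.

```python
from array import array
from fractions import Fraction
from functools import lru_cache
from math import factorial, sqrt


def wigner_3j(j1: int, j2: int, j3: int, m1: int, m2: int, m3: int) -> float:
    """Standard (complex-basis) Wigner 3j symbol for integer j, via Racah's formula."""
    # Selection rules: the m's must sum to zero, |m| <= j, and (j1, j2, j3)
    # must satisfy the triangle inequality; otherwise the symbol vanishes.
    if m1 + m2 + m3 != 0 or not abs(j1 - j2) <= j3 <= j1 + j2:
        return 0.0
    if abs(m1) > j1 or abs(m2) > j2 or abs(m3) > j3:
        return 0.0
    # Triangle coefficient and m-dependent normalisation, kept exact on the host.
    delta = Fraction(
        factorial(j1 + j2 - j3) * factorial(j1 - j2 + j3) * factorial(-j1 + j2 + j3),
        factorial(j1 + j2 + j3 + 1),
    )
    norm = (factorial(j1 + m1) * factorial(j1 - m1) * factorial(j2 + m2)
            * factorial(j2 - m2) * factorial(j3 + m3) * factorial(j3 - m3))
    # Racah sum over k, restricted to where every factorial argument is >= 0.
    k_min = max(0, j2 - j3 - m1, j1 - j3 + m2)
    k_max = min(j1 + j2 - j3, j1 - m1, j2 + m2)
    total = Fraction(0)
    for k in range(k_min, k_max + 1):
        denom = (factorial(k) * factorial(j3 - j2 + k + m1) * factorial(j3 - j1 + k - m2)
                 * factorial(j1 + j2 - j3 - k) * factorial(j1 - k - m1) * factorial(j2 - k + m2))
        total += Fraction((-1) ** k, denom)
    phase = -1 if (j1 - j2 - m3) % 2 else 1
    return phase * sqrt(float(delta * norm)) * float(total)


@lru_cache(maxsize=None)
def coupling_table(l1: int, l2: int, l3: int) -> array:
    """Precompute once per (l1, l2, l3) on the host; treat the result as read-only.

    Flat layout: index = ((m1 + l1) * (2*l2 + 1) + (m2 + l2)) * (2*l3 + 1) + (m3 + l3).
    """
    table = array("f")  # 'f' is a C float (single precision): the downcast happens on append
    for m1 in range(-l1, l1 + 1):
        for m2 in range(-l2, l2 + 1):
            for m3 in range(-l3, l3 + 1):
                table.append(wigner_3j(l1, l2, l3, m1, m2, m3))
    return table
```

Because the table is cached, its cost is paid once per (l1, l2, l3) triple no matter how many forward passes run, and no complex dtype ever needs to exist on the device.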

Performance results

  • Forward pass: ~18.3 ms
  • Maximum absolute error (vs. CPU): 3.05e-04
  • Determinism: two runs produce identical outputs
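The accuracy and determinism figures correspond to two simple checks: the largest element-wise difference from a CPU reference, and a comparison of two runs of the same forward pass. A minimal sketch of both, assuming the outputs have already been copied back to the host as flat sequences of floats; the helper names `max_abs_error` and `runs_identical` are hypothetical and not taken from the PR.

```python
import struct
from typing import Callable, Sequence


def max_abs_error(out: Sequence[float], ref: Sequence[float]) -> float:
    """Largest |out[i] - ref[i]|, e.g. NPU output vs. a CPU reference."""
    if len(out) != len(ref):
        raise ValueError("output and reference have different lengths")
    return max((abs(a - b) for a, b in zip(out, ref)), default=0.0)


def _as_bytes(values: Sequence[float]) -> bytes:
    # Pack as little-endian float64 so the comparison is bitwise:
    # it distinguishes 0.0 from -0.0, unlike a plain == on floats.
    values = list(values)
    return struct.pack(f"<{len(values)}d", *values)


def runs_identical(forward: Callable[[], Sequence[float]], runs: int = 2) -> bool:
    """Call `forward` `runs` times and report whether every output is bit-identical."""
    first = _as_bytes(forward())
    return all(_as_bytes(forward()) == first for _ in range(runs - 1))
```

A bitwise comparison is stricter than a tolerance check, which is the point: a kernel whose reduction order varies between runs (for example, from atomics) would fail it even when its error against the CPU stays small.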

Files

  • dlblas/kernels/ks_competition/torch/QY2026_Autumn_zlf1201_Task02_Ascend.py

zlf1201 added 5 commits June 6, 2026 12:04
Task 02: Equivariant Tensor Product - Ascend NPU Implementation
Hardware: Huawei Ascend 910B2C
Forward pass: ~18.3ms
Max abs error vs CPU: 3.05e-04
Task 03: FramesExpressCoordinates - Ascend NPU
Hardware: Huawei Ascend 910B2C
Forward pass: ~0.165ms (1.07x speedup)
Max abs error vs CPU: 1.19e-06
Optimizations: advanced indexing + manual rsqrt normalization