[2026 Autumn][Task04] zlf1201 #124

Open
zlf1201 wants to merge 3 commits into DeepLink-org:main from zlf1201:QY2026_Autumn_zlf1201_Task04_Ascend

zlf1201 commented Jun 6, 2026

Task 04: RelativePositionEncoding - Ascend NPU

Hardware platform

  • Huawei Ascend 910B2C

Optimizations

  1. Pre-allocate the sentinel tensors (avoids an allocation on every call)
  2. Use torch.where instead of arithmetic masking
  3. Fuse the clamp and shift operations
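
The three optimizations above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in this PR: the class name `RelPosBins`, the inputs `pos` and `chain_id`, the `r_max` parameter, and the choice of `2 * r_max + 1` as the sentinel bin are all hypothetical, and the sketch runs on whatever device the inputs live on rather than targeting the NPU specifically.

```python
import torch


class RelPosBins(torch.nn.Module):
    """Hypothetical sketch: bin pairwise position offsets, with a sentinel bin
    for pairs that do not share a chain. Not the PR's actual module."""

    def __init__(self, r_max: int = 32):
        super().__init__()
        self.r_max = r_max
        # (1) Build the sentinel once and keep it as a buffer, so forward()
        #     does not allocate a new tensor on every call. The value
        #     2 * r_max + 1 is an assumption made for this sketch.
        self.register_buffer(
            "sentinel",
            torch.tensor(2 * r_max + 1, dtype=torch.long),
            persistent=False,
        )

    def forward(self, pos: torch.Tensor, chain_id: torch.Tensor) -> torch.Tensor:
        # Pairwise offsets d[i, j] = pos[i] - pos[j].
        d = pos[:, None] - pos[None, :]
        # (3) Shift and clamp in one expression: clamp(d + r_max, 0, 2 * r_max).
        shifted = torch.clamp(d + self.r_max, 0, 2 * self.r_max)
        same = chain_id[:, None] == chain_id[None, :]
        # (2) Select with torch.where instead of multiplying by a 0/1 mask.
        return torch.where(same, shifted, self.sentinel)


def arithmetic_mask_reference(pos, chain_id, r_max):
    """Same result computed with an arithmetic mask, for comparison only."""
    d = pos[:, None] - pos[None, :]
    shifted = torch.clamp(d + r_max, 0, 2 * r_max)
    same = (chain_id[:, None] == chain_id[None, :]).to(shifted.dtype)
    return shifted * same + (2 * r_max + 1) * (1 - same)
```

Comparing the two functions on the same integer inputs, for example with `(out - ref).abs().max()`, is one way a maximum absolute error of 0.0 could be checked, since both paths produce identical integer bins.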

Performance results

  • Forward pass: ~0.51 ms (1.37x speedup)
  • Max absolute error: 0.0

Files

  • dlblas/kernels/ks_competition/torch/QY2026_Autumn_zlf1201_Task04_Ascend.py

Task 04: RelativePositionEncoding - Ascend NPU
Hardware: Huawei Ascend 910B2C
Forward pass: ~0.51ms (1.37x speedup)
Max abs error: 0.0
Optimizations: pre-allocated sentinels + torch.where
@zhaochaoxing zhaochaoxing self-requested a review June 8, 2026 02:44